The Architect: Autonomous Development Lifecycle Layer for Agentic AI Coding Tools
The Story
I spent three months watching AI coding agents fail quietly, not catastrophically. Agents would declare success after writing two files but leave broken test suites behind. They rewrote the same function multiple times because the retry attempts lacked memory of previous failures. They stalled, token-burning at 11pm, waiting for a human to rescue them. I was building production-grade agentic systems—you can't babysit every run. The agents wrote code well but never finished the job.
So I built The Architect. This open-source autonomous development lifecycle layer wraps your AI coding CLI and adds what’s missing: planning, completion verification, retries with memory, quality review, and persistent project intelligence. It’s provider-agnostic, supporting Claude Code CLI, Codex CLI, and OpenCode CLI, and it’s available on PyPI. Build 10042 is the honest count of autonomous operations it took to stabilise it.
The Pain: You Are the Orchestration Layer
Using AI agents directly looks like this: you write a goal, run the agent, and wait. Minutes later, it is rewriting the same function endlessly without progress. You kill it, re-prompt with adjusted context, and run again. The agent exits with code 0 but leaves the test suite broken. You repeat this cycle task after task until exhaustion. By the ninth task, you ship with known edge-case bugs because you are too tired to babysit.
Active supervision for a 10-task goal: 3-4 hours, mostly watching, not thinking. You end up as the orchestration layer. AI solves coding; nobody solves orchestration.
The Four Gaps
- Completion isn't verified. The agent’s “task complete” claims and exit codes mean nothing without checks. It hallucinates completion, leaving partial output.
- Retries have no memory. Each retry starts cold—no knowledge of previous mistakes or files edited, causing repeated failures.
- No stuck detection. Blocked agents keep token-burning indefinitely, requiring manual kills.
- Context resets every session. Each run loses project history, constraints, and prior lessons.
The Fake Solutions
Better prompts: Two hours crafting perfect prompts helps once but fails as goals change, code evolves, or models update. Prompt tuning is a brittle babysitting crutch.
More expensive models: Models like GPT-4o or Claude Opus reduce hallucinations but don’t eliminate supervision. They still get stuck, nothing detects it, and QA runs stay expensive.
The core issue isn’t model capability—it’s losing control when handing it off. No way to walk away from a run safely, so you choose vigilance or chaos.
The Solution
The Architect is the reliable handoff mechanism. You retain control where it matters: goals and architecture. The Architect executes with built-in failure handling so you don’t have to intervene.
Provider-Agnostic Architecture
- Claude Code CLI: Anthropic’s agentic coding tool.
- Codex CLI: OpenAI’s terminal-based coding agent.
- OpenCode CLI: Open-source, multi-provider alternative.
No vendor lock-in. Switch providers mid-run or assign different providers for planning versus execution. Your orchestration layer stays the same.
Mechanism 1: Autonomous Planning
The Architect’s agent reads your goal, project structure, ARCHITECT.md, and context files, then decomposes the goal into numbered, self-contained task files under the tasks/ directory. Each file is a concrete, unambiguous instruction for execution.
Scope controls task size: simple scope splits goals into 15-20 narrow tasks for small models, complex scope creates 3-5 broad tasks suited to frontier models.
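For illustration only (the task-file format shown here is an assumption, not the documented one), a generated file might look like:

```markdown
<!-- tasks/T03_add_payment_webhook.md (illustrative; the real format may differ) -->
# T03: Add payment webhook endpoint

Goal: add a POST /webhooks/payment endpoint that validates signatures.

Constraints (from ARCHITECT.md):
- Use the existing request-validation helper; do not add new dependencies.

Done when:
- The endpoint returns 200 on a valid signature, 400 otherwise.
- Tests in tests/test_webhooks.py pass.
- Output <promise>T03_COMPLETE</promise>.
```

The explicit "done when" criteria are what the completion-detection signals described next can check against.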
Mechanism 2: Multi-Signal Completion Detection
Completion confirmation cross-checks four signals rather than trusting any single claim. The table below describes them.
| Signal | How it works | Strength |
|---|---|---|
| Promise tag | Agent outputs <promise>TXX_COMPLETE</promise> | Strong |
| PROGRESS.md | Task marked Done in progress file | Moderate |
| Clean exit | Provider CLI exited with code 0 | Weak |
| Progress signal | Text contains “all tests pass”, “task is done” | Weak |
Decision rules: two or more positive signals declare the task done. A promise tag alone suffices. A clean exit alone is ignored, since providers can exit with code 0 on timeouts. Any stuck signal, anywhere, overrides completion claims.
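The decision rules can be sketched as a small predicate. This is an illustrative reconstruction from the table above, not the project's actual code:

```python
from dataclasses import dataclass


@dataclass
class Signals:
    promise_tag: bool    # strong: <promise>TXX_COMPLETE</promise> in output
    progress_md: bool    # moderate: task marked Done in PROGRESS.md
    clean_exit: bool     # weak: provider CLI exited with code 0
    progress_text: bool  # weak: phrases like "all tests pass"
    stuck: bool = False  # any stuck signal overrides everything


def is_complete(s: Signals) -> bool:
    if s.stuck:
        return False          # stuck overrides all completion claims
    if s.promise_tag:
        return True           # promise tag alone suffices
    # A clean exit alone is ignored (providers can exit 0 on timeouts),
    # but it still counts as corroboration alongside another signal.
    return sum([s.progress_md, s.clean_exit, s.progress_text]) >= 2
```

Note how the weak signals only ever confirm in combination: a clean exit plus a Done entry in PROGRESS.md passes, a clean exit by itself never does.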
Mechanism 3: Circuit Breaker
Retries handle transient model failures; the circuit breaker handles recurring failure patterns. Three persisted counters track them:
- No-progress: zero files written across three consecutive attempts trips the circuit.
- Same-error: an identical logical bash-error fingerprint (the same error ignoring paths and line numbers) across attempts trips the circuit.
- Token decline: a third attempt using less than 40% of the first attempt's tokens, combined with elevated counters, trips the circuit.
Recovery actions include WAIT, REPLAN (rewrite the failing task), and COOLDOWN_WAIT (pause on a rate limit instead of consuming a retry). Breaker state persists across restarts.
Mechanism 4: Retry with Context Carry
Failed tasks retry up to 3 times by default, or 30 in persistent mode, carrying a summary of each previous attempt: files written, bash commands run, test failures. The new attempt knows exactly what was tried and can avoid repeating mistakes.
Retry models allow fallback providers per attempt: attempt 1 uses the default, and later attempts can escalate to stronger models or different providers to improve the odds of success.
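In spirit, context carry means the next attempt's prompt embeds a digest of what already happened. A toy sketch, where the summary field names are assumptions:

```python
def build_retry_prompt(task_text: str, attempts: list[dict]) -> str:
    """Fold summaries of earlier attempts into the next attempt's prompt."""
    if not attempts:
        return task_text
    lines = [task_text, "", "Previous attempts (do not repeat these mistakes):"]
    for i, a in enumerate(attempts, start=1):
        # 'files', 'commands', 'failures' are illustrative summary keys
        lines.append(
            f"- Attempt {i}: wrote {a['files']}; ran {a['commands']}; "
            f"failures: {a['failures']}"
        )
    return "\n".join(lines)
```

Each retry therefore starts warm: the model sees the failure history as part of its instructions instead of rediscovering it.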
Mechanism 5: Retrospective Reviewer
After all tasks finish, an independent reviewer agent audits the results. It reads PROGRESS.md, the task files, and the code, and runs your test suite. If it finds issues, it generates fix-up tasks prefixed with R, which enter the execution pipeline. Persistent mode runs two review rounds to verify fixes.
The reviewer cannot modify existing files or progress state, only add fixes or confirm completeness.
Mechanism 6: ARCHITECT.md — Persistent Project Intelligence
This structured file accumulates project knowledge across sessions and is read by every agent before work.
- Project Structure: Repo type, languages, frameworks, dependencies, test commands.
- Permanent Decisions: Architectural choices documented and append-only.
- Known Constraints: Discovered limitations stored for future tasks.
- Lessons Learned: Failures and best practices recorded continuously.
- Best Practices: Consistent coding or design standards.
- Planning History: Auto-appended after each plan.
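As a hedged illustration (the entries below are invented for this sketch; the file the tool actually maintains will differ), an excerpt might read:

```markdown
## Permanent Decisions
- Use file-based locks for concurrency control (append-only; never revisited).

## Known Constraints
- Claude Code CLI does not report token usage; budget accounting is unavailable there.

## Lessons Learned
- T47: lock-file creation raised FileExistsError across attempts; clean stale locks before acquiring.
```

Because every agent reads this file before working, a constraint discovered once stops causing failures in every later session.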
After 50+ builds, The Architect’s own ARCHITECT.md contained over 23 architectural decisions and 11 lessons, all recorded automatically by agents while building the project.
Production Codebases
Production codebases are complex and accumulate architectural history that agents don’t initially understand. The Architect mitigates risks via:
- Persistent knowledge in ARCHITECT.md capturing constraints and decisions.
- Planning on frontier models with full context awareness.
- Task scope isolation to limit the impact of each operation.
- You remain the architect: defining goals, scope, and context.
Local GPU Models
Local models have token limits that fill quickly with context, code, and test outputs. The Architect’s planning decomposes goals into scoped tasks sized for reliable local execution (15k-25k tokens each).
This mixed-model approach uses frontier models for planning and retrospective review, local models for execution—enabling real production work with 30k token windows.
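A configuration along these lines could express the split; the key names here are illustrative assumptions, not the documented schema:

```toml
# hypothetical keys -- check the project docs for the real schema
[architect.models]
planning = "claude-sonnet"      # frontier model: full-context planning and review
execution = "local-qwen-14b"    # local model: runs the 15k-25k-token scoped tasks
retrospective = "claude-sonnet"
```

The design rationale: planning needs broad context once per goal, while execution runs many times, so putting the cheap model where the volume is keeps costs flat.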
Overnight Safety
A safe unattended run example:
```toml
[architect]
persistent = true
token_budget_per_hour = 500000
```
The system handles failures, retries with context, rate-limit cooldowns, circuit-breaker trips, replanning, retrospectives, and process interruptions, and resumes transparently. Token budgets cap hourly spending. Cooldown periods pause retries without penalty. Concurrency control and build counting keep runs stable. Version 1.0.0 (build 10042) is the result of this hardening.
Dog-Food
The Architect was built using itself. At task T47, the circuit breaker caught a repeated logical bug (FileExistsError) despite differing paths and line numbers. The system tripped the circuit, chose to REPLAN that task, and fixed its own lock file implementation.
Honest Limits
- Does not write better code than your model. It raises the reliability floor, not the quality ceiling.
- Bad goals produce vague tasks. Clear, structured goals and context files remain critical.
- Retrospective review is a quality gate, not a substitute for nuanced engineering judgment.
- Claude Code CLI lacks token usage reporting. Use OpenCode or Codex if token accounting matters.
- Free open-source models are slower—expect about 3x runtime on 10-task goals compared to Claude Sonnet.
Getting Started
Install The Architect on Python 3.11+:
```shell
pip install the-architect
```
Requires one AI coding CLI: Claude Code, Codex, or OpenCode.
Initialise your project:
```shell
architect init
```
Plan and execute a goal:
```shell
architect --plan --goal "add Stripe payment integration"
architect
```
That's it. The Architect autonomously plans, executes, retries, reviews, and reports unattended.
Full docs at github.com/iNetanel/the-architect.