The first agent your team ships won't read Confluence.
Your knowledge base has 47 pages. Maybe it's clean. Maybe it's chaos — Slack threads pinned to the wall, READMEs nobody touched since 2023, a forgotten Notion page that contradicts everything. It doesn't matter. The agent doesn't care about structure. It cares about what it can parse and act on.
When a human reads "retry up to 3 times with exponential backoff," they pattern-match against years of experience. They know what "exponential" means, what "retry" implies, when it makes sense. An agent reads the same sentence and either extracts a concrete rule or misses it entirely. If the doc says "usually retry 3 times," it has no idea whether "usually" means "always" or "sometimes."
This is the moment your documentation becomes infrastructure — not literature.
The Shift Nobody's Really Talking About
Your team is solving a problem that doesn't exist yet in most organisations: how to write requirements that both humans and machines can execute.
For 30 years, documentation solved one problem: help humans find and understand information. We built Confluence, Wiki, Notion. We hired technical writers. We created documentation standards. Humans still got confused, but at least there was a chance to clarify.
In 2026, you have a second reader: an autonomous system.
When your agent reads a requirement, it doesn't have context. It doesn't know your company's defaults. It doesn't call a Slack channel asking "wait, do we actually need to handle negative balances?" It executes the spec as written. If the spec is vague, the execution is unreliable. If the spec is inconsistent, the agent fails in production.
The silent failure is the killer. Your agent might not crash. It might just make subtly wrong decisions at scale — approving requests it shouldn't, missing edge cases, breaking workflows downstream.
The Medtech Proof: When Two Agents Wrote to the Same Doc at the Same Time
I was brought in as an advisor to a team building medtech software. 14 engineers. London, Bangalore, Dhaka. Three time zones, never fully overlapping.
The constraint was non-negotiable: regulated development. Accuracy matters more than speed by an order of magnitude. A documentation error isn't technical debt — it's a compliance failure waiting to happen.
They were running multiple Claude Code agents across the team. Each agent helped engineers write, review, and document code changes.
The failure nobody anticipated: two agents writing to the same document simultaneously.
Not a race condition in the traditional sense. Both writes succeeded. Both agents were confident. Both outputs were internally consistent. The result was a document that was half-true — a blend of two incompatible states, neither flagged as wrong, both treated as current.
An engineer in Dhaka read it six hours later and built against it. The inconsistency reached QA four days after it was written. In medtech, four days of engineering work on a false assumption isn't just waste — it's a compliance audit risk.
This happened twice before I was brought in to design the architecture.
The Architecture We Built
The fix was a multi-agent knowledge base with three hard rules. The team built it. I designed the constraints.
Rule 1: Agents have explicit read/write protocols — not just access.
Every agent operates under one of two modes:
-
Write mode: Triggered only by explicit human action — a PR merge, an architecture decision, a confirmed deployment. The agent documents what changed, why, and what downstream systems need to know. Every write is stamped: timestamp, triggering event, human who authorised it.
-
Read mode: The default. The agent reads the knowledge base to inform its work. But it never treats documentation as SSOT. It treats it as last known state — accurate as of a timestamp, not guaranteed current.
That distinction is the whole game. "SSOT" means: this is the truth. "Last known state" means: this was true at [timestamp]. Verify if the stakes are high.
In regulated development, that difference isn't philosophical. It's the safety layer.
Rule 2: Active sessions lock the document.
When a Claude Code session is actively reading a document, that document is locked for writes.
Not because concurrent writes are impossible — they're trivially possible. Because an agent mid-session builds its reasoning from the document as it exists now. If the document changes underneath it, the agent's output becomes a blend of two states it never saw together. Confident. Internally consistent. Wrong.
Lock parameters we settled on after testing: - Duration: session lifetime, hard cap at 45 minutes - Auto-release: on session end, with audit log entry - Visibility: all engineers see which documents are locked, by whom, for how long
That last point was critical for the timezone spread. An engineer starting their day in Dhaka could see that a London session had locked a document 20 minutes ago. They knew to wait or work on something else. No conflict, no merge.
Lock contention across 6 months in production: 4% of sessions. If it had been higher, engineers would have routed around the system. At 4%, they didn't.
Rule 3: No agent writes to a document it hasn't read in the current session.
This sounds obvious. It isn't enforced by default anywhere.
An agent can be prompted to update documentation without reading current state first. In a fast-moving codebase — across three time zones — that means an agent confidently documents something that is no longer true.
The rule: before any write, read the current document version. If the version has changed since the session started: stop, flag to the human, do not proceed. The human decides whether to merge, reconcile, or discard. The agent surfaces conflicts. It doesn't resolve them.
What It Fixed
| Metric | Before | After (6 months) |
|---|---|---|
| Concurrent write conflicts caught before QA | 0 (undetected) | 23 |
| Drift incidents reaching QA | ~2/month | 2 total |
| Avg. time to catch cross-timezone inconsistency | 4 days | 4 hours |
| Session lock contention | N/A | 4% of sessions |
| Drift reaching compliance review | Flagged twice | Zero |
What This Proved
The failure wasn't the agents. The agents did exactly what they were told. The failure was the assumption that documentation could absorb concurrent writes the way a database absorbs concurrent inserts — with conflict resolution handled automatically.
Documentation isn't a database. Two conflicting truths don't merge cleanly. They produce a third document that's wrong in ways that aren't obvious.
In regulated development, non-obvious wrong is worse than obviously broken. A crash stops the work. A confident half-truth continues it — in the wrong direction.
Why Human-First Documentation Fails at Agent Scale
Let's be specific about what breaks. The medtech case was a locking failure. But there are three other failure modes that hit every team running agents.
Failure Mode 1: Narrative Ambiguity
A human reads: "The system should prioritise recent user activity when generating recommendations."
Interpretation 1: Weight by recency — most recent = highest priority. Interpretation 2: Only consider activity from the last 30 days. Interpretation 3: Most recent data first, then older data as fallback.
A human engineering team ships, gets feedback, and iterates. An agent doesn't. It picks one interpretation and amplifies it across millions of decisions.
Failure Mode 2: Context-Dependent Defaults
Your docs say: "Validate email before sending."
In your team's head, this means check syntax, verify domain DNS, send verification code, and wait for user confirmation. To an agent, "validate" is a single step. If the doc doesn't specify which kind of validation, the agent either over-validates (slow) or under-validates (missed fraud).
Failure Mode 3: Missing Edge Cases
Your API docs say: "GET /users/{id} returns user data."
The spec doesn't mention: what if the user is deleted? What if they're in a region that violates access policy? What if user.email is null? What if rate limit is exceeded?
A human engineer encounters the edge case in production and logs an issue. An agent encounters it and either crashes or guesses. If it guesses wrong enough times, it learns the wrong behaviour.
Failure Mode 4: Undocumented Changes
Your team ships a code change. The requirements in Jira haven't been updated. The API schema hasn't been versioned. The runbook still references the old fields.
A human notices. An agent doesn't. It orchestrates workflows using stale specs. The integration breaks — quietly, confidently, at scale.
The Architecture: Machine-First Documentation in Practice
The fix is simpler than it sounds: stop writing documentation for explanation. Start writing documentation for execution.
Layer 1: The Spec Layer — Contracts
Every system interface gets a machine-readable contract:
{
"endpoint": "/orders/{order_id}/cancel",
"method": "POST",
"authentication": "required",
"authorization": {
"rule": "user.id == order.user_id OR user.role == 'admin'",
"reject_with": "403 Forbidden"
},
"input_schema": {
"order_id": {
"type": "string",
"pattern": "^ORD-[0-9]{8}$",
"required": true
},
"reason": {
"type": "string",
"enum": ["customer_request", "payment_failed", "fraud_detected"],
"required": false
}
},
"output_schema": {
"status": "cancelled",
"refund_amount": "number",
"refund_date": "ISO8601 timestamp"
},
"failure_modes": [
{
"condition": "order already cancelled",
"response_code": 409,
"message": "Order cannot be cancelled twice"
},
{
"condition": "order in fulfillment stage",
"response_code": 422,
"message": "Cannot cancel orders being shipped"
}
],
"sideeffects": [
"creates refund transaction",
"triggers customer email",
"updates inventory"
],
"constraints": {
"max_processing_time_ms": 500,
"rate_limit": "100 per minute per user"
}
}
Both humans and agents read this. But now they read it as a contract, not as guidance. The agent knows exactly what to expect — including what breaks and why.
Layer 2: The Decision Layer — Logic Trees
Every business rule gets documented as a decision tree:
feature_access_rule:
applies_to: ["premium_users", "beta_users"]
decision_tree:
- if: user.role == "premium" AND user.subscription_active AND NOT user.billing_issue
then: grant_access
- if: user.role == "beta" AND feature.in_beta_for[user.region]
then: grant_access
- if: user.region in ["CN", "RU"] AND feature.restricted_regions
then: deny_access
reason: "Regulatory compliance"
- if: all_above_false
then: deny_access
reason: "Feature not available for this user"
evaluated_at: "request time"
audit_required: true
edge_cases:
- "Users with billing issues see 'upgrade required' not 'access denied'"
- "Beta users expire at feature launch — decision_tree changes on launch_date"
The structure removes interpretation. The agent knows exactly which conditions matter, the order of evaluation, what happens in each case, and where the edge cases are. So does the human reading it.
Layer 3: The Change Layer — Versioned Impact Logs
Every code change is logged with downstream impact:
# Commit: Refactor order cancellation logic
## Changes to Public API
- No breaking changes
- Added new field: `cancellation.reason` (optional, defaults to "unknown")
## Contract Update
- Version bumped 1.2 → 1.3
- New optional field in response schema
- No removal of existing fields
## Deployment Impact
- Backward compatible with v1.2 clients
- New agents reading v1.3 can use cancellation.reason
## When This Goes Wrong
If deployed without updating the contract:
- New clients pass `reason` field
- Agents reading stale spec won't see the field
- Analytics miss data silently
The change log travels with the code. Agents reading the new contract know exactly what's changed. No surprises in production.
Layer 4: The Knowledge Layer — Decision Records
When you ship an architectural decision, you log the reasoning — not just the outcome:
# Decision: Cache strategy for user preferences
**Date**: 27 March 2026
**Status**: Implemented
**Affected Systems**: Profile Service, Recommendation Engine
## The Decision
Redis with 5-minute TTL on preference reads.
## Why
- Handles 10k RPS with p95 latency < 50ms (load tested)
- Cache hit rate: ~94% after warm-up (28 days production data)
- User sees stale preferences for max 5 minutes — acceptable per product
## Constraints
- TTL is non-negotiable: if extended to 10 min, compliance report fails
- Cache invalidation on preference updates is synchronous — do not defer
- Do not cache across regions: EU user preferences must not cache in US
## When This Breaks
- TTL disabled: database gets crushed, latency spikes past 500ms
- Invalidation becomes async: stale preferences corrupt recommendations
- Cross-region caching: data residency laws violated
## What the Agent Needs to Know
- Do not add a caching layer on top of this
- Assume max 5-minute staleness on preference reads
- If you're building features requiring real-time preferences, this architecture is wrong — escalate
A human reads this and understands why the decision was made. An agent reads this and knows exactly when the decision applies, when it breaks, and when it doesn't apply at all.
What This Means for Your Team
You're not hiring more technical writers. You're asking everyone to write differently.
Engineering
Your PR template changes:
## Definition of Done
- [ ] Public API contract updated (if applicable)
- [ ] Internal API contract validated
- [ ] Architectural decision logged (if applicable)
- [ ] Edge cases added to relevant decision tree
- [ ] Failure modes in runbook updated
- [ ] Schema migration logged (if data changes)
If you're adding a retry mechanism, the contract gets updated before merge. If you're changing how permissions work, the decision tree gets updated. This isn't "good practice." It's the definition of done.
Time cost: 15 minutes per PR, upfront. Savings: hours of downstream confusion, rework, and agent failures.
Product & Requirements
Your PRD shifts from narrative to structured. Instead of:
"The system should intelligently handle user onboarding based on their role and region."
You write:
{
"feature": "role_based_onboarding",
"applies_to": ["new_users"],
"conditions": [
{
"role": "enterprise_admin",
"region": "EU",
"path": "onboarding/enterprise_eu.json",
"sso_required": true,
"compliance_notes": "GDPR, SOC2"
},
{
"role": "individual",
"region": "any",
"path": "onboarding/individual.json",
"email_verification_required": true
}
],
"success_metrics": {
"completion_rate": ">80%",
"time_to_activation": "<5 minutes"
}
}
Both humans and agents read this. The structure forces you to think through edge cases before engineering starts — not after.
QA
Your testing strategy flips.
Old approach: Read requirements. Write test cases. Compare against code.
New approach: Read the contract. Validate code matches contract. Validate contract matches system. Validate both are versioned together.
def test_contract_matches_implementation():
"""Verify API behaviour matches contract spec exactly"""
contract = load_json("api/contracts/orders.json")
# Test all success paths from contract
for success_case in contract["success_cases"]:
response = api.post(contract["endpoint"], success_case["input"])
assert response.status == success_case["expected_status"]
assert validate_schema(response.body, contract["output_schema"])
# Test all failure modes from contract
for failure in contract["failure_modes"]:
response = api.post(contract["endpoint"], failure["trigger_input"])
assert response.status == failure["expected_status"]
# Validate latency constraint from contract
start = time.time()
response = api.post(contract["endpoint"], valid_input)
latency = (time.time() - start) * 1000
assert latency < contract["constraints"]["max_processing_time_ms"]
This isn't checking if the code works. It's checking if the code matches its promise. When the contract changes, the test changes. When the code breaks the contract, the build fails.
QA stops being a functional testing role. It becomes a contract governance role. That's a more defensible position in a world where agents write increasing amounts of the code being tested.
When This Fails
Let's be honest about the failure modes of the architecture itself.
Legacy documentation debt. You have years of Confluence pages that contradict each other. Migration is the only fix — system by system, contract by contract. It's slow and it's real work. Start with the systems your agents touch first.
Stale contracts reaching production. You update the code, forget the contract, deploy. The agent reads the old contract. The fix is non-optional CI/CD validation: if code and contract don't match, the build fails.
Over-specification paralysis. You try to document everything upfront and produce 200-field schemas modelling every possible state. Start with the critical path. Document the happy path, primary failure modes, constraints. Perfect is the enemy of shipped.
Culture resistance. Engineers say they don't have time. PMs say docs will be outdated anyway. This is a leadership problem. If shipping without updating the contract isn't treated the same as shipping a bug, none of this sticks.
What to Do Monday Morning
Pick one critical system. Auth, payments, recommendations — something with real constraints and real impact.
Write the contract. Not a 50-page spec. A JSON schema: input, output, failure modes, constraints.
Update your PR template. Add: "Contract validated: [ ] Yes [ ] N/A"
Add one CI/CD check. If code touches that system, validate it matches the contract. Let it fail silently at first. Let engineers see the results.
Get QA involved. Show them the contract. Ask: "Does this match what we actually test?"
Ship the contract with the code. Not as a separate artifact. As part of the deployment.
Then repeat for the next system.
The Broader Pattern
For 30 years, the primary reader was human. Documentation was guidance. It was polished, written for clarity and style, updated when someone had time.
In 2026, the primary reader is increasingly autonomous. Documentation is specification. It's written for executability and precision. It's versioned. It's validated. It travels with code.
Humans still read it — probably in a cleaner interface, maybe generated from the spec. But the writing is for machines first.
This shifts everything: how you write requirements, how you review code, how you test systems, what QA does, how architects communicate decisions.
The teams that figure this out first aren't the ones with the best documentation tools. They're the ones who decided that documentation stopped being marketing and became infrastructure.
I've seen both. The gap isn't tooling. It's whether the team treats knowledge as a system — with integrity constraints, versioning, and failure modes — or as a suggestion.
About the Author
Netanel Eliav builds production AI systems — agentic workflows, RAG pipelines, LLM infrastructure, and evaluation frameworks. He advises engineering teams on agentic architecture and runs projects including Data, Analytics and AI related domains. London-based, global AI leader.