Arc of AI Conference · April 2026

AI-Assisted Development

What Actually Ships Better Software

The Dark Side · The Patterns · Team Norms
Prem Chandrasekaran · Tech Director, ThoughtWorks
Pramod Sadalage · Distinguished Engineer, ThoughtWorks

The Landscape · 2026

The Ecosystem

Steve Yegge's six waves — we are firmly in wave 5:

✍️ Traditional Coding
💡 Completions (Copilot v1)
💬 Chat Assistants
🤖 Single Coding Agents
🕸️ Agent Clusters ◀ we are here
🚀 Proactive Agents — they prompt themselves

The tools

🖥️ IDE-native — Cursor, Windsurf, Kiro
💻 Terminal agents — Claude Code, Codex CLI
🔗 Platform — Copilot Workspace
🦾 Autonomous — Devin, Amazon Q, Jules

The standards

🔌 MCP — agent ↔ tool
🤝 A2A — agent ↔ agent
🧩 ACP — IDE ↔ agent
📄 AGENTS.md — cross-tool rules

The tools are the engine. We're still responsible for direction. That gap is what this talk is about.

The Individual · Perception vs Reality

METR, July 2025

Randomized controlled trial · 16 experienced open-source developers · 246 tasks · real production work

+24%
predicted speed — stated before the study
+20%
felt speed — reported after the study
−19%
actual task speed with AI tools

— METR study participant

The Organization · The Results

Everyone Adopted. (Almost) Nobody Shipped Faster.

The adoption

84%

of developers use AI coding tools
Stack Overflow 2025

41%

of new enterprise code is AI-generated or assisted
GitClear 2025

The result

46%

don't trust AI output accuracy
↑ from 31% in 2024 — trust falling as adoption rises

≤10%

actual org productivity gain
DX Research, 121,000 developers

The value stream math: If coding is 10–15% of delivery time, a 10× improvement in coding speed yields at most ~15% faster delivery. The other 85% is requirements, architecture, coordination, review, testing, deploying, releasing, and waiting.
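
The value stream math is Amdahl's law applied to delivery. A minimal Python sketch, assuming coding and the other stages run strictly one after another (the function name `delivery_time_saved` is illustrative, not from any library):

```python
def delivery_time_saved(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total delivery time saved when only coding gets faster.

    Assumes stages run sequentially, so the non-coding share is untouched.
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    new_total = (1.0 - coding_share) + coding_share / coding_speedup
    return 1.0 - new_total


for share in (0.10, 0.15):
    saved = delivery_time_saved(share, coding_speedup=10)
    print(f"coding = {share:.0%} of delivery, 10x faster coding -> {saved:.1%} less delivery time")
```

Under these assumptions a 10× coding speedup saves 9% to 13.5% of delivery time, and no speedup, however large, can save more than the coding share itself.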

Act I

The Dark Side

Real disasters. Real anti-patterns. Why the risks are higher than most teams think.

Production disasters · Anti-patterns · Technical debt acceleration · Skill atrophy

The Dark Side · 2025 Incidents

When Agents Go Rogue

July 2025 · Replit Agent
Wipes production DB of 1,200 records — then fabricates 4,000 fake accounts to cover its tracks
Ignored explicit instructions. After destroying data, it generated fake users and false system logs. "I panicked instead of thinking."
October 2025 · Claude Code
terraform destroy on production — 2.5 years of student submissions gone
Asked to clean up duplicate AWS resources. A missing Terraform state file caused terraform destroy against production. No recovery path.
December 2025 · Amazon Kiro
"Delete and recreate" fix causes 13-hour AWS outage
Agent decided the optimal fix for a minor bug was to delete and recreate the environment. Amazon now mandates senior engineer review for all AI changes.

The lesson: Today's agents are powerful, not wise. Wisdom still has to come from us. They need deterministic guardrails, not just prompts — hard checks that block destructive operations regardless of what the agent thinks it should do.

The Dark Side · Anti-patterns

The Anti-Pattern Catalog

🌊 Vague Multi-ask Prompts common

"Fix auth, add logging, refactor the service, update tests." Agent tries all four, excels at none. One job per prompt.

🔪 One-Shot Repo Surgery dangerous

"Migrate everything to microservices." Agent produces something — unwinding it is harder than starting over. Spec-first, always.

🗃️ Context Stuffing performance

Pasting 50 files "for context." Degradation is non-linear past 60–70% window capacity. Failed with 46 tools, succeeded with 19 — same window size.

📋 Blind Copy-Paste tech debt

Copy/paste: 8.3% → 12.3% of all changes (GitClear). Refactoring: 25% → <10%. You're still accountable for this code in production, regardless of who typed it.

🤖 Over-trusting Agents catastrophic

Agents running unsupervised in production-adjacent environments with no blast radius limits and no deterministic destructive-operation checks. See previous slide.

The Dark Side · Insight

The Illusion of Progress

The Speculative Coding Trap

When coding is cheap, there's no pressure to wait for clarity. You code against assumptions. Assumptions change. You rebuild.

1
Assume
Build against best guess
2
Invalidate
Reality diverges from guess
3
Rebuild
Fast, cheap — feels productive

Real Example · A2A Project

Single-agent prototype → reworked for multi-agent
Models from CLI → rebuilt 2-3× when MCP server differed
Code for v0.5 → rewritten for v0.7 patterns

"The team was always moving fast — just not always forward."

The Vanished Gate

Old economics: coding was expensive → teams waited for clarity.
New economics: coding is free → the gate disappears. Nothing automatically replaces it.

The Dark Side · Code Quality

The Technical Debt Accelerator

GitClear · 211M lines (2021–2024)

Copy/paste: 8.3% → 12.3% of all changes
Refactoring: 25% → <10%
Code duplication: +8×

CodeRabbit · 470 AI-co-authored PRs

Issues per PR: 1.7×  ·  Security vulnerabilities: 2.74×
Performance regressions:

Anthropic Skill Study

Full delegation → comprehension <40%
Active engagement → comprehension >65%

"I don't think I have ever seen so much technical debt created in such a short period of time during my 35-year career."

— Kin Lane, API evangelist

The Human Clipboard Problem

"I've become a human clipboard, blindly shuttling errors to the AI and solutions back to code."
— 12-year experience engineer

"One of the most important properties of a junior developer is that you can turn them into a senior developer."

— Martin Fowler
Act II

The Patterns

Six practices from production teams that actually help you ship better software faster

TDD · Types as Guardrails · Decompose · Context Engineering · Scope Control · AI Paperwork

The Patterns · Mental Model

How Agents Actually Think

Every agent — Claude Code, Cursor, Copilot Workspace — runs this loop:

🎯
Plan
LLM reasons about what to do next
🔧
Tool Use
Read files, edit code, run commands, call MCP
👁️
Observe
See results, errors, test output, state changes
🔄
Reflect & Repeat
Adjust approach, loop until done or stuck
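
The loop above can be sketched in a few lines. This is a toy illustration, not how any named product is implemented: `plan`, `act`, and `is_done` are hypothetical callables standing in for the LLM call, the tool call, and the stop condition.

```python
from typing import Callable, List


def run_agent_loop(
    plan: Callable[[List[str]], str],
    act: Callable[[str], str],
    is_done: Callable[[str], bool],
    max_steps: int = 10,
) -> List[str]:
    """Run plan -> tool use -> observe -> reflect until done or out of steps.

    Returns the list of observations, one per iteration.
    """
    observations: List[str] = []
    for _ in range(max_steps):
        action = plan(observations)       # Plan: decide the next step from history
        observation = act(action)         # Tool use: run the step
        observations.append(observation)  # Observe: record the result
        if is_done(observation):          # Reflect: stop or go around again
            break
    return observations


# Toy stand-ins: the fake test run only passes on the third attempt.
attempts = {"count": 0}


def fake_plan(history: List[str]) -> str:
    return f"edit code, attempt {len(history) + 1}"


def fake_act(action: str) -> str:
    attempts["count"] += 1
    return "tests passed" if attempts["count"] >= 3 else "tests failed"


history = run_agent_loop(fake_plan, fake_act, lambda obs: obs == "tests passed")
print(history)
```

Two intermediate failures occur before the final result, which is why checks that run inside the loop matter more than a single review at the end.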

Why this matters

Agents are iterative systems, not one-shot generators. Failures happen in intermediate steps, not just final output.

Your leverage points

Every pattern in this act targets a specific part of this loop:
Plan → Decompose, Spec-first
Tool Use → Context Engineering, Scope Control
Observe → TDD, Types as Guardrails
Reflect → AI Paperwork

The key insight

You don't control outputs — you control constraints. Tests, types, context, task size. The agent does the rest.

Pattern 1 of 6 · TDD

🐛 Sound familiar?

🪞
The Green Mirage
AI code compiles, looks right, but silently does the wrong thing
🚂
The Regression Train
Every new feature breaks something that worked yesterday
🔍
The Manual Tester
You spend more time verifying AI code than it took to generate it
💥
The Integration Surprise
Unit tests pass, but the system falls apart when pieces connect
😰
The Confidence Gap
Afraid to refactor because you don't know what might break
📋
Copy-Paste Blindness
Accepting AI output without verifying it does what was asked

These are all symptoms of the same root cause: there isn't a strong enough specification for the AI to code against.

Pattern 1 of 6 · TDD · Your Behavioral Spec

A Protocol, Not a Testing Technique

❌ Test After

Write code, then ask AI to test it. The AI writes tests that confirm your code does what it already does.

"AI coding assistants will look at your implementation and write tests that confirm your code does what it already does. This is the equivalent of a student writing their own exam after seeing the answers."

— Ben Houston · 50%+ of AI-generated tests mirror implementation

✅ Test First

Write test scenarios first. AI implements until they pass. Your domain judgment defines correctness; AI does the labor.

"TDD is not a testing technique in this context — it's a protocol for working with AI. Tests provide a machine-verifiable definition of success."

— "The strongest form of prompt engineering" · ThoughtWorks

Tests aren't documentation. They're the communication channel between you and the AI about what correct means.

"Do NOT modify test files." — the one rule that makes all of this work

Pattern 1 of 6 · TDD · The Protocol

TDD in the AI Age

1
YOU describe one concept
Natural language, outside-in — from the user's perspective
2
Refine with AI
"What corner cases am I missing?" — use your judgment on which matter
3
AI formalizes + implements
Scenarios → executable tests → code. "Do NOT modify test files."
4
Run, review, refactor, commit
Safety net in place. Be bold. Repeat.

One concept at a time. Each concept may yield more than one test — but never more than a handful. You stay at the intent level; AI handles the formalization.

Kent Beck's 3 Warning Signs

Loops: AI repeatedly tries the same failing approach
Unrequested functionality: adds features you didn't ask for
Cheating: modifies or deletes tests to make them "pass"
When any appear, stop and redirect immediately.
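
The "cheating" sign can be checked mechanically instead of by eye. A minimal sketch, assuming test files follow the glob patterns below (the patterns and the function name `modified_test_files` are illustrative; adapt them to your repository layout):

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Assumed naming conventions for test files; adjust to your project.
TEST_PATTERNS = ("tests/*", "*/tests/*", "test_*.py", "*/test_*.py", "*_test.py")


def modified_test_files(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that look like test files."""
    return [
        path for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in TEST_PATTERNS)
    ]


changed = ["src/billing/invoice.py", "tests/test_invoice.py", "README.md"]
print(modified_test_files(changed))
```

Feed it the output of `git diff --name-only` during an implementation task. A non-empty result is the signal to stop and redirect.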

When AI Struggles: The Fallback

Delete the method body. Write step-by-step comments describing the algorithm. Let AI regenerate from your skeleton. Your structure, its code.

"Higher quality inputs allow for the capability of LLMs to be better leveraged. TDD maintains a high level of code quality. This high quality input leads to better Copilot performance than is otherwise possible."

— Paul Sobocinski, martinfowler.com

Pattern 1 of 6 · TDD · Deep Dive

Defense in Depth for AI Code

1
Unit Tests
Does it behave correctly? — coverage floor enforced
2
Static Analysis & Security
Does it introduce vulnerabilities or code smells? — compile-time checks + dependency analysis
3
Integration Tests
Does it work with real infrastructure? — no mocks hiding broken queries
4
Module & Architecture Tests
Does it respect boundaries and structural rules? — dependency direction, layers, no casual cross-module imports
5
Contract & API Documentation Tests
Does the API match what consumers expect? — contract drift caught at build time
6
E2E / BDD Tests
Does the user journey work end-to-end? — acceptance criteria in plain language
7
Code Coverage
Do we have enough tests? — enforces the quantity of testing
8
Mutation Tests
Are the tests actually testing anything? — enforces the quality of testing
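
The difference between layers 7 and 8 is easiest to see on a toy case. The sketch below hand-rolls a single mutant rather than using a mutation-testing tool; `is_adult` and both test suites are hypothetical.

```python
from typing import Callable

AgeCheck = Callable[[int], bool]


def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Mutation: ">=" changed to ">", the kind of edit a mutation tool makes.
    return age > 18


def weak_suite(check: AgeCheck) -> bool:
    """Executes every line of the function but never probes the boundary."""
    return check(30) is True and check(5) is False


def strong_suite(check: AgeCheck) -> bool:
    """Adds the boundary case that distinguishes '>=' from '>'."""
    return weak_suite(check) and check(18) is True


for name, suite in (("weak", weak_suite), ("strong", strong_suite)):
    passes_original = suite(is_adult)
    kills_mutant = not suite(is_adult_mutant)
    print(f"{name} suite: passes original={passes_original}, kills mutant={kills_mutant}")
```

Both suites give full line coverage of `is_adult`, but only the strong one fails when the behavior changes. That gap is what a mutation score measures.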

The AI writes the code. You define what correct means. These layers are how you make that definition stick.

Pattern 2 of 6 · Types as Guardrails

🧩 Sound familiar?

🔀
The Shape Shifter
AI returns data in a different structure than your API contract specifies
🕳️
The Any Escape
AI uses any, Object, or Map<String,Object> to sidestep type safety
🏷️
The Silent Rename
AI renames fields or methods to "improve" naming, breaking callers silently
📝
The Contract Breaker
AI modifies your interfaces to make its implementation easier
🧵
The Stringly Typed
Strings where there should be enums, constants, or typed values
The Optional Avalanche
AI makes everything nullable to avoid compilation errors

These are all symptoms of the same root cause: there's no structural contract for the AI to code against.

Pattern 2 of 6 · Types as Guardrails · Your Structural Spec

Types & Schemas as Guardrails

❌ Old economics

Ceremony = high cost

Every interface, annotation, and schema was time not shipping features

✅ New economics

Ceremony = near free

AI writes the boilerplate. Compiler enforces it. Errors eliminated at compile time.

The principle: The more you can make correctness checkable by machines, the less you depend on human review catching errors.

"Do NOT modify test files." Pattern 1  ·  "Do NOT modify any interfaces." Pattern 2

Same discipline. Behavioral spec + structural spec = double safety net.
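
As an illustration of a structural spec, here is a minimal Python sketch. The `OrderStatus` and `Order` names are hypothetical. The point is that an enum and a frozen dataclass turn "Stringly Typed" mistakes and casual mutation into immediate errors. In Python these particular checks fire at runtime; a static type checker can flag many of them earlier.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class Order:
    order_id: str
    status: OrderStatus


def parse_status(raw: str) -> OrderStatus:
    """Convert untrusted input at the boundary; unknown values raise ValueError."""
    return OrderStatus(raw)


order = Order(order_id="A-1", status=parse_status("pending"))

try:
    parse_status("shiped")  # typo that a plain string field would accept silently
except ValueError as exc:
    print(f"rejected: {exc}")

try:
    order.status = OrderStatus.SHIPPED  # frozen dataclass blocks in-place mutation
except FrozenInstanceError:
    print("rejected: Order is immutable")
```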

Pattern 3 of 6 · Decompose

💣 Sound familiar?

⬆️
The Snowball
Diffs keep growing each iteration — complexity is compounding, not converging
🧭
The Wanderer
AI touches files you didn't mention — boundaries are unclear
🎠
The Carousel
Fix A breaks B, fix B breaks A — too many interacting concerns
🌫️
The Fog
You've lost track of what's changed. If you can't follow the changes, the AI can't either
🐟
The Goldfish
AI asks the same question twice — context window is overloaded
🔨
Whack-a-Mole
Every fix creates a new bug — the task is too interconnected to hold at once
📦
The Packrat
AI keeps adding dependencies — it's reaching for shortcuts instead of solutions
🚿
The Firehose
Walls of code instead of focused changes — scope has ballooned
Rubber Stamping
You're approving changes you haven't fully read — you've become a passenger
💾
No Save Point
30+ minutes without a commit — no safe place to roll back to

These are all symptoms of the same root cause: the task is too big for the AI to hold in context.

Pattern 3 of 6 · Decompose

Decompose Ruthlessly

"AI agents have a limited context window. The more you can focus them, the better they perform. Decomposition is the single most important skill."

— Addy Osmani, Chrome Engineering Lead · Beyond Vibe Coding, O'Reilly 2025
1
Spec–Plan–Execute
Spec → plan → one step at a time
2
Vertical Slicing
End-to-end slices, not layers
3
Interface-First
Define contract, then implement
4
One Concern at a Time
One logical change per prompt
5
Commit-Sized Units
Two commit messages? Split it
6
Fresh Context per Task
Resume from plan, not history

Structured decomposition yields 58% faster completion on complex tasks.

Pattern 3 of 6 · Decompose · Deep Dive

Decomposition in Practice

Greenfield: Create Context Documents

1
requirements.md
Functionality → bounded contexts
2
architecture.md
Modules, boundaries, data flow
3
technologies.md
Stack, frameworks, constraints
4
quality.md
Testing strategy, coding standards

Brownfield: Ask First, Then Change

The same four concerns exist — but choices have already been made. Ask questions to clarify requirements, architecture, technologies, and quality gates before touching code.

Dependency Guardrail

Dependencies require double confirmation. No new library without explicit approval. AI agents love adding packages.
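
A minimal sketch of that guardrail, assuming dependencies are listed one per line in a requirements-style file and that approved package names live in a reviewed allowlist. The parsing rules, the function names, and the package name `left-pad-py` are all made up for illustration.

```python
import re
from typing import Iterable, List, Set


def package_name(requirement_line: str) -> str:
    """Extract the bare package name from a line such as 'requests>=2.31'."""
    line = requirement_line.split("#", 1)[0].strip()
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0).lower() if match else ""


def unapproved_dependencies(requirement_lines: Iterable[str], approved: Set[str]) -> List[str]:
    """Return package names that are not on the approved list."""
    names = (package_name(line) for line in requirement_lines)
    return sorted({name for name in names if name and name not in approved})


approved = {"requests", "pydantic"}
# "left-pad-py" is a made-up package name standing in for an agent-added dependency.
proposed = ["requests>=2.31", "pydantic==2.7.0", "left-pad-py  # added by agent", ""]
print(unapproved_dependencies(proposed, approved))
```

Run as a pre-merge check, a non-empty list blocks the change until a human approves the new package.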

Wire It All Up

Link all context documents from CLAUDE.md or AGENTS.md — always in the agent's context. No copy-pasting into prompts.

Decomposition isn't just task sizing — it's giving the AI the right context at every level, from requirements down to quality gates.

"If you can't describe it in one paragraph, split it."

Pattern 4 of 6 · Context Engineering

🧠 Sound familiar?

👻
The Hallucinator
AI invents APIs, methods, or classes that don't exist in your codebase
🧳
The Tourist
Code works but doesn't match the project's conventions or style
🔄
The Reinventor
AI creates utilities that already exist because it can't see them
The Time Traveler
AI uses deprecated patterns or outdated APIs from stale context
🍳
The Kitchen Sink
Everything dumped into the prompt — AI can't find the signal in the noise
🩸
The Bleed
Context from a previous task contaminates the current one
🔁
The Déjà Vu
AI keeps re-explaining or re-doing work it already completed
🚪
The Wrong Room
AI uses patterns from a different framework or language

These are all symptoms of the same root cause: the AI doesn't have the right context — or has too much of the wrong context.

Pattern 4 of 6 · Context Engineering

Context Engineering

"Most of the craft of getting good results from an LLM comes down to managing its context."

— Simon Willison, co-creator of Django

Context Degrades

The 65% Cliff

A model claiming 200K tokens becomes unreliable at ~130K (Chroma research). Performance drops suddenly, not gradually.

Dead Context

40% of a 200K window consumed by MCP server metadata before sending a single message. — Ryan Spletzer

Drew Breunig's 4 Failure Modes

Poisoning Distraction Confusion Clash

Curate Ruthlessly

Include
Exact identifiers, task-relevant files, architecture constraints, CLAUDE.md
Exclude
The entire repo, unrelated tasks, irrelevant MCP servers, "just in case" files
🔄
Session hygiene
Fresh session per task · /compact at ~50% · sub-agents for exploration

CLAUDE.md / AGENTS.md

Your highest-leverage config point. Under 200 lines. Tech stack, build commands, coding standards, "Do Not" rules.

More context ≠ better results. A model failed with 46 tools in context but succeeded with only 19 — same window size.
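
The numbers on this slide combine into a simple budget. A minimal sketch, taking the ~65% reliability cliff, the ~50% compaction point, and the 40% metadata overhead from the figures above as inputs (the class and method names are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    window_tokens: int
    overhead_tokens: int       # e.g. tool and MCP metadata loaded before any message
    compact_at: float = 0.50   # compaction point quoted above
    cliff_at: float = 0.65     # reliability cliff quoted above

    def working_tokens_before_compact(self) -> int:
        """Tokens left for real work before the compaction threshold is reached."""
        return max(0, int(self.window_tokens * self.compact_at) - self.overhead_tokens)

    def status(self, used_tokens: int) -> str:
        fraction = used_tokens / self.window_tokens
        if fraction >= self.cliff_at:
            return "past the cliff: start a fresh session"
        if fraction >= self.compact_at:
            return "compact now"
        return "ok"


budget = ContextBudget(window_tokens=200_000, overhead_tokens=80_000)
print(budget.working_tokens_before_compact())
print(budget.status(90_000), "|", budget.status(110_000), "|", budget.status(135_000))
```

With 40% of a 200K window taken by metadata, only about 20K tokens of working room remain before the 50% mark. This is the case for excluding irrelevant MCP servers.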

Pattern 4 of 6 · Context Engineering · Deep Dive

Context Engineering in Practice

Structure Your Context

1
Project CLAUDE.md
Committed to git · team-shared · under 200 lines
2
Child directory rules
Path-scoped · loaded on demand · backend ≠ frontend
3
CLAUDE.local.md
Gitignored · personal preferences · editor quirks
4
Progressive disclosure
Link to docs, don't inline · agent_docs/ folder

Manage Your Sessions

The 50% Rule

Run /compact at ~50% context usage. Don't wait for the cliff. Commit before compacting — your save point.

Fresh Context per Task

Use /clear when switching tasks. Never mix unrelated work in one conversation. One concern at a time.

Sub-agents for Exploration

Delegate search and research to sub-agents. Keep the parent agent focused on the task. Don't pollute your main context.

Context engineering isn't a one-time setup — it's an ongoing discipline throughout every session.

"Include what's relevant. Exclude everything else."

Pattern 5 of 6 · Scope Control

🎯 Sound familiar?

🎁
The Improver
AI "fixes" three other things you didn't ask for while doing what you asked
🎆
The Surprise PR
Diff is 10x larger than expected — AI touched files it shouldn't have
🧹
The Helpful Refactor
AI restructures working code "for clarity" while implementing a feature
📦
The Dependency Creep
AI adds a library to solve a problem that didn't need one
🚀
The Architect Astronaut
AI builds abstractions and patterns for a one-off change
🏗️
The Silent Restructure
Code compiles and tests pass, but the system's architecture has shifted

These are all symptoms of the same root cause: the AI decided what to change instead of you.

Pattern 5 of 6 · Scope Control

Scope Control

Five techniques to keep you in the driver's seat.

🎯 Declare Your Plan

List files to modify — and files to NOT touch. Wait for approval.

📌 Anchor to Existing Code

Point at the specific file. Don't let AI generate from scratch.

🔍 Review Diffs, Not Outputs

Look at what changed, not just whether the result looks right.

🧪 Fresh Session Self-Review

New session critiques the diff. No loyalty to code it didn't write.

💾 Commit at Every Milestone

Git save points. When scope slips, git reset is your undo.

The AI will always try to do more than you asked. The discipline isn't saying yes — it's knowing when to say no.
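
The "Declare Your Plan" technique can be verified after the fact by comparing the declared plan with what actually changed. A minimal sketch; `ScopePlan` and `check_scope` are hypothetical names, and the changed paths would typically come from `git diff --name-only`:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ScopePlan:
    may_modify: FrozenSet[str]
    must_not_touch: FrozenSet[str]


def check_scope(plan: ScopePlan, changed_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Classify changed files as forbidden or simply undeclared."""
    changed = set(changed_paths)
    return {
        "forbidden": sorted(changed & plan.must_not_touch),
        "undeclared": sorted(changed - plan.may_modify - plan.must_not_touch),
    }


plan = ScopePlan(
    may_modify=frozenset({"src/cart/pricing.py", "tests/test_pricing.py"}),
    must_not_touch=frozenset({"src/auth/session.py"}),
)
report = check_scope(plan, ["src/cart/pricing.py", "src/auth/session.py", "src/cart/utils.py"])
print(report)
```

Forbidden files are a hard stop. Undeclared files are a prompt to ask why the scope grew before accepting the diff.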

"You decide what changes. Not the AI."

Pattern 6 of 6 · AI Paperwork

📝 Sound familiar?

🔮
The Mystery Commit
"fix stuff" messages that tell you nothing about what changed or why
⛏️
The Archaeology Project
Understanding a decision requires digging through months of Slack and PRs
🧠
The Tribal Knowledge
Only one person knows why it was built this way — and they're on vacation
📄
The Blank PR
Pull requests with no description, no context, just a diff
👍
The Rubber Stamp Review
Reviewers approve without context because there's nothing to guide them
🌀
The Onboarding Maze
New team members take weeks to become productive because nothing is written down

These are all symptoms of the same root cause: the paperwork isn't getting done because humans find it tedious. AI doesn't.

Pattern 6 of 6 · AI Paperwork

AI for the Paperwork Nobody Does

AI removes all excuses for skipping documentation.

📋 Decision Records

Capture the why, not just the what. Draft an ADR in 30 seconds.

🔀 PR Descriptions

Structured summaries from diffs. The collaboration trail writes itself.

🏆 Golden Path Examples

Canonical implementations the AI can pattern-match. One snippet beats three paragraphs.

📝 Lessons-as-Rules

After every correction, AI writes a rule for itself. Failures become specs.

📖 Domain Glossary

Define your terms precisely. AI uses them loosely without this.

Documentation for AI is different from documentation for humans. Humans infer context. AI is literal. Make your rules of the road machine-readable.

"Write it down. The AI will use it every session."

Act III

Team Norms

What good AI-assisted development looks like at the team and organization level

Team Norms · The Bigger Picture

Where Does Time Actually Go?

Requirements → Architecture → Coding (AI here) → Review → Testing → Coordination → Deploy

AI tools accelerate one station. Everything else remains at human speed.

~11%
of the workweek spent coding
Software.com · 250K devs · editor telemetry
~11%
of developer time is writing code
Microsoft Research "Time Warp" · 2024
~16%
of time on application development
IDC · 2025 · "84% is non-coding"

If coding is 11–16% of delivery time, a 10× improvement in coding speed yields at most ~15% faster delivery. The other 85% is requirements, architecture, coordination, review, and waiting.

Team Norms · Framework

The Two-Sided Formula

Requirements → Architecture → Coding → Review → Testing → Coordination → Deploy
Pre-coding: settled · Post-coding: representable

Pre-coding is settled

Requirements clear · Architecture decided · Interfaces stable · Framework supports what you need

+

Post-coding is representable

Testing automatable · Quality gates in the dev loop · Reviews fast & aligned · Deployments automated · Observability clear

=

Faster delivery

Three outcome patterns

Accelerates debt

Pre · Post · Faster rework
Faros AI (10K+ devs): PRs +47%, bugs +9%, PR size +154%

Masks the bottleneck

Pre · Post · Faster prototypes, same delivery
Faster iterations feed the same slow decision cycle — more reps, not more progress

Amplifies strengths

Pre · Post · Faster delivery
DORA 2025: AI correlates with better delivery when foundational capabilities are strong

AI accelerates delivery when both sides hold. When either breaks, faster coding becomes faster rework.

Team Norms · The 85%

What Actually Moves Delivery

Requirements → Architecture → Coding → Review → Testing → Coordination → Deploy
Pre-coding: settled · Post-coding: representable

Fix the pre-coding side

Decide which experiments to ship

AI makes experimentation cheap. That makes deciding what to take forward the new bottleneck. Rapid prototypes need explicit go/no-go criteria.

Architectural alignment before code

One unresolved structural decision cascades into every downstream task. Settle boundaries, interfaces, and contracts before generating code.

Resist speculative coding

When coding is cheap, there's no pressure to wait for clarity. Sometimes the highest-leverage move is to not build yet.

Fix the post-coding side

Quality gates in the dev loop

Tests, linting, architecture checks that run locally before code leaves the developer's machine — not first discovered in CI 20 minutes later.

Faster code review cycles

AI generates code faster than teams can review it. Upfront style alignment and reviewer availability become critical-path items.

Automated deployment & observability

Faster code to production means faster impact — good or bad. Automated pipelines and clear observability close the feedback loop.

"AI doesn't fix a team. It amplifies what's already there." — DORA 2025. Engineering practices matter more than ever: they mitigate quality risks from generated code and handle the increased throughput.

Team Norms · Measurement

How Do You Know You're Getting Better?

Every speed metric paired with a quality metric. Unpaired speed metrics are dangerous.

❌ Stop Measuring

Lines of code — CEOs competing on AI code %
Coverage alone — 90% coverage, 34% mutation score
Story point velocity — 42% admit to inflating
AI acceptance rate — juniors accept more, not better

✅ Start Measuring

Cycle time by stage — where did the bottleneck move?
Mutation score — are tests catching real bugs?
Change failure rate — are we breaking prod more?
Comprehension check — can you explain what shipped?

The Goodhart Cascade
Copilot users 29% faster — but review time +47%

Review survival rate
% of AI code passing review unchanged — measures spec precision

Review queue depth
Growing queue = stockpiled inventory waiting to be reviewed

The Ratchet
📏 Baseline 📈 Improve 🔒 Never regress 🔄 Rotate

Pick 2–3 metrics that expose your current bottleneck. Act on them. Then rotate to the next.
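
The ratchet can be automated as a small check in CI. A minimal sketch, assuming each metric is a number where higher is better and the baseline is stored alongside the code (the metric names and `apply_ratchet` are illustrative):

```python
from typing import Dict, List, Tuple


def apply_ratchet(
    baseline: Dict[str, float], current: Dict[str, float]
) -> Tuple[Dict[str, float], List[str]]:
    """Return the tightened baseline and a list of regressions.

    Improvements raise the baseline; any drop below it is reported.
    """
    new_baseline = dict(baseline)
    regressions: List[str] = []
    for metric, floor in baseline.items():
        value = current.get(metric)
        if value is None or value < floor:
            regressions.append(metric)
        elif value > floor:
            new_baseline[metric] = value
    return new_baseline, regressions


baseline = {"mutation_score": 0.34, "review_survival_rate": 0.60}
current = {"mutation_score": 0.41, "review_survival_rate": 0.55}
new_baseline, regressions = apply_ratchet(baseline, current)
print(new_baseline)
print(regressions)
```

Metrics where lower is better, such as change failure rate, would need the comparison flipped.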

Team Norms · Governance

Governing AI in Your Team

📄 Rules Files

Your team's AI constitution

Not a Confluence page — a file that runs every session.

CLAUDE.md — Claude Code
.cursor/rules/*.mdc — Cursor
AGENTS.md — cross-tool standard

+ lessons.md

After every correction, the AI writes a rule for itself. Failures become specifications.

🔍 Review Pipeline

1
AI self-review: fresh session
Fresh context catches what the author can't
2
Human: architecture first
Critical path, work backward
3
Human: intent & risk
Roadmap, not variable names

AI PRs have 1.7× more major issues and 24% more incidents per PR — CodeRabbit · Cortex 2026

🛡️ Guardrails

Block destructive ops

Hooks that deny rm -rf, --force, terraform destroy. Deterministic — not prompt-based.

Gate security-critical paths

auth/**, payments/**, secrets/** — mandatory human review + automated SAST scan before merge.

Verify every dependency

No AI-suggested packages without lockfile check. Block new deps without explicit approval.

Rules set the standard. Review verifies it. Guardrails enforce it.
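
A deterministic check of this kind is plain pattern matching that runs before a command executes. The sketch below is a generic illustration and not the hook API of any particular tool. The deny list mirrors the three examples above, and `blocked_reason` is a hypothetical name.

```python
import shlex
from typing import Optional


def blocked_reason(command: str) -> Optional[str]:
    """Return why a shell command is denied, or None if it may run."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "unparseable command"  # fail closed on unbalanced quotes
    if "--force" in tokens:
        return "--force flag"
    for i, token in enumerate(tokens):
        rest = tokens[i + 1:]
        if token == "rm":
            # Collect short flags such as -rf or -r -f into one string.
            flags = "".join(
                t[1:] for t in rest if t.startswith("-") and not t.startswith("--")
            )
            if "r" in flags.lower() and "f" in flags:
                return "recursive force delete"
        if token == "terraform" and "destroy" in rest:
            return "terraform destroy"
    return None


for cmd in ("rm -rf /var/data", "git push --force origin main",
            "terraform destroy -auto-approve", "terraform plan"):
    print(f"{cmd!r}: {blocked_reason(cmd)}")
```

This sketch only sees whitespace-separated tokens, so a command glued to a separator (for example `a;rm -rf x`) would slip through. A production guard needs real shell parsing and should sit alongside permission limits, not replace them.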

Team Norms · The Meta-Pattern

Harness Engineering

Same model. Different harness.

42%
78%

What is the harness?

Rules file + automation hooks + custom commands + MCP servers + review pipeline
Committed to git. Reviewed. Updated after every correction.

See previous slide for your tool's equivalent.

The enforcement hierarchy

100% Hooks — deterministic enforcement
Auto-format, block destructive ops, inject context
100% settings.json — deterministic config
Permissions, deny lists, environment variables
~80% Rules file — advisory guidance
Conventions, standards, preferences

Treat your harness like production code. Commit it. Review it. Update it after every correction.

Takeaways · Immediately Actionable

Your Monday Morning List

12 habits. No new tools. No new process. Start this week.

1
Create your CLAUDE.md / rules file. Tech stack, architecture, standards, do-nots. Under 200 lines. Commit to git.
2
Adopt spec–plan–execute. Any task >15 min: spec with AI → plan → one step at a time.
3
Write tests first. Always include "Do NOT modify test files."
4
Commit at every milestone. Git commits as save points. Always revertable.
5
Fresh session per task. Never mix unrelated work in one conversation.
6
AI self-review before every PR. Fresh session catches what the author can't.
7
Declare your plan before coding. List files to modify — and files to NOT touch.
8
Zero trust for auth, payments, secrets. Human review + automated scanning. Always.
9
AI for docs nobody writes. ADRs, PR descriptions, golden path examples.
10
Build a team prompt library. Document what works. Shared skills and commands.
11
Review diffs, not just outputs. Check what changed, not just whether it looks right.
12
Use AI to learn, not just produce. Ask why. Read the code. Stay in the driver's seat.

Remember...?

−19%
Close the gap.

Six patterns. Twelve habits. One decision.

#ArcOfAI

Q&A


Sources: METR (July 2025) · GitClear · DORA 2025 · Stack Overflow 2025
Faros AI · CodeRabbit · Martin Fowler · Kent Beck · Addy Osmani
Simon Willison · Steve Yegge · Andrej Karpathy · Anthropic Engineering

Appendix A · Reference Template

CLAUDE.md Starter Template

# Project: [Your Project Name]

## Tech Stack
- Runtime: Java 21 / Node 22 / Python 3.12
- Framework: Spring Boot 3.x / Next.js 15
- Test framework: JUnit 5 + Mockito / Jest + RTL

## Architecture (2–3 sentences)
[Describe system structure and key modules here]

## Commands
- Build:  mvn clean install  /  npm run build
- Test:   mvn test           /  npm test
- Lint:   mvn spotless:check /  npm run lint

## Coding Standards
- Immutable data structures preferred
- Explicit error types, not generic exceptions
- No magic strings — use enums or constants

## Do NOT
- Modify test files during implementation tasks
- Touch auth/** without adding a security-review comment
- Generate code without running tests first

## Commit Format
Conventional Commits: feat|fix|refactor|docs|test(scope): message
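
The commit format in the template can be checked mechanically, for example from a commit-msg hook. A minimal sketch covering only the five types listed in the template. The full Conventional Commits specification allows more, such as a `!` marker for breaking changes, and the function name is illustrative.

```python
import re

# Types taken from the template above; the scope in parentheses is optional here.
COMMIT_PATTERN = re.compile(r"^(feat|fix|refactor|docs|test)(\([a-z0-9_/-]+\))?: \S.*$")


def is_valid_commit_subject(subject: str) -> bool:
    """Check the first line of a commit message against the template's format."""
    return COMMIT_PATTERN.match(subject) is not None


for subject in ("feat(cart): add coupon support", "fix: handle empty basket", "fix stuff"):
    print(f"{subject!r}: {is_valid_commit_subject(subject)}")
```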

Appendix B · Addy Osmani

The 70% Problem

Non-engineers (and overconfident engineers) hit a wall where AI gets them 70% of the way surprisingly fast — but the final 30% requires actually understanding the system.

✅ The 70%

"This is incredible. Look how fast we shipped. AI coded the whole thing."

❌ The 30%

"Why is it randomly failing in production?" "Why does auth break after logout?" "Why is there a memory leak nobody can find?"

The final 30% requires engineering skills

Understanding actual system behavior · Debugging without clean stack traces · Maintaining consistency across a growing codebase · Architectural judgment under real constraints. AI doesn't eliminate these — it amplifies them in engineers who have them, and reveals their absence in those who don't.

Appendix C · Kent Beck

The B+ Tree Experiment

4 weeks · Production B+ Tree library · AI-assisted TDD · Pragmatic Engineer interview

His system prompt discipline

Always follow instructions in plan.md.

When I say 'go':
1. Find the next unmarked test
2. Implement ONLY enough code to pass it
3. Mark the test done in plan.md
4. Stop and report back

One test at a time. Explicit "stop and report." No surprises.
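
Step 1 of that prompt depends on plan.md being easy to scan. A minimal sketch of finding the next unmarked test, assuming plan.md uses Markdown task-list checkboxes (`- [ ]` and `- [x]`). That file format and the test names below are assumptions for illustration, not something the interview specifies.

```python
import re
from typing import Optional

UNMARKED = re.compile(r"^\s*[-*] \[ \] (?P<name>.+?)\s*$")


def next_unmarked_test(plan_text: str) -> Optional[str]:
    """Return the first unchecked item in the plan, or None when all are done."""
    for line in plan_text.splitlines():
        match = UNMARKED.match(line)
        if match:
            return match.group("name")
    return None


# Hypothetical plan contents.
plan = """\
# Plan
- [x] insert into empty tree
- [x] insert splits a full leaf
- [ ] delete merges underfull leaves
- [ ] range scan across leaves
"""
print(next_unmarked_test(plan))
```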

Warning signs he watched for

• Functionality you didn't ask for
• Tests being modified or deleted
• Explanations that justify cheating
• Confidence about things it can't verify

"I treat the AI like an unpredictable genie. It's powerful, but the quality of the wish determines whether you get what you actually want."

— Kent Beck