I watched Claude Code generate in 30 seconds what would have taken me more than an hour. A complete set of integration tests, all structurally sound, all following the project’s conventions. And yet, on another project a few weeks later, a feature I could build in a day sat in review for a week, got reworked, sat in review again, and barely shipped ten weeks after the first line of code was written.

This isn’t an article about whether AI coding tools work. They do. I’ve used Claude Code, GitHub Copilot and Cursor across three real projects over the past year: a Java claims management system, a JavaScript/TypeScript personal assistant, and a Python multi-agent system built on LangGraph and A2A. The tools are genuinely impressive, and I’ll show you exactly how.

But the industry narrative conflates “coding faster” with “delivering faster,” and my experience across these three projects tells a different story.

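In my experience, faster coding became faster delivery only when two conditions held: the pre-coding work (requirements, architecture, quality targets) was settled, and the post-coding work (testing, documentation, architectural compliance) could be automated. Across three projects, both conditions held on exactly one. In this first part of a three-part series, I explore the AI-augmented software development lifecycle (SDLC) ecosystem: what the tools can do and where they genuinely shine. In Part 2, we look at the bottlenecks that prevent faster coding from becoming faster delivery. In Part 3, we pull together the pattern and discuss what actually makes AI-augmented SDLC work.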


The Three Projects

To ground this in specifics, here’s the landscape:

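|                          | Claims Management     | Personal Assistant         | A2A Multi-Agent System      |
|--------------------------|-----------------------|----------------------------|-----------------------------|
| Language                 | Java                  | JS/TS                      | Python / LangGraph          |
| Greenfield or brownfield | Greenfield            | Greenfield                 | Brownfield                  |
| Requirements             | Clear spec            | Informal, prototype-driven | Informal + high uncertainty |
| My role                  | Solo developer        | Architect / tech lead      | Developer on a team         |
| Coding as % of time      | ~50%                  | ~10-20%                    | ~10-15%                     |
| AI tools                 | Claude Code + Copilot | Claude Code + Copilot      | Claude Code + Copilot       |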

That row, coding as a percentage of total time, is the number that matters most. Hold it in your mind as you read.
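
One way to read that row is through Amdahl's law: if coding is a fraction p of total project time and the tools make coding s times faster, overall delivery speeds up by 1 / ((1 - p) + p / s). The sketch below is a back-of-the-envelope illustration, not a measurement from these projects. The fractions are assumed midpoints of the ranges in the table, and the factor of 120 simply reuses the "30 seconds versus an hour" anecdote rather than any measured average.

```python
def delivery_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the coding share gets faster."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)


# Assumed inputs: midpoints of the table's "coding as % of time" ranges.
PROJECTS = {
    "Claims Management": 0.50,
    "Personal Assistant": 0.15,
    "A2A Multi-Agent System": 0.125,
}

# Assumed coding speedup: one hour of work done in 30 seconds.
CODING_SPEEDUP = 120

for name, fraction in PROJECTS.items():
    overall = delivery_speedup(fraction, CODING_SPEEDUP)
    print(f"{name}: about {overall:.2f}x faster overall")
```

Under these assumptions the claims project comes out at roughly 1.98x, while the other two land near 1.17x and 1.14x. Even an infinitely fast coding assistant cannot push the overall figure past 1 / (1 - p), which is 2x when coding is half the work and under 1.2x when it is 15% or less.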


The Tools: What AI Coding Assistants Can Do

Before getting into where delivery breaks down, I want to be clear: these tools represent a genuine step change. This isn’t an anti-AI article. It’s a calibration.

Both Claude Code and Copilot handle complex, multi-step instructions with remarkable facility across Java, JS/TS, and Python. They write code, refactor code, explain code, and, critically, they integrate with the broader development ecosystem through MCP (Model Context Protocol) servers.

On these projects, MCP integrations included:

  • LangChain docs for live lookups of the latest LangGraph features, which was essential given how fast the API was moving and how prone the models are to hallucinating outdated patterns
  • Atlassian for creating and updating stories and tasks without leaving the coding flow
  • Chrome DevTools for driving LangGraph Studio’s UI to run multi-step conversational tests from natural language descriptions

Beyond MCP, the tools excel at pattern discovery: finding how a problem like OAuth integration was already solved elsewhere in the codebase before implementing it fresh. They enforce consistency and prevent duplication in ways that grep and institutional memory never could.

The code review workflow deserves special mention: paste a screenshot of review comments, the AI evaluates their validity, and implements fixes where warranted. This turned a context-switching tax into a fluid process.

Where Claude Code Had an Edge

Claude Code offered constructs that Copilot seemed to lack at the time of writing, although Copilot’s January 2026 release may close some of the gaps.

The Testing Showcase

The full session on the claims project illustrates what’s possible when everything aligns:

  • The coverage analyst and test scaffolder ran in parallel, and I reviewed the analyst’s report.
  • The ArchUnit skill generated architecture tests. Two failed immediately, confirming real violations in the codebase.
  • The REST Docs skill generated documentation tests for 8 endpoints as a background subagent.
  • The migration test writer produced 13 migration tests.
  • I identified 5 critical BDD scenarios (this required business judgment the tools couldn’t provide).
  • The BDD skill generated feature files, page objects, and step definitions.
  • JMH benchmarks were written for the highest-complexity methods.
  • A Gatling simulation was created for load testing.

Skills carried domain expertise. Agents handled scale. The developer made judgment calls. This separation of concerns is what made it work.

And it only worked because the claims project had clear requirements and stable foundations.


Where It Worked: The Claims Management System

The claims project is the control case. It proves AI tools can accelerate delivery when conditions are right.

But “clear requirements” didn’t fall from the sky. The functional specification existed, but it targeted a different technology stack entirely (.NET backend, React frontend). Before writing a line of application code, I used AI tools to process the spec itself: deepening my understanding of the domain, evaluating Java against .NET for this specific problem (Spring Modulith’s event-driven architecture turned out to be a natural fit for the claim lifecycle), designing the bounded contexts and module boundaries, and establishing quality guidelines with specific thresholds for coverage, mutation testing, and performance.

The result was a set of foundational documents: an architecture built around 12 bounded contexts with event-driven integration, quality guidelines specifying mutation score targets of 90% for the rules engine and 85% for financial calculations, and a project-level CLAUDE.md that encoded the development philosophy (functional programming with pure functions for business logic, mutations only at service boundaries) along with concrete patterns and constraints. These documents became the shared context between me and the AI tools for every subsequent coding session.
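
The "pure functions for business logic, mutations only at service boundaries" philosophy can be sketched in a few lines. This is an illustration only, written in Python for brevity even though the claims project itself is Java. The `Claim` fields, the deductible rule, and the `ClaimService` class are all hypothetical and are not taken from the project's CLAUDE.md.

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class Claim:
    """Immutable value object (hypothetical fields, for illustration)."""
    claim_id: str
    amount: Decimal
    deductible: Decimal
    status: str = "SUBMITTED"


def payout(claim: Claim) -> Decimal:
    """Pure business rule: same input, same output, no side effects."""
    return max(claim.amount - claim.deductible, Decimal("0"))


def approve_claim(claim: Claim) -> Claim:
    """Pure transition: returns a new Claim instead of mutating the old one."""
    return replace(claim, status="APPROVED")


class ClaimService:
    """Service boundary: the only place where state is mutated."""

    def __init__(self) -> None:
        self._claims: dict[str, Claim] = {}

    def submit(self, claim: Claim) -> None:
        self._claims[claim.claim_id] = claim

    def approve(self, claim_id: str) -> Decimal:
        approved = approve_claim(self._claims[claim_id])
        self._claims[claim_id] = approved  # the single mutation in this flow
        return payout(approved)


service = ClaimService()
service.submit(Claim("C-1", Decimal("1000"), Decimal("250")))
print(service.approve("C-1"))
```

Keeping the rules pure means they can be tested without any setup, which is probably part of why a high mutation-score target for business logic is realistic under this style.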

This matters because the “pre-coding is settled” condition was achieved, not given. AI tools helped settle pre-coding before they accelerated coding itself.

I was also the sole developer, meaning zero coordination overhead and no review bottleneck. Post-coding activities were representable as code: REST Docs generated documentation from tests, ArchUnit enforced architecture as executable rules, Testcontainers provided realistic integration environments, BDD scenarios served as living acceptance criteria, and performance benchmarks were automated.
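
To make "architecture as executable rules" concrete, here is a minimal sketch of the idea. It is not ArchUnit and does not use ArchUnit's API, which is a Java library that inspects compiled classes. It is a small Python analogue built on the standard `ast` module, and the layer names (`domain`, `web`, `persistence`) are hypothetical.

```python
import ast

# Hypothetical layering rule: code in the "domain" layer must not import
# from these outer layers.
FORBIDDEN_FOR_DOMAIN = {"web", "persistence"}


def imported_modules(source: str) -> set[str]:
    """Return the top-level package names imported by the given source code."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules


def layering_violations(source: str) -> list[str]:
    """List forbidden packages that a domain-layer module imports."""
    return sorted(imported_modules(source) & FORBIDDEN_FOR_DOMAIN)


domain_module = "from web.controllers import ClaimController\nimport decimal\n"
print(layering_violations(domain_module))
```

Run as part of the test suite, a rule like this fails the build as soon as a forbidden dependency appears, so the architecture is checked on every change instead of only in review.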

Roughly 50% of my time on this project was actual coding. That’s a large enough fraction for AI tools to make a meaningful difference. And they did. Delivery was genuinely faster.

Both sides of the equation held. Pre-coding was settled through deliberate investment. Post-coding was automatable: testing, documentation, and architectural compliance were all expressed as code that the AI tools could generate and maintain.

This is what the marketing materials promise. It’s real. It’s just not the whole story.


In Part 2: The Bottlenecks of AI-Augmented SDLC, we look at what happens when pre-coding and post-coding conditions break down — and why faster coding can actually make things worse.