“I know I can make my engineers 10x more productive with AI.” A non-technical CEO, confident they’ve solved what millions of experts haven’t.

“AI doesn’t matter, I tried it a year ago and it made silly mistakes.” A principal engineer, right before burying their head in the sand.

Two camps. Same conversation, different company, every week. Both sides convinced they’re right. Both ignoring what’s actually happening.

I’ve spent the last few hundred hours coding with AI. After my Executive-Coder Experiment, I kept going: more complex systems, harder problems. As CTPO leading a 300-person engineering department, I’ve also been driving AI adoption at scale.

Here is what I’ve learned about where AI helps, where it fails, and why both camps are missing the point.

The Bottom Line

AI makes engineers dramatically more productive at generating code—but code is a liability, not an asset. The productivity gains in raw code output are real. So is the faster path to unmaintainable systems.

Business shouldn’t expect 10x productivity gains in value output. Instead, expect faster prototyping, compressed timelines, and the need for stronger engineering discipline.

The engineers who will thrive are those who understand when to use AI to sprint and when to slow down and think. AI amplifies everything—your good practices and your bad ones. Requirements, software quality, and modularization matter more than ever.

That’s the conclusion; let’s explore why.

The Assembly Line Problem

The core misconception: confusing code output with business value.

It’s easy to think about software development as coding. Writing code is what engineers love and what business thinks we do all day. However, there’s more to developing software products.

There’s the full Software Development Life Cycle (SDLC): requirements, planning, design, implementation, verification, deployment, operations. You can deliberately skip some of these when building a prototype or MVP. But once real customers rely on your product, all these stages become essential.

GenAI used by developers—DevAI—helps us produce more code. So it is rather unfortunate that we have all those other activities where AI doesn’t help as much.

Picture software delivery as an assembly line. AI just supercharged one station—coding—so it pumps out 10x more components. Sounds great, right?

Except the assembly line doesn’t work that way. Components pile up at the next stations: verification, deployment, operations. Meanwhile, the coding station sits idle waiting for requirements and design decisions from upstream. As the line moves at the speed of its slowest station, value generation is bottlenecked by other SDLC stages.
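The “slowest station” claim is easy to quantify. A toy sketch, with purely illustrative numbers: even after a 10x boost at the coding stage, end-to-end delivery is still capped by the slowest stage.

```python
# Throughput of a serial pipeline is capped by its slowest stage.
# Units: work items per week; the numbers are illustrative, not measured.
stages = {
    "requirements": 10,
    "design": 8,
    "coding": 50,        # 10x-ed by AI — was 5
    "verification": 6,
    "deployment": 12,
}

# The stage with the lowest rate sets the pace of the whole line.
bottleneck = min(stages, key=stages.get)
print(bottleneck, stages[bottleneck])  # verification caps delivery at 6/week
```

Before AI, coding (at 5/week) was the bottleneck; 10x-ing it just moves the constraint to verification, and overall throughput barely improves.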

Some voices yell about 10x or 100x more productivity. And there’s some truth to it: an engineer can indeed now produce 10x more code. But without a significant improvement in value for customers, this is a vanity metric.

I’ve trashed a 20000-line project after vibe-coding with AI—and started from scratch. Haven’t seen many people share such experiences. Turns out “LOC deleted” isn’t the flex metric people want to hear about.

Others retort cynically that there is no improvement, and that AI is destroying the industry. And there is a kernel of truth here. If we just produce more code while ignoring the other activities, the codebase rots and quickly delivers negative economic value.

Software creates value only when customers use it.

The Complexity Tax

One of our strongest engineers worked on a new project for 9 months. He’d been using DevAI for years with advanced tooling and an AI-adjusted workflow. Early in the project, he was unbelievably productive.

But as time progressed, that slowed. Six months in, he didn’t know what was implemented where. Adding new features felt like crawling rather than sprinting. When an opportunity came to join a new project, he jumped on it without looking back.

DevAI compresses timelines.

Every codebase follows the same arc, from greenfield to legacy: the holy grail of the greenfield project at one end, the depression-inducing legacy codebase at the other.

We have:

  1. Greenfield: A clean new codebase focused on rapid development and experimentation with minimal constraints or technical debt.
  2. Maturing: A growing system where features are still actively added but development slows as stability and maintainability become priorities.
  3. Legacy: An aging codebase with looming issues and complexity, where changes are risky and often avoided without significant refactoring.

Here’s why the two camps disagree: The 100x productivity people are often working on greenfield projects. They’re in the honeymoon stage. The codebase is simple, clean, and AI does a great job adding another 5000 lines of code while they sip their favorite warm beverage.

That same engineer told me with a straight face that he was 60x-120x more productive with AI. Until he wasn’t.

Codebases inevitably become legacy. Adding new features becomes exponentially harder. Skeptics have lived in legacy systems, where stress levels are off the charts as even minor changes might trigger production outages.

Who’s right? Both. And neither.

Adding new features adds complexity. Changes are rarely done with time to spare, so cut corners accumulate. Productivity drops. With DevAI, we’re blasting from greenfield to legacy at breakneck pace.

This accelerated decay isn’t just about volume—it’s also about the quality of what AI generates.

The Quality Problem

I asked AI to handle HTTP redirects. When it couldn’t fix the bug, it spawned a separate process to shell out to curl instead. This turned a simple HTTP client bug into a security hazard with external dependencies and fragile process management. Just lovely.

AI code generation comes with major quality challenges. I think of it as having a scatterbrained senior engineer with:

  • Lack of Stability - AI generates varying solutions on different runs. It’s random whether you get a better or worse solution.
  • Lack of Intent - AI does whatever you instruct it to do, without analyzing tradeoffs or asking for more context.
  • Sycophant Behavior - AI prioritizes your mood towards it over correctness. It reaffirms your bad choices and flatters you.
  • Inherently Average - LLMs are trained on both high- and low-quality code. The output is average.

When we use GenAI directly in customer-facing features, we write tests and evals to validate quality. Unfortunately, there’s no equivalent when using DevAI. Code is generated once, and it’s a roll of the dice whether you got simple or complex code, security-conscious or vulnerable, performant or inefficient.

Some are impressed with AI-generated code quality. Others are dismayed.

Developers with limited experience are often the most impressed. They lack the pattern recognition to spot subtle problems: tight coupling, poor separation of concerns, security gaps. AI output looks clean and works, so it must be good.

Experienced engineers have a more nuanced view. They know not all code is created equal. Some parts of a codebase should be optimized obsessively; others just need to work. Great engineers are happy to use average code when it fits the purpose. And they can recognize when it doesn’t.

Like the curl workaround, these problems share a pattern:

  • APNs architecture disaster. AI claimed Expo mobile apps couldn’t access APNs credentials directly and insisted I route notifications through a third-party service. Weeks later, I discovered AI was wrong: direct access worked fine. We’d built an entire unnecessary infrastructure layer. More expensive, less secure, less performant. Because AI hallucinated a constraint.
  • Test deletion. AI couldn’t fix failing unit tests after several attempts. Solution? Delete all the failing tests. And many unrelated passing tests as a bonus. Issue resolved!

AI optimizes for making issues disappear, not for solving problems correctly. Working with DevAI means constantly catching these bad decisions: wrong abstractions, unnecessary complexity, security vulnerabilities hidden behind working demos. Miss them and they compound. Fast.

The Knowledge Deficit

Last month, I discovered a beautiful command-pattern abstraction in my codebase. Perfect separation of concerns. Elegant middleware. The kind of code you’d show off in a conference talk. I stared at it for ten minutes, trying to remember why it existed.

Then it hit me: it didn’t need to. The entire abstraction—hundreds of lines of gorgeous, over-engineered code—solved a problem that didn’t exist. I had spent hours implementing it with AI just weeks earlier. Now I couldn’t even remember what we were trying to address.

Ugh. Delete. Commit. Push. A week of “productivity” gone.

The old engineering proverb goes: “Always code as if the person maintaining your code is a vengeful maniac who knows where you live.” We laugh because it’s true: that maniac is usually future-you. Three months later, staring at your own clever code with no memory of writing it.

With DevAI, that maniac arrives much faster. And angrier. That code you barely understood when AI generated it? Good luck debugging it next week. That clever abstraction AI added? You’ll curse past-you, and the AI, when it breaks production at 2 AM.

When AI writes the code, engineers don’t build mental models of the codebase. We don’t make micro decisions like naming variables or choosing between library functions. We don’t spend time optimizing code in our heads. Writing 100 lines versus 2000 lines is the difference between carefully reading a book and skimming it.

Not a big deal in a new greenfield project—there’s just not much there yet. But once the codebase grows, the difference becomes drastic. We remember less about what abstractions exist and where. Debugging becomes harder. Reuse drops and duplication skyrockets.

The Sweet Spots

Not everything is doom and gloom. With the right problems, DevAI’s speed translates to business value.

Despite limited frontend experience, I built a polished calendar flow managing multiple calendars in under a day. 1500 lines of React code that I barely understood when writing. Months later? Still working. Still expanding it. No catastrophic rewrites required. Bonus points: I don’t dread UI work anymore.

Simple frontend development is where DevAI shines. The UI provides immediate visual feedback, making it easier to spot problems and guide AI. Many frameworks enforce structure through conventions, giving AI natural guardrails.

Integrations are another sweet spot. Connecting to external services requires mapping data models to API responses. AI digests third-party docs faster than we ever could, and excels at simple data transformations.

My calendar system connects with multiple providers via CalDAV and iCal—ugly protocols and data formats I wasn’t familiar with. With DevAI, I completed the integration in just a few hours. Manually, it would’ve taken considerably longer, plus a pack of painkillers.
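To give a sense of the data-format mapping involved, here’s a deliberately naive sketch (not the actual integration) that pulls two fields out of a single iCal VEVENT. Real feeds add folded lines, text escaping, timezones, and recurrence rules—the ugly parts that make AI’s help so welcome.

```python
from datetime import datetime

# Toy parser: map one VEVENT block to an internal dict.
# Handles only unfolded lines and UTC timestamps — a real iCal parser
# must also deal with line folding, escaping, TZID, and RRULE.
def parse_vevent(ics: str) -> dict:
    event = {}
    for raw in ics.splitlines():
        line = raw.strip()
        if line.startswith("SUMMARY:"):
            event["summary"] = line[len("SUMMARY:"):]
        elif line.startswith("DTSTART:"):
            event["start"] = datetime.strptime(
                line[len("DTSTART:"):], "%Y%m%dT%H%M%SZ"
            )
    return event

sample = """BEGIN:VEVENT
DTSTART:20240301T090000Z
SUMMARY:Planning sync
END:VEVENT"""
print(parse_vevent(sample))
```

Each of those “must also deal with” caveats is a small, well-documented transformation—exactly the kind of tedious, spec-driven work DevAI handles well under supervision.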

Given clear structure and immediate feedback loops, DevAI delivers. Building a calendar UI? Great. Designing your custom billing system? Good luck.

Matching Tools to Tasks

DevAI provides a powerful new toolbox. But you wouldn’t use a hammer to cut down a tree. You shouldn’t let AI generate code unsupervised for the most critical parts of your application either.

You have plenty of options along a spectrum:

  • Hand-written - Critical auth logic, core algorithms
  • Hand-written, AI improved - Complex business logic
  • AI-written, manually edited - Feature scaffolding, API endpoints
  • AI-written, human-reviewed - Standard CRUD operations
  • AI-generated, AI-tested - Utility functions, simple data transformations
  • AI-generated only - Throwaway prototypes, spike investigations

Prototyping is where DevAI truly shines. Using AI to quickly gather insights and build new prototypes is excellent. Just don’t fall in love with the result and confuse it for a production-ready system.

Speeding toward legacy can be exactly what you need.

For startups chasing product-market fit, fast legacy might be perfect. You need to validate assumptions, win first customers, close your next funding round. A legacy system that proves PMF is a massive win—regardless of whether you throw away the code later.

Be deliberate about the choice. Use DevAI to sprint when speed matters more than sustainability. Just don’t pretend you’re building for the long term when you’re not.

What Actually Works

Start with clear requirements. The better you can instruct AI on what to build, the more useful the generated code will be. Since DevAI produces more code faster, I invest proportionally more time in requirements. I learned this the hard way—working on a mobile app, I let AI create a 10000-line monster to manage notification delivery. By the end, I didn’t know what features it was supposed to have. Twenty hours wasted. With a sigh: git revert.

Design before you code. I iterate on a dedicated design document until I know precisely what I want to build. Then I work with AI on an implementation plan covering the full SDLC. Only then do I implement. Building a task management subsystem, I clarified data flow, chose storage, even wrote an Architecture Decision Record (ADR)—but I wasn’t precise about data modeling. Hours of rework followed.

Focus obsessively on modularization. Domain boundaries should be crystal clear. Without strong boundaries, AI creates a tangled mess faster than you can untangle it. I built a feature across backend and mobile with steps that were too large. I left validation for the end—surprise, the pieces didn’t fit. Had to extract common parts and rework both.

Hire experienced engineers who can guide AI. They know when to override bad suggestions, when to demand better abstractions, when to write code by hand. They help less experienced engineers make better design decisions. Mercifully, this one spared me the war stories.

Beyond Hype and Denial

Sustainable business value won’t come from engineers who dismiss AI or embrace it blindly. It will come from those who understand tradeoffs and use AI strategically.

AI is a force multiplier.

What are you multiplying?

