
March 15, 2026 · Variant Systems

The Vibe Coding Wall: Why AI-Built Apps Break at Six Months

AI tools get you from 0 to MVP fast. Around month 6, the codebase fights back. Here's why it happens, what the data says, and how to fix it before a rewrite.

vibe-coding ai-generated-code technical-debt code-audit startup

We’ve been auditing AI-built codebases for over a year now. The pattern is so consistent we can almost set a calendar by it.

A founder builds an MVP with Cursor, Bolt, Lovable, Replit Agent, or Claude Code. It works. They launch. Users show up. Revenue starts. Everything feels great.

Then around month six, things start to slow down. A feature that would have taken a day now takes a week. A bug fix in one module breaks something in another. The founder can’t explain why the code does what it does, only that it does. They start dreading the codebase. They start Googling “rewrite vs. refactor.”

That’s the wall. And right now, a lot of people are hitting it.

The timeline is predictable

Here’s what the typical trajectory looks like for a vibe-coded project:

| Month | What happens |
| --- | --- |
| 1-2 | Euphoria. MVP ships in weeks instead of months. Everything works. |
| 3-4 | Feature velocity is still high, but strange bugs start appearing. Workarounds accumulate. |
| 5-6 | New features take 3-5x longer. Every change has unexpected side effects. The founder stops trusting the codebase. |
| 7+ | Development grinds. Rewrite conversations begin. Some founders abandon the project entirely. |

This isn’t unique to any one tool. We’ve seen it with Cursor projects, Bolt apps, Lovable MVPs, and Claude Code builds. The tools are different. The wall is the same.

Why it happens

The conventional explanation is “AI writes bad code.” That’s not quite right. The code is usually fine line-by-line. The problem is structural, and it comes from three things that AI tools fundamentally cannot do.

No architectural decisions were made

When you prompt an AI to “build a user dashboard with billing,” it produces code that works. But it also makes dozens of implicit decisions: how state is managed, how errors propagate, how the data model relates to the UI, where business logic lives. These aren’t decisions in any meaningful sense. They’re predictions. The model predicts what code is most likely given your prompt.

The difference matters. A decision is made with context about your business, your scale targets, your team, your future roadmap. A prediction is made from statistical patterns in training data. The prediction might be correct today. But it has no awareness of tomorrow.

When you make these predictions across hundreds of prompts over months, you end up with a codebase where the “architecture” is really just an accumulation of locally reasonable choices that don’t compose into a coherent whole.

Each prompt session is an island

This is the mechanical reason the wall exists. AI coding tools operate in sessions. Each session has its own context window. The model sees some of your code, your current prompt, maybe some file context. But it doesn’t carry forward understanding from the last session.

Session 1 sets up auth with middleware pattern A. Session 47, three months later, adds a new API route. The model doesn’t remember pattern A. It generates something that works using pattern B. Both patterns are valid. Together, they create confusion and maintenance overhead.
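Here is what that drift looks like in practice. This is a minimal sketch in framework-agnostic TypeScript; the route names, the `requireAuth` helper, and the stubbed token lookup are all illustrative, not taken from any real codebase:

```typescript
// Minimal request/response types so the sketch is self-contained.
type Req = { headers: Record<string, string>; user?: { id: string } };
type Res = { status: number; body: unknown };
type Handler = (req: Req) => Res;

// Pattern A (session 1): a higher-order wrapper every route is supposed to use.
function requireAuth(handler: Handler): Handler {
  return (req) => {
    if (!req.headers["authorization"]) return { status: 401, body: "unauthorized" };
    req.user = { id: "u_1" }; // token lookup stubbed for the sketch
    return handler(req);
  };
}

const getProfile = requireAuth((req) => ({ status: 200, body: { id: req.user!.id } }));

// Pattern B (session 47): the model doesn't remember pattern A, so the new
// route checks auth inline. Both work today; the moment the auth rules change,
// there are two places to update and only one of them gets updated.
const getInvoices: Handler = (req) => {
  if (!req.headers["authorization"]) return { status: 401, body: "unauthorized" };
  return { status: 200, body: [] };
};
```

Neither route is buggy. The cost only appears later, when someone changes how tokens are validated and has to find every inline check by hand.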

Multiply this across every major concern (error handling, state management, API design, validation) and you get what we see in every audit: three or four competing approaches to the same problem, scattered across the codebase, none of them wrong individually, all of them wrong collectively.

No institutional knowledge exists

In a traditional engineering team, knowledge lives in people’s heads. Someone knows why the billing module works that way. Someone remembers the edge case that caused the outage in November. That knowledge is messy and informal, but it exists.

In a vibe-coded project, that knowledge doesn’t exist anywhere. The AI that generated the code doesn’t remember generating it. The founder remembers what they prompted for, not what was produced. When you need to modify a complex piece of logic six months later, you’re reverse-engineering a stranger’s work. Except the stranger was a language model that optimized for “looks correct” rather than “is understandable.”

The data backs this up

This isn’t just our anecdotal experience from audits. The industry data is starting to paint a clear picture.

| Metric | Finding | Source |
| --- | --- | --- |
| AI code share | 41% of all new GitHub code is AI-generated | GitHub, 2025 |
| Security vulnerabilities | AI-generated code has 2.74x more vulnerabilities | Backslash Security, 2025 |
| Code duplication | Up 48% in AI-heavy codebases | GitClear, 2025 |
| Refactoring activity | Down 60% since AI adoption | GitClear, 2025 |
| Churn rate | Code changed within 2 weeks of being written up 39% | GitClear, 2025 |

That last metric, code churn, is the smoking gun. AI-generated code gets changed or deleted within two weeks of being written at significantly higher rates than human-written code. That’s the wall showing up in the data: code that works when it ships but doesn’t hold up when the codebase evolves.

The overall trend has analysts projecting $1.5 trillion in accumulated technical debt by 2027, driven primarily by AI-generated code that nobody is maintaining properly. That number sounds dramatic until you look at the compounding effect: more code generated faster, with less refactoring, more duplication, and fewer humans who understand what it does.

The specific symptoms

If you’re reading this and wondering whether you’ve hit the wall, here are the diagnostic signs we look for in every vibe code cleanup engagement:

Features that used to take a day now take a week. Not because they’re more complex. Because the codebase has become a minefield. You can’t change one thing without understanding twelve other things, and nobody fully understands those twelve things because they were generated across different context windows.

Every change breaks something unrelated. Tight coupling that nobody intended. The AI connected things that shouldn’t be connected because in the training data, that’s how they were usually connected. Now module A depends on module B’s internal implementation details, and changing B’s error format crashes A.
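A concrete sketch of that coupling, with hypothetical module and error names:

```typescript
// billing.ts (module B): the AI happened to return errors as a string code.
function chargeCard(amount: number): { ok: boolean; error?: string } {
  if (amount <= 0) return { ok: false, error: "invalid_amount" };
  return { ok: true };
}

// signup.ts (module A): generated in a later session, it pattern-matches on
// billing's internal error string. If billing later returns structured errors
// ({ code, message }), this branch silently stops matching and signup breaks.
function signupWithPlan(amount: number): string {
  const result = chargeCard(amount);
  if (!result.ok && result.error === "invalid_amount") return "Please enter a valid amount";
  return result.ok ? "welcome" : "something went wrong";
}
```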

Multiple competing patterns for the same concern. Three different ways to handle API errors. Two state management approaches. Authentication middleware that works differently on different route groups. Each pattern was reasonable in its session. Together they’re a maintenance nightmare.
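The "three ways to handle API errors" case looks something like this sketch (session numbers and function names invented for illustration):

```typescript
// Session 12: errors thrown as exceptions.
function getUserA(id: string): { id: string } {
  if (!id) throw new Error("missing id");
  return { id };
}

// Session 30: errors as a discriminated union.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };
function getUserB(id: string): Result<{ id: string }> {
  if (!id) return { ok: false, error: "missing id" };
  return { ok: true, value: { id } };
}

// Session 55: errors as HTTP-shaped objects.
function getUserC(id: string) {
  if (!id) return { status: 400, error: { message: "missing id" } };
  return { status: 200, data: { id } };
}
// Callers now need three different error-handling strategies for one concern.
```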

Tests that pass but don’t catch real bugs. We wrote about this in our Claude Code audit findings. AI-generated tests tend to assert that things exist rather than that things are correct. Your test suite is green. Your app is broken. The tests are checking expect(result).toBeDefined() instead of expect(result.status).toBe('active').

```typescript
// Typical AI-generated test: always passes, catches nothing
test("processPayment handles subscription", async () => {
  const result = await processPayment({ plan: "pro", userId: "123" });
  expect(result).toBeTruthy();
  expect(result.success).toBeDefined();
});

// What actually catches bugs
test("processPayment creates subscription with correct billing cycle", async () => {
  const result = await processPayment({ plan: "pro", userId: "123" });
  expect(result.success).toBe(true);
  expect(result.subscription.interval).toBe("month");
  expect(result.subscription.amount).toBe(4900);
  expect(result.subscription.trialEndsAt).toBeNull();
  expect(result.nextInvoiceDate).toEqual(expect.any(Date));
});
```

You can’t onboard a new developer. This is the test that kills most vibe-coded projects. When you try to bring on a human engineer, they can’t ramp up. There’s no architecture to learn, just patterns to discover. No documentation that reflects reality. No single person who can explain the system end-to-end. The new hire spends weeks reading code and still can’t make confident changes.

What to do about it

If you’ve hit the wall, here’s the playbook we use. In order. Resist the temptation to skip to step 4.

1. Audit first. Don’t rewrite.

The rewrite instinct is strong and almost always wrong at this stage. A rewrite takes 3-6 months, costs tens of thousands of dollars, and you’ll make new mistakes. The existing codebase works. It’s just hard to change. Those are very different problems.

Start with a code audit. Map what’s actually broken versus what’s just messy. Most codebases we audit are 70% fine. The remaining 30% concentrates in a few critical areas. You need to know which 30% before you start cutting.

2. Fix the foundations

Two things, and two things only, need to be right before anything else matters: authentication and data model.

If your auth has holes (and it probably does), fix them first. If your data model doesn’t support multi-tenancy, or has no migration strategy, or conflates concerns that should be separate, fix that second. Everything else is built on these. Get them wrong and every subsequent fix is temporary.
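One way to make tenant scoping structural rather than optional is to route every read through a scoped accessor. This is a minimal in-memory sketch; the `tenant_id` field, the `projects` table, and the `forTenant` helper are illustrative, not from any real schema:

```typescript
type Row = { tenant_id: string; [key: string]: unknown };

// In-memory stand-in for a database table.
const projects: Row[] = [
  { tenant_id: "t1", name: "alpha" },
  { tenant_id: "t2", name: "beta" },
];

// Every read goes through a scoped accessor, so forgetting the tenant filter
// stops being a per-query discipline problem: there is no unscoped function
// to call in the first place.
function forTenant(tenantId: string) {
  return {
    list: (table: Row[]) => table.filter((r) => r.tenant_id === tenantId),
  };
}
```

In a real codebase the same idea shows up as a query builder that requires a tenant ID, or row-level security enforced by the database.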

3. Add the operational layer

This is the fastest win because it’s purely additive. You’re not rewriting. You’re adding what the AI never added:

  • Error monitoring (Sentry, Highlight, Axiom)
  • Structured logging that isn’t console.log
  • Rate limiting on public endpoints
  • Input validation that goes beyond TypeScript types
  • Security headers
  • Health checks
  • Graceful error handling that doesn’t leak stack traces
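To make one of these concrete: TypeScript types vanish at runtime, so "validation beyond TypeScript types" means checking the actual shape of incoming data. A hand-rolled sketch (in practice a schema library usually fills this role; the field names here are invented):

```typescript
type CreateUserInput = { email: string; plan: "free" | "pro" };

// Returns the parsed input on success, or an error describing what failed.
// The TypeScript type alone would accept nothing at runtime; this check does
// the work the type system cannot.
function parseCreateUser(body: unknown): CreateUserInput | { error: string } {
  if (typeof body !== "object" || body === null) return { error: "body must be an object" };
  const b = body as Record<string, unknown>;
  if (typeof b.email !== "string" || !b.email.includes("@")) return { error: "invalid email" };
  if (b.plan !== "free" && b.plan !== "pro") return { error: "invalid plan" };
  return { email: b.email, plan: b.plan };
}
```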

We’ve never audited an AI-built project that had all of these. Most had none of them. Adding this layer takes days, not weeks, and it immediately makes the app safer and more debuggable.

4. Consolidate patterns, then iterate

Now you can start cleaning. Pick the best of your three error handling patterns and migrate the others. Standardize your state management. Make the auth middleware consistent across all routes. This is refactoring, not rewriting. You’re keeping the working code and making it consistent.

Once the patterns are consolidated, you can go back to shipping features. The difference is that now you have a coherent architecture that new code can follow. The AI tools work better after this step, because the codebase gives them better context to work from.

The uncomfortable truth

AI coding tools are genuinely great. We use them daily. They make good engineers faster and they make building MVPs accessible to people who couldn’t have built them five years ago. That’s real and valuable.

But the market is starting to discover what experienced engineers have always known: writing code is the easy part. The hard part is making code that lasts. That composes. That a stranger can understand six months later. That fails gracefully. That scales without surprises.

AI tools optimized for the easy part. The hard part is still on you.

If you’re staring at a codebase that’s fighting back, you’re not alone and you’re not stuck. Start with our free AI Code Health Check to understand where you stand. Five questions, two minutes, no sales call. You’ll get a risk assessment and a clear picture of what needs fixing first.

Or if you already know the codebase needs work, take a look at our vibe code cleanup service. We stabilize AI-built codebases so you can get back to shipping.