February 26, 2026 · Variant Systems

We Audited 5 Claude Code Projects. Here's What We Found.

Real findings from auditing Claude Code-built SaaS apps. The code is clean — the problems are underneath. Common patterns, severity breakdown, and what to fix first.

Tags: claude-code · code-audit · ai-generated-code · security · startup

We audit codebases for founders, investors, and teams inheriting code they didn’t write. Over the past three months, five of those audits have been Claude Code projects — SaaS apps built primarily or entirely with Claude Code.

The pattern is consistent enough that it’s worth sharing publicly. Not because Claude Code is bad. It’s genuinely the best AI coding tool we’ve worked with. The code quality is high, the patterns are consistent, and the output reads like it was written by a senior engineer.

That’s exactly why the problems are dangerous. They hide under clean code.

Here’s what we found across five real audits, anonymized but otherwise unchanged.

The five projects

| Project | Stack | Size | Stage |
| --- | --- | --- | --- |
| A — B2B scheduling SaaS | Next.js, Prisma, PostgreSQL | ~18K LOC | Pre-seed, 200 beta users |
| B — AI document processor | Python FastAPI, React, S3 | ~12K LOC | Raising seed round |
| C — E-commerce analytics | Next.js, Supabase | ~22K LOC | Post-launch, $8K MRR |
| D — Healthcare intake forms | React Native, Node.js, MongoDB | ~15K LOC | Compliance review for hospital pilot |
| E — Internal tool for a logistics company | Elixir/Phoenix, LiveView, PostgreSQL | ~9K LOC | Pre-deployment review |

All five founders were technical enough to prompt well and review output. None of them were shipping blindly. These weren’t vibe-coded toys — they were real products with real users or real contracts pending.

The findings

We run every audit through our automated audit tool first — 7 analyzers covering secrets, security, dependencies, structure, tests, imports, and AI patterns. That catches the structural issues. Then a senior engineer reviews architecture, assumptions, and business logic.

Across five projects, here’s the breakdown:

Critical findings: 14 total

| Finding | Occurrences | Projects affected |
| --- | --- | --- |
| Hardcoded secrets in source | 4 | A, B, C, D |
| SQL injection / NoSQL injection | 3 | A, C, D |
| Missing authentication on API routes | 3 | A, B, C |
| Broken access control (IDOR) | 2 | B, D |
| Sensitive data in client bundle | 2 | A, C |

4 out of 5 projects had secrets committed to version control. Stripe keys, SendGrid API keys, database connection strings. Not in .env.example files — in actual source code. Claude Code generates working code, and “working” sometimes means hardcoding the value you pasted into the prompt.
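The fix pattern is worth baking in from day one: secrets come from the environment, and the app refuses to boot without them. A minimal sketch, assuming Node.js; the `requireEnv` helper is illustrative, not code from any audited project:

```typescript
// Anti-pattern Claude Code can produce when you paste a key into the prompt:
//   const stripe = new Stripe("sk_live_...");  // committed to git forever

// Safer: read the value from the environment and fail fast if it's missing,
// so a misconfigured deploy dies at boot instead of at the first charge.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: const stripe = new Stripe(requireEnv("STRIPE_SECRET_KEY"));
```

Failing fast matters: a missing key that surfaces as a boot error is debugged in minutes; one that surfaces as a runtime `undefined` is debugged in production.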

3 projects had injectable queries. Claude Code uses ORMs correctly most of the time, but when it drops to raw queries — for complex joins, full-text search, or performance-sensitive paths — it interpolates user input directly. The code looks intentional. It’s not obviously wrong the way a junior developer’s SQL concatenation would be. It’s a well-formatted query with a variable inserted in exactly the wrong place.
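When you genuinely need raw SQL, the fix is parameterized queries: user input travels to the driver separately from the query text. The `$1` placeholder style below matches node-postgres (`db.query(text, values)`); the surrounding function is an illustrative sketch, not code from the audited projects:

```typescript
// Injectable shape (clean-looking, still wrong):
//   db.query(`SELECT * FROM orders WHERE customer_name = '${search}'`)

// Parameterized shape: the SQL string is constant, the value is bound by
// the driver, and no input can change the query's structure.
function buildOrderSearch(search: string): { text: string; values: string[] } {
  return {
    text: "SELECT * FROM orders WHERE customer_name = $1",
    values: [search], // user input never touches the SQL string
  };
}
```

The same principle applies to NoSQL: pass user input as values, never splice it into query operators.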

3 projects had unauthenticated API routes. Not all routes — just the ones added later in the development process. The auth middleware was set up correctly for the first batch of endpoints. Later endpoints, added through shorter prompts, didn’t inherit the middleware. Claude Code doesn’t remember that your API requires authentication unless you tell it every time.
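The structural fix is to make authentication the default at route-registration time, so an endpoint added six weeks later can't quietly skip it. A minimal hand-rolled sketch, assuming nothing about any particular framework (in a real app this would be Express or Next.js middleware, and all names here are illustrative):

```typescript
type Handler = (req: { userId?: string }) => { status: number; body: string };

// Every handler is wrapped at registration, so auth is opt-out, not opt-in.
function withAuth(handler: Handler): Handler {
  return (req) => {
    if (!req.userId) return { status: 401, body: "Unauthorized" };
    return handler(req);
  };
}

const routes: Record<string, Handler> = {};

function registerRoute(path: string, handler: Handler): void {
  routes[path] = withAuth(handler); // nothing can be registered unauthenticated
}

registerRoute("/api/invoices", () => ({ status: 200, body: "[]" }));
```

The design point: if skipping auth requires a visibly different registration call, the forgotten-middleware failure mode disappears from code review's blind spot.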

Warning findings: 35 total

| Finding | Occurrences | Projects affected |
| --- | --- | --- |
| No error monitoring (Sentry, etc.) | 5 | All |
| No structured logging | 5 | All |
| Missing rate limiting | 4 | A, B, C, D |
| No input validation beyond types | 4 | A, B, C, E |
| Test files with weak/no assertions | 4 | A, B, C, D |
| Circular import dependencies | 3 | A, C, D |
| Single-tenant data model for multi-tenant product | 2 | A, B |
| No database migrations strategy | 2 | C, D |
| Over-engineered abstractions | 3 | A, B, E |
| Missing CORS/security headers | 3 | B, C, D |

Every single project had zero error monitoring and zero structured logging. This is the most consistent finding across all Claude Code audits. The code handles errors — try/catch blocks exist, error responses are returned — but nothing gets recorded anywhere. When something fails in production, nobody knows. Users see a blank screen or a generic error. The founder finds out when someone complains on Twitter.
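The gap between "errors are handled" and "errors are recorded" is small in code and enormous in practice. A minimal sketch of the difference, assuming nothing beyond Node.js (in production you'd use pino or winston plus Sentry, but the point is that failures become searchable events instead of silent catch blocks):

```typescript
// One JSON line per event: machine-parseable, searchable, alertable.
function logEvent(
  level: "info" | "error",
  event: string,
  fields: Record<string, unknown> = {}
): string {
  const line = JSON.stringify({ ts: new Date().toISOString(), level, event, ...fields });
  console.log(line); // in production: ship to your log aggregator
  return line;
}

// Instead of:  try { ... } catch (e) { /* swallowed, nobody ever knows */ }
try {
  throw new Error("payment provider timeout");
} catch (e) {
  logEvent("error", "checkout.failed", { orderId: "ord_123", reason: (e as Error).message });
}
```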

4 out of 5 had test files that don’t actually test anything meaningful. Claude Code generates test files. They import the right modules, call the right functions, and use the right assertion syntax. But the assertions are shallow — checking that a function returns something rather than checking that it returns the right thing. Tests that always pass are worse than no tests. They give you false confidence that your code works.

```javascript
// What Claude Code generates
test('createUser returns user', async () => {
  const user = await createUser({ email: 'test@example.com' });
  expect(user).toBeDefined();  // Passes as long as anything comes back
});

// What the test should actually check
test('createUser returns user with hashed password', async () => {
  const user = await createUser({ email: 'test@example.com', password: 'secret' });
  expect(user.email).toBe('test@example.com');
  expect(user.password).not.toBe('secret');
  expect(user.id).toMatch(/^usr_/);
  expect(user.createdAt).toBeInstanceOf(Date);
});
```

The pattern underneath

If you look at those findings as a list, they seem like individual bugs. They’re not. They’re symptoms of one underlying pattern:

Claude Code builds features. It doesn’t build systems.

A feature is “users can sign up.” A system is “users can sign up, and we know when sign-ups fail, and we rate-limit the endpoint, and we validate inputs beyond type checking, and we log the event for analytics, and we handle the edge case where the email already exists with a deleted account.”

Claude Code nails the first part. The feature works. The code is clean. But all the operational, security, and resilience concerns that make a feature production-ready — those are missing unless you explicitly prompt for each one. And even then, the implementation is often minimal.

This isn’t a bug in Claude Code. It’s a fundamental limitation of building from prompts. Prompts describe what you want. Production systems require thinking about what you don’t want — failure modes, attack vectors, data corruption scenarios, scale bottlenecks. Nobody prompts for those things because they’re not thinking about them at build time.
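To make the feature/system distinction concrete, here is a sketch of what the "system" version of sign-up starts to look like: input validated beyond types, explicit failure reasons, and the deleted-account edge case handled. Everything here is illustrative, including the in-memory `Map` standing in for a database:

```typescript
type SignupResult =
  | { ok: true; userId: string }
  | { ok: false; reason: "invalid_email" | "already_exists" };

const existing = new Map<string, { deleted: boolean }>(); // stand-in for the DB

function signUp(email: string): SignupResult {
  // Validate the value, not just the type.
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
    return { ok: false, reason: "invalid_email" };
  }
  const prior = existing.get(email);
  if (prior && !prior.deleted) {
    return { ok: false, reason: "already_exists" };
  }
  // prior?.deleted === true: reactivate rather than hit the unique constraint.
  existing.set(email, { deleted: false });
  return { ok: true, userId: `usr_${email}` };
}
```

The "feature" version is the happy path in the middle; everything around it is what nobody prompts for.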

Severity by project stage

The interesting thing is that severity correlates with how far along the project was, not how complex it was.

Project E (the Elixir/Phoenix internal tool) had the fewest issues. It was also the project where the founder had a detailed spec before starting and treated Claude Code as an implementation tool rather than a design partner. They made the architectural decisions. Claude Code wrote the code. The assumption gap was smallest because the assumptions were made by a human.

Project C (the analytics dashboard at $8K MRR) had the most issues. It was also the project that had been built iteratively over months — feature by feature, prompt by prompt. Each new feature compounded assumptions from the previous ones. By the time we audited it, the codebase had three different patterns for API error handling, two competing state management approaches, and an auth system that had been modified four times as requirements evolved.

The lesson: Claude Code is excellent at implementing a known design. It’s dangerous when it’s also making the design decisions.

What we recommend after every Claude Code audit

1. Run the automated scan first

We open-sourced our audit tool as a Claude Code plugin. It catches the structural issues — secrets, security patterns, dependency vulnerabilities, test quality, circular imports. Takes five minutes. Do this before the human review.

```shell
npx skills add variant-systems/skills --skill code-audit
```

2. Audit assumptions, not just code

The code is probably fine. The assumptions underneath it might not be. For every major architectural decision in your codebase — auth model, data architecture, API design, error handling — ask: did I specify this, or did Claude assume it?

Map the assumptions. Check them against your actual business requirements. This is where the real bugs live.

We wrote a detailed guide on what Claude Code gets wrong and how to fix it — it covers the five most common assumption-level problems.

3. Add the operational layer

This is the fastest fix because it’s additive. You’re not rewriting code — you’re adding what’s missing:

  • Error monitoring (Sentry, Highlight, or similar)
  • Structured logging (not console.log)
  • Rate limiting on public endpoints
  • Input validation beyond TypeScript types
  • Security headers (CORS, CSP, HSTS)
  • Health check endpoints
  • Graceful error handling that doesn’t expose stack traces

Claude Code won’t add these unless you ask. And even when you ask, check the implementation against production best practices.
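"Input validation beyond TypeScript types" deserves a concrete example, because it is the item founders most often think they already have. TypeScript types vanish at runtime: a request body typed as `{ email: string }` will happily carry whatever JSON arrives. A hand-rolled sketch (in practice you would likely reach for zod or similar; the function name is ours):

```typescript
// Validates the actual runtime value of an untrusted request body.
function validateSignupBody(body: unknown): { email: string } | null {
  if (typeof body !== "object" || body === null) return null;
  const email = (body as Record<string, unknown>).email;
  if (typeof email !== "string" || email.length > 254 || !email.includes("@")) {
    return null;
  }
  return { email };
}
```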

4. Fix auth and data model issues early

If your auth model or data architecture is wrong, fix it now. These are foundations. Everything else is built on them. The cost of changing them goes up exponentially as you add features.

Two of our five audits required significant rework to the data model. Both founders said the same thing: “I wish I’d known this three months ago.” An audit at week two would have caught it.

The cost of not auditing

Across five projects:

  • Project B needed the data model reworked before their seed round. Two weeks of engineering work that delayed the raise by a month.
  • Project D failed the hospital’s initial security review because of the NoSQL injection and missing access controls. Three weeks of fixes, plus a re-review cycle.
  • Project C had a production incident caused by the unauthenticated API routes — a user discovered they could access other accounts’ data by changing an ID in the URL. The founder found out from a customer email.
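That Project C incident has the classic broken-access-control shape: the handler trusts whatever ID appears in the URL. The fix is to scope every lookup to the authenticated user, so there is no unscoped query to exploit. An illustrative sketch, not the actual Project C code:

```typescript
interface Account { id: string; ownerId: string; plan: string }

const accounts: Account[] = [
  { id: "acc_1", ownerId: "usr_alice", plan: "pro" },
  { id: "acc_2", ownerId: "usr_bob", plan: "free" },
];

// Vulnerable shape:  findAccount(id)  -- anyone who guesses acc_2 can read it.
// Fixed shape: ownership is part of the query itself, not a separate check
// that a later refactor can drop.
function findAccountForUser(accountId: string, userId: string): Account | null {
  return accounts.find((a) => a.id === accountId && a.ownerId === userId) ?? null;
}
```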

The cost of an audit is measured in days. The cost of not auditing is measured in lost deals, security incidents, and delayed launches.

Get your codebase checked

If you’ve built with Claude Code — or any AI tool — and you’re approaching a launch, a fundraise, or a compliance review, get the codebase audited while fixing things is still cheap.

Start with our free AI Code Health Check. Five questions, two minutes. You’ll get a risk assessment and a cost estimate for a full audit — no sales call required.

For the automated pass, install our open-source audit plugin and run it yourself. It’s MIT-licensed, zero dependencies, and it catches the 70% of issues that are structurally detectable.

For the other 30% — the architecture review, the assumption audit, the business logic validation — that’s what we do.


Variant Systems is the quality layer for AI-generated code. We audit, rescue, and fix codebases built with every major AI tool.