Variant Systems

March 15, 2026 · Variant Systems

Is Your AI-Built App Production Ready? The Checklist

A brutally honest checklist for founders who built with AI coding tools. 7 categories, 30+ checks, and the specific failures we see in every audit.

ai-generated-code code-audit vibe-coding checklist startup security

You built something with Cursor, Claude Code, Bolt, Lovable, or Copilot. It works. The demo is clean. Users are signing up. Maybe you’re about to pitch investors, onboard your first enterprise customer, or launch publicly.

The question nobody around you is asking: is the thing actually ready for production?

Not “does it work.” It works. The question is whether it will keep working when real users do real things to it — including things you didn’t anticipate. Whether it’s secure enough that a bored teenager with Burp Suite won’t ruin your weekend. Whether it will survive its first incident without losing data.

We audit AI-built codebases regularly. The patterns are consistent enough that we can give you the checklist. Go through it honestly. Every unchecked box is a risk you’re carrying into production.

1. Security

This is where AI-generated code fails most consistently. Across five recent Claude Code audits, 4 out of 5 had hardcoded secrets in source and 3 out of 5 had injectable queries. These aren’t edge cases. They’re the norm.

  • No secrets in source code. Check your entire git history, not just the current files. git log -p | grep -i "sk_live\|api_key\|password\|secret" will ruin your day, but better now than after a breach. AI tools frequently hardcode the API key you pasted into the prompt.
  • Auth middleware on every route. Not just the ones you built first — the ones you added last week too. AI tools apply auth correctly to the initial batch of endpoints, then silently skip it on later additions because the prompt didn’t mention it.
  • Input validation beyond TypeScript types. Types don’t exist at runtime. If your API accepts a string for an email field, someone will send you <script>alert(1)</script> and your app will happily store it. Validate length, format, and allowed characters on every user input.
  • No SQL/NoSQL injection vectors. AI tools use ORMs correctly for simple queries but drop to raw string interpolation for complex ones. Search your codebase for raw queries and verify every variable is parameterized.
  • Security headers configured. CORS, CSP, HSTS, X-Frame-Options, X-Content-Type-Options. AI-generated code almost never sets these. 3 out of 5 projects in our audits were missing them entirely.
  • Rate limiting on public endpoints. Login, signup, password reset, and any endpoint that sends emails or costs you money. Without rate limiting, one script kiddie can drain your SendGrid quota or brute-force your auth in an afternoon.
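A rate limiter does not need to be elaborate to be better than nothing. Here is a minimal sketch of a fixed-window limiter in plain Node.js — the limit and window are arbitrary illustrations, and in a real deployment you would more likely reach for a maintained middleware such as express-rate-limit, backed by a shared store like Redis if you run multiple instances:

```javascript
// Minimal fixed-window rate limiter (illustrative -- numbers are arbitrary).
// Keys are whatever identifies a caller: IP address, user ID, API key.
function createRateLimiter({ limit = 5, windowMs = 60_000 } = {}) {
  const hits = new Map(); // key -> { count, windowStart }

  return function allow(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```

Wired into a route handler, a `false` return becomes an HTTP 429. The point is not this particular implementation — it is that *something* must say no before your SendGrid bill does.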

The AI pattern: AI code handles the happy path of security — it sets up auth, it uses an ORM. But it doesn’t think adversarially. It doesn’t ask “what if someone sends 10,000 requests per second?” or “what if the user modifies the request body?” Security requires imagining what you don’t want to happen. AI only generates what you asked for.
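To make the input-validation point concrete: runtime checks do not require a library, though one like Zod makes them pleasant. A hand-rolled sketch of the email check described above — the rules here are deliberately strict illustrations, not a full RFC 5322 validator:

```javascript
// Illustrative runtime validator for an email field.
// TypeScript types vanish at runtime; these checks do not.
function validateEmail(input) {
  if (typeof input !== "string") return { ok: false, error: "must be a string" };
  const value = input.trim();
  if (value.length === 0 || value.length > 254) {
    return { ok: false, error: "length out of range" };
  }
  // Deliberately strict: no whitespace or angle brackets, one domain dot.
  if (!/^[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+$/.test(value)) {
    return { ok: false, error: "invalid format" };
  }
  return { ok: true, value };
}
```

The same shape — type check, length check, format check — applies to every user-supplied field, not just email.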

2. Data integrity

Your app is only as valuable as the data inside it. Lose it once and you lose your users permanently.

  • Database migrations strategy exists. Not “I modify the schema directly in production.” A versioned, reproducible migration system that you can run forward and backward. AI tools often skip migrations entirely and just define schemas inline.
  • Backup and restore tested. Not “backups are enabled.” Actually restore from a backup into a clean environment and verify the data is intact. If you haven’t tested restore, you don’t have backups — you have a false sense of security.
  • Multi-tenant data isolation. If your app serves multiple organizations, every single query must be scoped to the current tenant. AI tools routinely generate queries that fetch all records and filter client-side, or skip the tenant filter on admin/reporting endpoints. One missing WHERE clause and Customer A sees Customer B’s data.
  • Cascading deletes handled correctly. When you delete a user, what happens to their posts, comments, files, and billing records? AI-generated code often doesn’t define cascade rules, which means you either get orphaned data cluttering your database or unexpected deletions wiping out things you wanted to keep.
  • No sensitive data exposed via API. Check every API response. Are you sending password hashes, internal IDs, other users’ emails, or billing details to the frontend? AI tools return the entire database object by default. They don’t think about what the client should and shouldn’t see.

The AI pattern: Two of our five audited projects had data models that needed significant rework. AI tools design schemas for the feature you’re building right now. They don’t consider how the data model needs to evolve, how tenants are isolated, or what happens when records are deleted. These are system-level concerns that require upfront design — exactly the thing prompt-by-prompt development skips.

3. Error handling and observability

Every single project in our audit findings had zero error monitoring and zero structured logging. Every one. This is the most consistent failure in AI-generated code.

  • Error monitoring configured. Sentry, Highlight, Datadog, whatever. When your app throws an error in production, you need to know about it before your users tell you on Twitter. This takes 15 minutes to set up. There is no excuse.
  • Structured logging, not console.log. console.log("something went wrong") tells you nothing. Structured logs with timestamps, request IDs, user context, and error details tell you everything. When you’re debugging a production issue at 2am, the difference is existential.
  • Health check endpoints. A simple /health endpoint that verifies your app can reach the database, cache, and any critical external services. Your load balancer needs this. Your monitoring needs this. Your on-call engineer needs this.
  • Graceful degradation. What happens when your payment provider is down? What happens when Redis is unreachable? AI code typically crashes or hangs. Production code degrades gracefully — showing cached data, queuing retries, or displaying a meaningful error message instead of a blank screen.
  • No stack traces in API responses. Check what your API returns when something throws. If users see TypeError: Cannot read property 'id' of undefined with a full stack trace, you’re leaking implementation details and giving attackers a roadmap.
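The gap between console.log and structured logging is a few lines, even without a library like pino (which is what we would actually reach for). A minimal sketch:

```javascript
// Minimal structured logger: one JSON object per line, carrying
// the context a 2am debugging session needs.
function makeLogger(baseContext = {}) {
  function log(level, message, context = {}) {
    const entry = {
      ts: new Date().toISOString(),
      level,
      message,
      ...baseContext,
      ...context,
    };
    process.stdout.write(JSON.stringify(entry) + "\n");
    return entry; // returned so callers (and tests) can inspect it
  }
  return {
    info: (msg, ctx) => log("info", msg, ctx),
    error: (msg, ctx) => log("error", msg, ctx),
  };
}
```

Creating one logger per request with the request ID in `baseContext` makes every line traceable back to the call that produced it.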

The AI pattern: AI code handles errors syntactically. It writes try/catch blocks. It returns error responses. But it doesn’t do anything useful with the error — no logging, no alerting, no context. The code looks like it handles errors because the syntax is there. It doesn’t, because the infrastructure isn’t.
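Graceful degradation is often nothing more than a timeout and a fallback. A sketch of the pattern — the timeout value and fallback are illustrative placeholders for a cached response or a safe default:

```javascript
// Race an unreliable call against a timeout; fall back to a
// cached/default value instead of hanging or crashing.
async function withFallback(promiseFn, { timeoutMs = 2000, fallback } = {}) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ degraded: true, value: fallback }), timeoutMs);
  });
  try {
    return await Promise.race([
      promiseFn().then((v) => ({ degraded: false, value: v })),
      timeout,
    ]);
  } catch {
    return { degraded: true, value: fallback };
  } finally {
    clearTimeout(timer);
  }
}
```

The `degraded` flag lets the caller show a "data may be stale" notice instead of a blank screen.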

4. Testing

4 out of 5 audited projects had test files that don’t actually test anything meaningful. AI is great at writing tests that pass. It’s bad at writing tests that catch bugs.

  • Tests have real assertions. Not expect(result).toBeDefined(). That will literally never fail unless the function throws. Test that the return value is correct, not just that a return value exists.
// What AI generates -- this always passes
test("creates order", async () => {
  const order = await createOrder(orderData);
  expect(order).toBeDefined();
  expect(order.items).toBeDefined();
});

// What actually catches bugs
test("creates order with correct total and status", async () => {
  const order = await createOrder({
    items: [{ sku: "WIDGET-1", qty: 2, price: 29.99 }],
    userId: testUser.id,
  });
  expect(order.total).toBe(59.98);
  expect(order.status).toBe("pending");
  expect(order.items).toHaveLength(1);
  expect(order.userId).toBe(testUser.id);
});
  • Integration tests on critical paths. Unit tests are fine for utilities. But your signup flow, payment processing, and core business logic need integration tests that hit the database and verify the full chain works. If your test suite mocks everything, it’s testing your mocks, not your code.
  • Auth and authorization tested explicitly. Can a regular user access admin endpoints? Can User A modify User B’s resources? These are the tests that AI almost never writes, and they’re the ones that catch the bugs that get you in the news.
  • Edge cases covered. Empty inputs, maximum-length strings, special characters, concurrent requests, expired tokens, network timeouts. AI tests the path that works. Production runs on the paths that don’t.

The AI pattern: AI-generated tests achieve decent coverage numbers while testing almost nothing. They’re structural mimicry — they look like tests, they run like tests, but they assert nothing meaningful. We regularly see 70%+ code coverage with zero tests that would catch a real regression.

5. Performance

Your app works fine with 50 users. It will not work fine with 5,000 unless you check these.

  • No N+1 queries. This is the single most common performance problem in AI-generated code. Fetching a list of 100 items and making a separate database query for each item’s related data. It works in development. It melts your database in production. Check your ORM queries with a query logger and look for repeated patterns.
  • Pagination on all list endpoints. If any endpoint returns an unbounded array of results, it will eventually return 50,000 results and either crash the client, timeout, or cost you an absurd amount in bandwidth. Every list endpoint needs a limit and offset (or cursor).
  • Connection pooling configured. AI code typically creates a new database connection per request or uses the ORM’s defaults without checking them. Under load, you’ll exhaust your database’s connection limit and everything stops. Verify your pool size is appropriate for your expected concurrency.
  • No synchronous blocking on expensive operations. Sending emails, processing images, generating PDFs, calling external APIs — none of these should block the HTTP response. Use a background job queue. AI code does everything synchronously because it’s simpler to generate.
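The N+1 fix is almost always either a proper JOIN/include or "fetch once, join in memory." A sketch of the in-memory version — queryAuthorsByIds is a hypothetical function standing in for a single `WHERE id IN (...)` query:

```javascript
// N+1: one query for the list, then one query PER item.
// Fix: collect the foreign keys and fetch them in a single query.
async function attachAuthors(posts, queryAuthorsByIds) {
  const ids = [...new Set(posts.map((p) => p.authorId))];
  const authors = await queryAuthorsByIds(ids); // ONE query for all items
  const byId = new Map(authors.map((a) => [a.id, a]));
  return posts.map((p) => ({ ...p, author: byId.get(p.authorId) ?? null }));
}
```

A list of 100 posts now costs two queries instead of 101.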

The AI pattern: AI code is optimized for correctness, not performance. It generates the most straightforward solution, which is often the least efficient one. N+1 queries, unbounded fetches, synchronous processing — all of these are correct implementations that will fall over under real load.
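Of these, pagination is the most mechanical to enforce: clamp whatever the client sends before it reaches a query. A sketch — the cap of 100 is an arbitrary illustration:

```javascript
// Clamp client-supplied paging params so no endpoint can be asked
// for an unbounded result set. The cap of 100 is arbitrary.
function parsePagination(query, { defaultLimit = 20, maxLimit = 100 } = {}) {
  const limit = Math.min(
    Math.max(parseInt(query.limit, 10) || defaultLimit, 1),
    maxLimit
  );
  const offset = Math.max(parseInt(query.offset, 10) || 0, 0);
  return { limit, offset };
}
```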

6. Deployment

A surprising number of AI-built apps have no deployment pipeline at all. They’re deployed by running git push and hoping.

  • CI/CD pipeline exists. Automated build, test, and deploy. Not “I run the deploy script from my laptop.” If your deployment depends on one person’s local environment, you don’t have a deployment process — you have a single point of failure.
  • Environment separation. Development, staging, and production are different environments with different databases, different API keys, and different configurations. If your staging environment points at your production database, you will eventually delete production data by accident.
  • Rollback capability. When a deploy breaks production — and it will — can you revert to the previous version in under five minutes? If the answer involves “restore from backup” or “revert the commit and redeploy,” your rollback process is too slow.
  • No hardcoded environment values. Search your codebase for hardcoded URLs, database strings, API endpoints, and port numbers. Every environment-specific value should come from environment variables. AI tools hardcode whatever value was in the prompt.
  • Docker or equivalent reproducible builds. “It works on my machine” is not a deployment strategy. Your build should be reproducible in any environment from a clean checkout.
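Both hardcoded values and missing environment variables can be caught at startup with a tiny config loader that fails fast. A sketch — the variable names are illustrative:

```javascript
// Fail fast at boot if required configuration is missing,
// instead of crashing mid-request. Variable names are illustrative.
function loadConfig(env = process.env) {
  const required = ["DATABASE_URL", "SESSION_SECRET"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env.DATABASE_URL,
    sessionSecret: env.SESSION_SECRET,
    port: Number(env.PORT) || 3000, // optional, with a default
  };
}
```

Crashing at boot with a named variable beats a `TypeError` on the first request that touches the database.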

The AI pattern: AI tools generate application code, not infrastructure code. They build the app but not the system that runs it. Deployment, monitoring, and operations are afterthoughts — if they’re thoughts at all.

7. Legal and compliance

This isn’t engineering, but it will block your launch or your fundraise if you skip it.

  • Privacy policy exists and matches reality. What data do you collect? Where do you store it? Who do you share it with? Your privacy policy needs to accurately describe your actual data practices. If you’re using analytics, third-party APIs, or AI services that process user data, those need to be disclosed.
  • Data deletion capability. Can you actually delete a user’s data when they ask? Not soft-delete — actually remove it from your database, backups, logs, and any third-party services you’ve sent it to. GDPR and CCPA require this. AI-generated code almost never implements it.
  • Cookie consent if applicable. If you serve EU users and use non-essential cookies (analytics, tracking), you need consent. This isn’t optional. The fines are real.
  • Terms of service. Especially important if you’re using AI services under the hood. Some AI provider terms have restrictions on how their output can be used in your product. Read them.

The AI pattern: AI tools don’t think about legal requirements. They build features. Whether those features comply with privacy regulations, handle data deletion requests, or respect consent requirements is entirely on you.

Scoring yourself

Count your checked boxes. Here’s roughly where you stand:

  • 25-30 — Genuinely production ready. Ship it.
  • 18-24 — Fixable in 1-2 weeks of focused work. Prioritize security and data integrity.
  • 10-17 — Significant gaps. Do not launch to paying customers or demo to investors until these are addressed.
  • Below 10 — You have a prototype, not a product. That’s fine — but know the difference before you make commitments.

Most AI-built apps we audit score between 8 and 15. That’s not a condemnation of AI tools. It’s a reflection of what they optimize for: features that work, not systems that are resilient. The gap between “works” and “production ready” is where vibe code cleanup lives.

What to do next

If you scored below 18, the fastest path is to prioritize in this order:

  1. Security — because a breach is existential
  2. Error monitoring — because you can’t fix what you can’t see
  3. Data integrity — because data loss is irreversible
  4. Everything else — because it matters but won’t kill you tomorrow

If you want a professional assessment rather than a self-check, start here. We’ll tell you where the real risks are, what to fix first, and how long it takes. No sales pitch — just a prioritized list of what’s actually wrong and what it costs to fix.

For a deeper look at the specific patterns AI tools produce and why they fail these checks, read our Claude Code audit findings. The patterns apply across all AI coding tools, not just Claude Code.