Variant Systems

PostgreSQL Vibe Code Cleanup

Your AI created tables without indexes, migrations that conflict, and queries that scan every row. Time for a real schema.

At Variant Systems, we pair the right technology with the right approach to ship products that work.

Why this combination

  • AI generates schemas without indexes on foreign keys or common query columns
  • N+1 queries from ORM misuse multiply database load with every new row
  • Migrations conflict, skip steps, or don't match the actual production schema
  • No connection pooling configuration leads to exhaustion under load

Query Plans AI Tools Never Consider

AI generates schemas that work for demos and break in production. The most common problem: no indexes. AI creates tables with foreign keys but doesn’t add indexes on those columns. It creates columns used in WHERE clauses and ORDER BY sorts but leaves them unindexed. With 1,000 rows, nobody notices. With 100,000 rows, every query that filters by user_id does a sequential scan across the entire table.
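A minimal sketch of the fix, assuming a hypothetical orders table with an unindexed user_id foreign key. The second statement is a catalog query that lists foreign-key columns with no index whose leading column covers them:

```sql
-- user_id is a foreign key with no index: filtering by it forces a
-- sequential scan as the table grows. CONCURRENTLY avoids a write lock.
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);

-- Foreign-key columns not covered by any index's leading column:
SELECT c.conrelid::regclass AS table_name, a.attname AS fk_column
FROM pg_constraint c
JOIN unnest(c.conkey) AS k(attnum) ON true
JOIN pg_attribute a ON a.attrelid = c.conrelid AND a.attnum = k.attnum
WHERE c.contype = 'f'
  AND NOT EXISTS (
    SELECT 1 FROM pg_index i
    WHERE i.indrelid = c.conrelid
      AND i.indkey[0] = k.attnum  -- index leads with the FK column
  );
```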

N+1 queries are the second plague. AI generates ORM code that fetches a list of records, then loops through them and fetches related records one at a time. Load a page with 50 orders and the app fires 51 queries - one for the list and one for each order’s items. The ORM makes this look clean in code. The database sees 51 round trips where one join would suffice.
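The same page load expressed both ways, with hypothetical orders and order_items tables. The join returns everything the loop fetched, in one round trip:

```sql
-- N+1 pattern the ORM generates: 1 query for the list, then 1 per order.
--   SELECT * FROM orders WHERE customer_id = 42 LIMIT 50;
--   SELECT * FROM order_items WHERE order_id = $1;  -- repeated 50 times

-- Single-query equivalent:
SELECT o.id, o.created_at, i.product_id, i.quantity
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.customer_id = 42
ORDER BY o.id;
```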

Migrations from AI are a mess. Each prompt generates a migration in isolation. One adds a column. The next adds the same column with a different type. A third drops a table that the second depends on. Migrations that were applied manually in production don’t match the migration files. Foreign key constraints reference tables that were renamed. The migration history is fiction.

Connection management is ignored entirely. AI uses the ORM’s default connection settings - usually one connection per request with no pooling. Under load, the application opens hundreds of connections. PostgreSQL defaults to 100 max connections. Your application exhausts the pool and starts returning connection errors while the database sits at 10% CPU.
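A minimal pgbouncer.ini sketch showing the shape of the fix - many application connections multiplexed over a small server pool. Hostnames, ports, and pool sizes here are illustrative, not recommendations:

```ini
; Many client connections share a small pool of server connections.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; release the server connection after each transaction
default_pool_size = 20    ; server connections per database/user pair
max_client_conn = 500     ; application-side connections accepted
```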

Profiling Real Workloads with pg_stat_statements

We start with pg_stat_statements to understand the real query workload. Which queries run most frequently? Which take the longest? Which scan the most rows? We rank queries by total time (frequency multiplied by duration) to focus on what impacts users most. Not every slow query matters - a nightly report that takes 30 seconds is fine. A user-facing query that takes 500ms is not.
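The ranking described above is a single query against the pg_stat_statements view. Column names below are for PostgreSQL 13 and later; older versions use total_time and mean_time:

```sql
-- Requires pg_stat_statements in shared_preload_libraries.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Rank statements by total time spent (frequency x duration).
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```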

Index strategy comes from the data. We run EXPLAIN ANALYZE on the top queries to see what the planner actually does. Columns that trigger sequential scans on large tables get indexes. Queries that filter on multiple columns get composite indexes with the right column order. We verify every index actually gets used with pg_stat_user_indexes - unused indexes waste write performance and storage.
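Both checks are short queries. The EXPLAIN example below assumes a hypothetical users table with an email column; the second query lists indexes that have never been scanned since statistics were last reset:

```sql
-- Does the query use an index? BUFFERS shows actual pages read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM users WHERE email = 'a@example.com';

-- Indexes with zero scans: candidates for removal.
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```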

N+1 queries get resolved at the ORM level. Eager loading replaces lazy loading for relationships that are always accessed together. Queries that loop get replaced with joins or subqueries. We add query logging in development that flags any request generating more than 10 queries. The goal is one to three queries per API endpoint, not fifty.

Migration history gets reconciled. We compare the migration-defined schema against the actual production schema using pg_dump --schema-only. Every difference is documented. We create a reconciliation migration that brings the files in sync with reality. Going forward, migrations are tested against a production-like database in CI before they’re applied.
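The comparison is mechanical once both schemas are dumped. A sketch, assuming $PROD_URL points at production and $MIGRATIONS_URL at a scratch database built by running every migration file from scratch:

```shell
# Dump schema-only DDL from both databases, normalized for diffing.
pg_dump --schema-only --no-owner --no-privileges \
    "$PROD_URL" > prod_schema.sql
pg_dump --schema-only --no-owner --no-privileges \
    "$MIGRATIONS_URL" > migrations_schema.sql

# Every hunk in this diff is a discrepancy to document and reconcile.
diff -u migrations_schema.sql prod_schema.sql
```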

From Sequential Scans to Index-Only Lookups

Before: An API where the orders endpoint takes 3.2 seconds because it fires 200 queries for 200 orders. A users table with 500,000 rows and no index on email, the column used in every login query. Twelve migration files that don’t match production. Connection errors during peak traffic because the pool is exhausted.

After: The orders endpoint returns in 45ms with a single query using proper joins. The users table login query uses an index scan and completes in 0.3ms. Migration files match production exactly. PgBouncer handles connection pooling with 20 server connections serving 500 application connections.

The infrastructure cost often drops too. Before cleanup, teams scale up database hardware to handle inefficient queries. After, the same queries use a fraction of the CPU and I/O. We’ve seen teams downsize their database instance after cleanup because the optimized queries don’t need the extra resources.

CI Pipelines That Catch Slow Queries Before Production

We add pg_stat_statements monitoring with alerting. Queries that exceed a duration threshold or scan more than a configurable number of rows trigger alerts. New N+1 patterns get caught within hours of deployment, not months later when the table has grown.
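A sketch of the alerting check, again against pg_stat_statements (PostgreSQL 13+ column names). The thresholds here - 100ms mean time, 10,000 rows per call - are example values, tuned per application:

```sql
-- Statements exceeding a duration or rows-per-call threshold.
SELECT left(query, 80) AS query,
       calls,
       round(mean_exec_time::numeric, 1) AS mean_ms,
       rows / GREATEST(calls, 1) AS rows_per_call
FROM pg_stat_statements
WHERE mean_exec_time > 100
   OR rows / GREATEST(calls, 1) > 10000
ORDER BY mean_exec_time DESC;
```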

CI runs EXPLAIN ANALYZE on new queries against a database loaded with production-scale data volumes. A query plan that does a sequential scan on a table with more than 10,000 rows fails the pipeline. Developers see the problem before code review, not in production monitoring.
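The gate works because EXPLAIN can emit the plan as JSON, which a CI script can parse instead of scraping text output:

```sql
-- JSON plan output is easy for a CI script to inspect programmatically.
EXPLAIN (FORMAT JSON)
SELECT * FROM users WHERE email = 'a@example.com';
-- The pipeline fails if any node has "Node Type": "Seq Scan" on a
-- table above the configured row threshold.
```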

Migration linting catches common mistakes. Adding a column with a default value on a large table triggers a warning - on PostgreSQL 10 and earlier any default forces a full table rewrite, and on later versions a volatile default still does. Dropping a column without first removing code references triggers a warning. Renaming a table without updating references to it in later migrations triggers an error. These rules are codified in CI, not left to human memory.

We document the schema with database-level comments on every table and critical column. COMMENT ON TABLE orders IS 'Customer purchase orders, partitioned by created_at'. These comments appear in \dt+ output, in pgAdmin, and in auto-generated documentation. The next developer - or the next AI prompt - has context about what each table is for and why it’s structured the way it is.
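The full form, extended to a column comment as well. The orders.status column and its comment text are illustrative:

```sql
COMMENT ON TABLE orders IS
  'Customer purchase orders, partitioned by created_at';
COMMENT ON COLUMN orders.status IS
  'Lifecycle state: pending -> paid -> shipped; rows are never deleted';
```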

What you get

  • Schema audit with index recommendations based on actual query patterns
  • N+1 query detection and resolution with proper eager loading
  • Migration cleanup and reconciliation with production schema
  • Connection pooling setup with PgBouncer or application-level pooling
  • Query performance monitoring with pg_stat_statements configuration

Ideal for

  • Applications with AI-generated schemas that slow down as data grows
  • Products where API response times are dominated by database queries
  • Teams with migration files that don't match what's actually in production
  • Founders hitting database connection limits during traffic spikes

Ready to build?

Tell us about your project and we'll figure out how we can help.

Get in touch