Variant Systems

pgvector for Legal Tech

Legal search needs to understand meaning, not just match keywords. pgvector makes that possible inside PostgreSQL.

Variant Systems builds industry-specific software with the tools that fit the problem.

Why this combination

  • Semantic search finds relevant case law even when terminology differs across jurisdictions
  • Vector embeddings live alongside structured case data in the same database
  • RAG pipelines ground LLM responses in actual case law and firm documents
  • No separate vector database to manage - pgvector extends your existing PostgreSQL

Legal search has a vocabulary problem. A California court might call it “breach of fiduciary duty” while a Delaware opinion uses “violation of duty of loyalty.” A keyword search for one misses the other. Lawyers know these concepts are related. Traditional search infrastructure doesn’t.

pgvector stores vector embeddings alongside your structured legal data in PostgreSQL. Documents are embedded using models that understand legal language. Similarity search finds conceptually related content regardless of exact wording. And because it’s a PostgreSQL extension, you don’t need a separate vector database. Your case data, document metadata, and semantic search index all live in one place with one security model.
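As a rough sketch of what "one place" can look like, the snippet below holds a hypothetical schema as SQL text, plus a small helper that formats a Python list in pgvector's text input form. The table and column names (`cases`, `case_chunks`) and the 384-dimension size are assumptions made for illustration; the right dimension depends on the embedding model you choose, and HNSW index support depends on the installed pgvector version.

```python
# Hypothetical schema: every name and the 384-dimension size are illustrative.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE cases (
    id           bigserial PRIMARY KEY,
    jurisdiction text NOT NULL,
    decided_on   date,
    citation     text
);

CREATE TABLE case_chunks (
    id        bigserial PRIMARY KEY,
    case_id   bigint NOT NULL REFERENCES cases (id),
    body      text NOT NULL,
    embedding vector(384)  -- pgvector column next to the structured data
);

-- Approximate nearest-neighbour index for cosine distance.
CREATE INDEX ON case_chunks USING hnsw (embedding vector_cosine_ops);
"""


def to_vector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


if __name__ == "__main__":
    # The literal can be passed as a query parameter and cast with ::vector.
    print(to_vector_literal([0.25, 0.5, 1]))
```

Because the embedding column sits in an ordinary table, joins, foreign keys, and backups cover it the same way they cover the case metadata.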

Semantic Case Law Research

Associates spend hours searching for relevant precedent. They try different keyword combinations, scan dozens of results, and read full opinions to determine relevance. Much of this effort goes to irrelevant hits, while the right case often sits in the database under different terms than the search query.

We build semantic search interfaces powered by pgvector. A researcher describes the legal issue in natural language. The query gets embedded and compared against pre-computed embeddings of case law summaries and holdings. Results rank by conceptual similarity, not keyword overlap. A search about “employer retaliation after whistleblowing” surfaces relevant cases even when they use phrases like “adverse employment action following protected disclosure.” Research that took hours can take minutes.
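To make "rank by conceptual similarity" concrete, here is a minimal sketch. `SEARCH_SQL` shows how such a query might look using pgvector's `<=>` cosine distance operator against a hypothetical `case_chunks` table. The Python functions compute the same distance in memory on toy three-dimensional vectors; real embeddings would come from a model and have hundreds of dimensions.

```python
import math

# Hypothetical table and column names; <=> is pgvector's cosine distance.
SEARCH_SQL = """
SELECT case_id, body
FROM case_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity <=> orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank_by_similarity(query_vec, docs, limit=5):
    """Return document keys ordered from most to least similar."""
    scored = sorted((cosine_distance(query_vec, vec), key) for key, vec in docs.items())
    return [key for _, key in scored[:limit]]


if __name__ == "__main__":
    # Toy vectors standing in for model output (made up for illustration).
    docs = {
        "adverse employment action following protected disclosure": [0.9, 0.1, 0.0],
        "breach of commercial lease": [0.0, 0.2, 0.9],
    }
    query = [0.8, 0.2, 0.1]  # "employer retaliation after whistleblowing"
    print(rank_by_similarity(query, docs))
```

The ranking depends only on the angle between vectors, which is why two passages with no words in common can still land next to each other.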

Contract Clause Matching

Due diligence on a deal might involve reviewing hundreds of contracts for specific provisions. Find every change-of-control clause. Identify all indemnification caps. Flag any non-compete provisions with unusual terms. Doing this manually means a team of associates reading every contract page by page.

pgvector makes clause-level search practical. We embed individual clauses from each contract and store them with references back to the source document and location. A query embedding for “change of control” finds all similar clauses across the contract set - even when they’re titled differently or use non-standard language. Results include the source document, page number, and similarity score. You can also threshold similarity scores to flag outlier clauses that deviate significantly from standard language, surfacing provisions that warrant closer attorney review. Associates review a ranked list of matches instead of reading every contract end to end.
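The thresholding step described above might be sketched as follows. The `ClauseHit` type, the `triage` function, and both cut-off values are hypothetical; sensible thresholds depend on the embedding model and would need tuning against clauses an attorney has already reviewed.

```python
from dataclasses import dataclass


@dataclass
class ClauseHit:
    document: str
    page: int
    similarity: float  # 1 - cosine distance to the reference clause


def triage(hits, match_floor=0.60, standard_floor=0.85):
    """Split hits into standard-looking clauses and ones worth a closer read.

    Hits below match_floor are treated as a different clause type and dropped.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    matches = sorted(
        (h for h in hits if h.similarity >= match_floor),
        key=lambda h: h.similarity,
        reverse=True,
    )
    standard = [h for h in matches if h.similarity >= standard_floor]
    review = [h for h in matches if h.similarity < standard_floor]
    return standard, review


if __name__ == "__main__":
    hits = [
        ClauseHit("MSA.pdf", 12, 0.93),
        ClauseHit("Lease.pdf", 4, 0.71),
        ClauseHit("NDA.pdf", 2, 0.30),
    ]
    standard, review = triage(hits)
    print("standard:", standard)
    print("needs review:", review)
```

Keeping the document name and page on every hit is what lets an associate jump from the ranked list straight to the source text.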

RAG for Legal Research

Large language models are impressive but unreliable for legal work. They hallucinate citations, invent case holdings, and present fabricated precedent with confidence. Lawyers can’t use tools that make things up. But grounding an LLM’s responses in actual documents from your firm’s knowledge base changes the equation.

We implement retrieval-augmented generation using pgvector as the retrieval layer. When a lawyer asks a question, the system embeds the query, retrieves the most relevant documents from the firm’s database, and passes them as context to the LLM. The response cites actual documents with real page numbers. If the answer isn’t in the retrieved context, the system says so instead of fabricating one. Because pgvector tables are ordinary PostgreSQL tables, PostgreSQL’s row-level security ensures the retrieval respects matter-level access controls - a lawyer only gets results from documents they’re authorized to see.
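The retrieve-then-generate flow can be sketched in a few lines. Everything here is a hypothetical outline: `retrieve` and `generate` are caller-supplied functions standing in for the pgvector query and the LLM call, and the `min_similarity` cut-off is an assumed value.

```python
NOT_FOUND = "No supporting passage was found in the retrieved documents."


def build_prompt(question, passages, min_similarity=0.75):
    """Return a grounded prompt, or None when nothing relevant was retrieved.

    Each passage is a dict with 'document', 'page', 'text', and 'similarity'.
    """
    relevant = [p for p in passages if p["similarity"] >= min_similarity]
    if not relevant:
        return None
    sources = "\n".join(
        f"[{i}] {p['document']}, p. {p['page']}: {p['text']}"
        for i, p in enumerate(relevant, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them by number. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def answer(question, retrieve, generate):
    """Retrieve context, then either ask the model or decline to answer."""
    prompt = build_prompt(question, retrieve(question))
    if prompt is None:
        return NOT_FOUND  # refuse rather than let the model improvise
    return generate(prompt)
```

Declining before the model is ever called is one simple way to get the "says so instead of fabricating" behaviour; the prompt instruction is a second line of defence, not a guarantee.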

Compliance considerations

  • Keeping embeddings in the same secured database as source documents helps preserve attorney-client privilege
  • No data leaves your infrastructure when embeddings are generated and stored locally
  • Row-level security on vector tables enforces matter-level access control on search results
  • Audit logging on similarity searches tracks who queried what and when
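The last two points might be wired together roughly as below. This is a sketch under stated assumptions: the table names (`document_chunks`, `matter_members`, `search_audit`), the `app.user_id` setting, and the `search_as_user` helper are all hypothetical. `cur` is any DB-API style cursor, for example one from psycopg. PostgreSQL does not apply row-level security to superusers, or to the table owner unless `FORCE ROW LEVEL SECURITY` is set, so the application would need to connect as a separate, non-owner role.

```python
# Hypothetical names throughout; run RLS_SQL once as a migration.
RLS_SQL = """
ALTER TABLE document_chunks ENABLE ROW LEVEL SECURITY;

CREATE POLICY matter_access ON document_chunks
    FOR SELECT
    USING (matter_id IN (
        SELECT matter_id FROM matter_members
        WHERE user_id = current_setting('app.user_id')::bigint
    ));
"""

SEARCH_SQL = """
SELECT document_id, page, body
FROM document_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

AUDIT_SQL = """
INSERT INTO search_audit (user_id, query_text, searched_at)
VALUES (%s, %s, now());
"""


def search_as_user(cur, user_id, query_text, query_vector_literal, limit=10):
    """Run a similarity search under the caller's identity and log it."""
    # Transaction-local setting that the policy reads via current_setting().
    cur.execute("SELECT set_config('app.user_id', %s, true);", (str(user_id),))
    # The policy filters rows before the nearest-neighbour ordering is returned.
    cur.execute(SEARCH_SQL, (query_vector_literal, limit))
    rows = cur.fetchall()
    # Record who searched for what; the timestamp comes from the database.
    cur.execute(AUDIT_SQL, (user_id, query_text))
    return rows
```

Putting the access rule in a policy rather than in each query means a forgotten WHERE clause in application code cannot leak another matter's documents.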

Common patterns we build

  • Semantic case law search matching fact patterns to relevant precedent
  • Contract clause matching finding similar provisions across document sets
  • RAG-powered legal research assistants grounded in firm knowledge bases
  • Deposition transcript search finding relevant testimony across thousands of pages

Building in Legal Tech?

We understand the unique challenges. Let's talk about your project.

Get in touch