Vector Databases for RAG in 2026: pgvector, Pinecone, or Something Else
After building RAG pipelines on four production apps with three different vector stores, here is the framework I use to decide between pgvector, Pinecone, and the new wave of dedicated vector DBs.
The RAG choice that matters
When you build a RAG (retrieval-augmented generation) pipeline, half a dozen components live in the stack: embedding model, chunker, vector store, retriever, reranker, LLM. The vector store is the one most teams overthink and underdecide.
After shipping four production RAG apps in the last year, I now use a simple framework. Most teams should start with pgvector and move only if they hit a specific limit.
Why pgvector is the default
Postgres with the pgvector extension is good enough for 90 percent of RAG workloads in 2026.
What it gives you:
What it lacks:
For typical app workloads — under 1 million vectors, queries under 100ms — pgvector wins on operational simplicity by a wide margin.
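The rule of thumb above can be written down as a quick check. This is only a sketch: the `Workload` type and the `pgvector_is_enough` helper are hypothetical names, and reading "queries under 100ms" as a measured p95 latency is my assumption, not something the text specifies.

```python
from dataclasses import dataclass

# Thresholds copied from the rule of thumb above; tune them for your own workload.
MAX_VECTORS = 1_000_000
MAX_P95_LATENCY_MS = 100.0


@dataclass(frozen=True)
class Workload:
    vector_count: int       # total embeddings stored
    p95_latency_ms: float   # assumed: measured p95 query latency on pgvector


def pgvector_is_enough(workload: Workload) -> bool:
    """Return True when the workload sits inside the 'typical app' envelope."""
    return (
        workload.vector_count < MAX_VECTORS
        and workload.p95_latency_ms < MAX_P95_LATENCY_MS
    )


if __name__ == "__main__":
    print(pgvector_is_enough(Workload(vector_count=200_000, p95_latency_ms=35.0)))
    print(pgvector_is_enough(Workload(vector_count=5_000_000, p95_latency_ms=35.0)))
```

Failing the check does not by itself mean "migrate"; it means it is time to look at which specific limit you are hitting.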
When to pick a dedicated vector DB
Specific signals that you have outgrown pgvector:
If none of these apply, pgvector is the right call.
The dedicated options in 2026
Pinecone
The mature default. Easy to set up, scales well, integrates with every framework. Costs more than self-hosted but you do not operate it.
Pick when: You run B2B SaaS at scale, want a managed service, and are willing to pay 70 dollars per month at the entry tier.
Qdrant
Open source, self-host or managed. Beautiful API, fast, well-documented. My recommendation if you want to self-host without Postgres overhead.
Pick when: You want vector search as a service but cannot or do not want to pay Pinecone prices.
Weaviate
Older, more feature-rich, includes built-in reranking and hybrid search. Heavier to operate.
Pick when: You need advanced features out of the box and have ops capacity.
Chroma
The most fashionable. Great DX for prototypes. I have not yet seen it in serious production.
Pick when: Prototyping or building local-first apps.
Turbopuffer
Newer (2024). Backed by object storage, it claims very low cost at massive scale. Interesting if your workload is read-heavy and cost-sensitive.
Pick when: You have very large vector counts (50 million plus) and cost matters more than latency.
LanceDB
Open source, file-based, good for embedded use cases. Different mental model — vectors live alongside your data in a Parquet-like format.
Pick when: Local-first or notebook-heavy workflows.
My pick per project
After 4 production RAG apps:
Pattern: most projects start on pgvector and either stay there or move to Pinecone or Qdrant if they hit a specific limit.
The pgvector setup that just works
For most apps, here is the minimal viable RAG with pgvector:
That is the whole vector layer. Five lines of schema and one query.
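As an illustration of how small that layer can be, here is one possible shape for it. It is a sketch under assumptions, not necessarily the exact schema described above: the `chunks` table, its column names, the HNSW index (which needs pgvector 0.5.0 or later), and the two helper functions are all hypothetical choices. The SQL is kept in Python strings so it can be passed to whatever Postgres driver you use.

```python
from typing import Dict, Sequence, Union

EMBEDDING_DIMS = 1536  # matches text-embedding-3-small

# Assumed schema: one table of text chunks, each with its embedding.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector({EMBEDDING_DIMS}) NOT NULL
);

CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %(q)s::vector) AS similarity
FROM chunks
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a sequence of floats as pgvector's text input form, e.g. '[0.1,0.2]'."""
    if len(embedding) != EMBEDDING_DIMS:
        raise ValueError(f"expected {EMBEDDING_DIMS} dimensions, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search_params(embedding: Sequence[float], k: int = 20) -> Dict[str, Union[str, int]]:
    """Build the named parameters that SEARCH_SQL expects."""
    return {"q": to_vector_literal(embedding), "k": k}
```

With a psycopg-style driver (an assumption; any driver that supports named `%(name)s` parameters would do), the query would run as `cursor.execute(SEARCH_SQL, search_params(query_embedding))`.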
Embedding model choice
OpenAI's text-embedding-3-small is the right default. 1536 dimensions, cheap, good quality. Use text-embedding-3-large only when you have specifically measured a recall improvement.
Alternatives:
For 95 percent of RAG, start with OpenAI text-embedding-3-small.
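Dimension count also drives storage, which is one concrete reason not to reach for the larger model by default. The arithmetic below is a rough sketch: it assumes embeddings are stored as 4-byte floats and ignores row and index overhead, and the helper name is hypothetical. text-embedding-3-large produces 3072 dimensions by default, twice the 1536 of the small model.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings stored as single-precision floats


def raw_embedding_bytes(vector_count: int, dims: int) -> int:
    """Raw size of the embeddings alone, excluding row and index overhead."""
    return vector_count * dims * BYTES_PER_FLOAT32


if __name__ == "__main__":
    small = raw_embedding_bytes(1_000_000, 1536)  # text-embedding-3-small
    large = raw_embedding_bytes(1_000_000, 3072)  # text-embedding-3-large
    print(f"small: {small / 1e9:.2f} GB")  # small: 6.14 GB
    print(f"large: {large / 1e9:.2f} GB")  # large: 12.29 GB
```

At one million vectors the difference is a few gigabytes, so the stronger argument for the small model is usually cost per embedding call and the lack of a measured recall gain.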
Chunking strategy matters more than the DB
The vector store is the last thing to optimize in a RAG pipeline. Chunk size, overlap, and chunking strategy matter way more.
Defaults I use:
If your retrieval is bad, fix chunking before swapping vector DBs.
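To make "chunk size" and "overlap" concrete, here is a minimal sliding-window chunker. It is a sketch only: the `chunk_words` function, the word-based splitting, and the default values of 200 and 40 are hypothetical placeholders for illustration, not the defaults referred to above.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of `chunk_size` words, each sharing `overlap`
    words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    print(chunk_words(sample, chunk_size=4, overlap=1))
    # ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Because the chunker is a pure function of the text, you can re-chunk and re-embed a corpus with different settings and compare retrieval quality without touching the vector store.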
Reranking
After retrieving the top 20 chunks by vector similarity, rerank them with a cross-encoder before sending them to the LLM. This is the single highest-impact quality improvement in any RAG pipeline.
Default: Cohere's rerank-3 model. Cheap, easy API call, dramatically improves retrieval quality. Worth the extra 200ms per query.
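The retrieve-then-rerank step has a simple shape, sketched below. The scorer is deliberately pluggable: `toy_overlap_score` is a hypothetical stand-in based on term overlap so the example runs offline, and it is not a cross-encoder. In a real pipeline you would replace it with a call to your reranking model.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (query, document) and returns a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def rerank(
    query: str,
    candidates: Sequence[str],
    score: Scorer,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Score every retrieved candidate against the query and keep the best top_n."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query terms present in the document.
    NOT a cross-encoder; it only exists to make the sketch runnable."""
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    return len(query_terms & doc_terms) / len(query_terms) if query_terms else 0.0


if __name__ == "__main__":
    retrieved = [
        "postgres index tuning",
        "how to bake bread",
        "pgvector hnsw index in postgres",
    ]
    for doc, s in rerank("pgvector index postgres", retrieved, toy_overlap_score, top_n=2):
        print(f"{s:.2f}  {doc}")
```

Only the `top_n` survivors go into the LLM prompt, which is why retrieving a generous candidate set first (20 in the text above) costs little.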
TL;DR
If you are building a RAG pipeline and want a senior engineer to architect it, contact me. AI and RAG work is part of my service offering.