
Beyond pgvector: Choosing the Right Vector Database for Production


Published on: 14th Nov, 2025 by Amitav Roy
pgvector struggles at scale. Deep dive into vectors, indexes (IVFFlat vs HNSW), and why specialized databases (OpenSearch, Typesense, Pinecone) solve production problems pgvector can't.

When Vector Search Becomes Your Production Nightmare: A Deep Dive into Vectors, Indexes, and the Databases That Handle Them

I was scrolling through my feed last week when I stumbled upon a thread that stopped me cold: a developer sharing their war story about how pgvector brought their production database to its knees. Their team had started small, vector search worked beautifully, and then—as these stories always go—scale happened. What began as sub-100ms queries turned into multi-second timeouts. Their Postgres instance was consuming memory like it was going out of style. Indexes that took minutes to build were now taking hours.

The thread resonated because I've been there. Not with vectors specifically, but with that moment when a technology that worked flawlessly in development becomes a liability in production. So I did what any developer with 16 years of battle scars would do: I went down the rabbit hole. What I discovered wasn't just about pgvector's limitations—it was about understanding vectors, indexes, and why specialized databases exist in the first place.

What Are Vectors, Really?

Before we talk about search problems, we need to understand what we're searching. A vector, in the context of modern AI applications, is essentially a list of numbers that represents meaning. When you embed a sentence, an image, or a document, you're converting it into this numerical representation—typically hundreds or thousands of dimensions.

Think of it like a city's location. You can describe New York City with two numbers: latitude and longitude. Those two numbers capture its position in space. Vectors work similarly, except instead of two dimensions, we're working with 384, 768, or even 1536 dimensions. And instead of representing physical location, these dimensions capture semantic meaning.

Here's where it gets interesting: similar concepts end up with similar vectors. The sentence "The cat sat on the mat" and "A feline rested on the rug" will have vectors that are close together in this high-dimensional space. That proximity is what makes vector search powerful—we're literally measuring semantic similarity mathematically.
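To make that concrete, here is a minimal sketch of comparing two vectors with cosine similarity, using NumPy and small made-up vectors in place of real embedding-model output (real embeddings have hundreds of dimensions, not four):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: close to 1.0 means "pointing the same way" (similar meaning).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings"; real models produce 384, 768, or 1536 dimensions.
cat_on_mat = np.array([0.9, 0.1, 0.8, 0.2])
feline_rug = np.array([0.85, 0.15, 0.75, 0.25])
stock_report = np.array([0.1, 0.9, 0.2, 0.7])

print(cosine_similarity(cat_on_mat, feline_rug))    # high: similar meaning
print(cosine_similarity(cat_on_mat, stock_report))  # low: unrelated meaning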

Vector Indexing: Why You Can't Just Loop Through Everything

Now, imagine you have a million documents, each represented by a 768-dimensional vector. A user searches for something, and you need to find the most similar documents. The naive approach? Calculate the distance from the search vector to every single one of those million vectors and pick the closest ones.

This brute-force approach faces a fundamental problem: you're not comparing simple values like prices or dates. You're comparing hundreds of dimensions. Do the math: one million vectors × 768 dimensions per distance calculation works out to roughly 768 million multiply-adds for every single query, which is your database crying for mercy.
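Here is a rough sketch of that brute-force scan, with random vectors standing in for real embeddings and the corpus scaled down to 100,000 vectors so it runs quickly:

import numpy as np

rng = np.random.default_rng(42)
corpus = rng.random((100_000, 768), dtype=np.float32)  # pretend these are document embeddings
query = rng.random(768, dtype=np.float32)

# Brute force: one dot product against every stored vector, on every single query.
# At one million vectors, that's hundreds of millions of multiply-adds before ranking.
scores = corpus @ query                  # (100_000,) similarity scores
top_k = np.argsort(scores)[-10:][::-1]   # indices of the 10 most similar vectors
print(top_k)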

This is why vector indexes exist. They're data structures designed to make "find similar vectors" operations fast by avoiding the need to check every single vector. Instead of comparing against a million vectors, a good index might only compare against a few hundred or thousand candidates.

If you've worked with traditional database indexes, the concept is familiar. A B-tree index on a user_id column doesn't scan every row—it navigates a tree structure to quickly narrow down candidates. Vector indexes work on the same principle, just optimized for multi-dimensional similarity rather than exact equality or range queries.

How Vector Search Actually Works

At a high level, vector search follows a pattern you'd recognize from other search problems:

  1. Query Encoding: Your search query (text, image, whatever) gets converted into a vector using the same embedding model that created your stored vectors.

  2. Index Navigation: The search algorithm uses your index structure to quickly narrow down candidates. This is the magic step—where your index determines whether you're checking 1,000 candidates or 1,000,000.

  3. Distance Calculation: For the candidate vectors, calculate actual distance (cosine similarity, Euclidean distance, or dot product, depending on your use case).

  4. Result Ranking: Return the top K closest vectors, which correspond to your most similar documents, images, or whatever you're searching.

The critical insight: step 2 is where performance lives or dies. A naive linear scan makes step 2 trivial (no index needed) but makes step 3 computationally prohibitive. A good index makes step 2 slightly more complex but reduces the candidates in step 3 by orders of magnitude.
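Here is a sketch of those four steps using the open-source hnswlib library as the index. Step 1 is faked with random vectors; a real system would call an embedding model there, and the corpus size and parameters are just illustrative:

import numpy as np
import hnswlib

dim, n = 768, 100_000
rng = np.random.default_rng(0)
vectors = rng.random((n, dim), dtype=np.float32)    # pre-computed "document embeddings"

# Build an HNSW index: the navigation structure from step 2.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(vectors, np.arange(n))
index.set_ef(50)                                    # how many candidates each query examines

query = rng.random(dim, dtype=np.float32)           # step 1: "encode" the query
labels, distances = index.knn_query(query, k=10)    # steps 2-4: navigate, measure, rank
print(labels[0], distances[0])                      # IDs and distances of the top 10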

When Scale Turns Your Solution Into a Problem

Here's where that thread I read becomes relevant. The developer's story followed a predictable arc:

Month 1: 10,000 vectors, pgvector works great. Queries are fast. Index builds in seconds. Team is happy.

Month 6: 500,000 vectors. Queries are slower but acceptable. Index builds take a few minutes. Memory usage is climbing but manageable.

Month 12: 5 million vectors. Query latency is inconsistent—sometimes 50ms, sometimes 5 seconds. Index builds take hours and occasionally kill the database. Memory spikes during index creation force restarts. Production alerts are constant.

Sound familiar? This is the classic scaling curve that every technology hits. The question isn't whether you'll hit scaling problems—it's when, and whether your chosen solution can grow with you.

The pgvector Reality Check: When Good Enough Stops Being Good Enough

Let me be clear: pgvector is a remarkable achievement. Adding vector search capabilities to Postgres—the database you already know, already trust, already have in production—is genuinely valuable. For many applications, especially in early stages, pgvector is exactly the right choice.

But that thread I read wasn't wrong. As your vector count grows, pgvector's architectural trade-offs start to show. The challenge isn't that pgvector is poorly designed; it's that Postgres wasn't architected with vectors as a first-class citizen. You're bolting vector search onto a relational database, and at scale, those bolts start to creak.

pgvector gives you two indexing strategies, and understanding their trade-offs is crucial:

IVFFlat: The Cluster-Based Approach

IVFFlat (an inverted file index with flat, uncompressed vector storage) works by partitioning your vector space into clusters. Think of it like organizing books in a library by genre. When someone asks for a mystery novel, you don't search every shelf—you go to the mystery section and search there.

Here's how it works: during index creation, IVFFlat analyzes your vectors and groups them into clusters (you specify how many). Each vector gets assigned to its nearest cluster. During search, the algorithm identifies which clusters are most likely to contain similar vectors and only searches within those clusters.
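In pgvector, that looks roughly like the sketch below, run here through psycopg2. The items table, embedding column, and connection string are assumptions for illustration:

import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # Partition the existing vectors into 100 clusters ("lists"); built once, over current rows.
    cur.execute(
        "CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)"
    )
    # At query time, search the 10 closest clusters; more probes = better recall, slower queries.
    cur.execute("SET ivfflat.probes = 10")
    cur.execute(
        "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 10",
        ("[0.1, 0.2, 0.3]",),  # truncated query embedding; a real one matches the column's dimensions
    )
    print(cur.fetchall())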

The Good

The memory footprint during index creation is reasonable. You're not building complex graph structures; you're just calculating cluster centers and assignments. This makes IVFFlat viable on database instances that aren't overprovisioned.

Index creation is faster than alternatives like HNSW. For a million vectors, you might wait minutes instead of hours. In development environments or when you need to rebuild indexes frequently, this speed matters.

Query performance for many use cases is acceptable. If you configure your clusters appropriately and your queries don't require perfect recall, IVFFlat delivers decent results at good speed.

The Bad

The elephant in the room: you must specify the number of clusters upfront. Too few clusters, and each cluster becomes large, defeating the purpose of clustering. Too many clusters, and you need to search more clusters to maintain recall, again defeating the purpose.

That number—the cluster count—significantly impacts everything. It affects query speed (fewer clusters to search = faster, but less accurate). It affects recall (how many relevant results you actually find versus miss). It affects memory usage. And here's the kicker: there's no magic formula for the right number. It depends on your data distribution, your query patterns, your accuracy requirements.
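There is no magic formula, but the pgvector documentation does suggest a starting point you can tune from; a small sketch of that rule of thumb, with the numbers treated as a first guess rather than a recommendation:

import math

def ivfflat_starting_point(row_count: int) -> tuple[int, int]:
    # Starting values suggested by the pgvector docs: lists = rows/1000 for up to
    # 1M rows, sqrt(rows) beyond that; probes = sqrt(lists).
    lists = row_count // 1000 if row_count <= 1_000_000 else int(math.sqrt(row_count))
    probes = max(1, int(math.sqrt(lists)))
    return lists, probes

print(ivfflat_starting_point(100_000))    # (100, 10)
print(ivfflat_starting_point(5_000_000))  # (2236, 47)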

The real production problem: new vectors get assigned to existing clusters, but those clusters don't rebalance. Imagine you built your index with 1,000 clusters when you had 100,000 vectors. Now you have 2 million vectors. Those 1,000 clusters are far larger than they were designed for. The distribution is probably skewed—some clusters are massive, others relatively small. Your only solution? Rebuild the entire index with more clusters. That hours-long rebuild I mentioned? This is where it comes from.

HNSW: The Graph-Based Alternative

HNSW (Hierarchical Navigable Small World) takes a completely different approach. Instead of clustering, it builds a multi-layer graph structure. Think of it like a highway system: you have local roads (bottom layer) that connect everything, highways (middle layers) that skip over long distances, and interstates (top layers) that get you across the entire space quickly.

During search, HNSW starts at the top layer (the interstate) to get close to your destination quickly, then navigates down through progressively lower layers (highways, then local roads) until it reaches your target in the bottom layer where all vectors exist.
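In pgvector (0.5.0 and later), building an HNSW index looks roughly like this, again sketched against a hypothetical items table and connection string:

import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # m = links per node, ef_construction = candidate list size while building.
    # Higher values mean better recall but a slower, more memory-hungry build.
    cur.execute(
        "CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) "
        "WITH (m = 16, ef_construction = 64)"
    )
    # ef_search controls how much of the graph each query explores.
    cur.execute("SET hnsw.ef_search = 40")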

The Good

Recall is better than IVFFlat for most datasets. The graph structure captures relationships more accurately than clustering, meaning you're less likely to miss relevant results because they happened to be in a cluster you didn't search.

Query performance is more consistent. With IVFFlat, query time varies significantly based on how well your clusters match your queries. HNSW's graph traversal is more predictable—you're navigating a fixed structure regardless of the specific query.

It scales better to larger datasets. As your vector count grows, HNSW's hierarchical structure continues to provide efficient navigation paths. You're not hitting the "clusters too large" problem.

The Bad

Memory requirements during index builds are significantly higher. You're not just storing vectors and cluster assignments; you're building a complex graph with multiple layers of connections. Each vector maintains links to multiple neighbors at multiple layers. This graph structure resides in memory during construction.

Index creation is slow. Painfully slow. Building an HNSW index for millions of vectors can take hours or even days. Each new vector must find its place in the existing graph structure, which means traversing the graph, identifying neighbors, and updating connections. As the graph grows, each insertion becomes more expensive.

The memory requirements aren't theoretical; they're real, and they'll take down your database if you're not careful. That developer in the thread I read? They discovered this the hard way when their database instance ran out of memory mid-index-build and crashed. Their application stopped serving queries. Their alarms went off. Their CTO asked uncomfortable questions.

Why Specialized Vector Databases Exist

This is the moment in the story where you realize that maybe, just maybe, using a general-purpose relational database for vector search at scale isn't the optimal solution. It's like trying to deliver packages using a sports car—it works for a few deliveries, but eventually you need a truck.

Specialized vector databases exist because vectors have unique characteristics that general-purpose databases weren't designed to handle. They need different index structures, different memory management, different query patterns, and different scaling strategies.

Let's look at three popular solutions and what makes them different.

OpenSearch: The Swiss Army Knife

OpenSearch (the AWS-backed fork of Elasticsearch) took an interesting approach: instead of building a single vector index implementation, they integrated with multiple specialized libraries. You can choose your algorithm based on your specific needs:

Lucene: the HNSW implementation built into Apache Lucene, the library at the core of OpenSearch. If you're already using OpenSearch for text search and log aggregation, adding vector search with Lucene's HNSW is straightforward. It's battle-tested, integrated directly into the search engine, and requires no external dependencies.

Faiss: Facebook AI Similarity Search. Faiss is Facebook's open-source library for efficient similarity search. It includes multiple index types, from simple flat indexes (for when you need perfect recall and have reasonable dataset sizes) to sophisticated quantization techniques that trade recall for massive memory savings. Using Faiss with OpenSearch gives you access to Facebook's years of research into production vector search.

Nmslib: Non-Metric Space Library. Nmslib specializes in non-metric spaces—situations where traditional distance metrics don't apply or perform poorly. It also provides HNSW implementations optimized for specific distance metrics.
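The engine choice is just a field in the index mapping. A minimal sketch with the opensearch-py client, where the index name, field name, dimension, and cluster address are assumptions:

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # hypothetical cluster

client.indices.create(
    index="documents",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 768,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",  # or "faiss" / "nmslib"
                    },
                }
            }
        },
    },
)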

Why Choose OpenSearch

You need comprehensive search capabilities beyond vectors. Maybe you're searching documents that have both text and vector components—like finding articles that mention "climate change" AND are semantically similar to a reference document.

You require AWS integration. If you're already in the AWS ecosystem, OpenSearch integrates seamlessly with other AWS services—CloudWatch for monitoring, IAM for authentication, VPC for networking.

You want to consolidate multiple search use cases. Instead of running separate systems for text search, log aggregation, and vector search, OpenSearch handles all three. This reduces operational complexity and infrastructure costs.

The trade-off: OpenSearch is complex. It's a powerful system, but that power comes with operational overhead. You're managing clusters, shards, replicas. You're tuning heap sizes and garbage collection. You're monitoring multiple metrics across multiple nodes.

Typesense: The Developer Experience Champion

Typesense is the solution you choose when you want simplicity without sacrificing capability. It's an open-source search engine that prioritizes developer experience—simple APIs, clear documentation, and sensible defaults.

For vector search, Typesense uses HNSW, but here's what makes it different from trying to run HNSW in Postgres: Typesense was architected with vectors in mind. The memory management, query optimization, and index building are all designed for vector operations as first-class citizens.
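A sketch of what that looks like with the typesense Python client: a collection with a vector field alongside regular searchable fields, then a nearest-neighbor query. The collection name, field names, dimension, and API key are all made up for illustration:

import typesense

client = typesense.Client({
    "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
    "api_key": "xyz",  # hypothetical key
    "connection_timeout_seconds": 2,
})

client.collections.create({
    "name": "products",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "embedding", "type": "float[]", "num_dim": 768},  # the vector field
    ],
})

# Nearest-neighbor query; long vectors are typically sent via multi_search.
results = client.multi_search.perform(
    {"searches": [{
        "collection": "products",
        "q": "*",
        "vector_query": "embedding:([0.12, 0.45, 0.83], k: 10)",  # truncated; a real query supplies all 768 values
    }]},
    {},
)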

Why Choose Typesense

You're building an application that needs both traditional text search and vector capabilities. Maybe you're building a product search where you want typo tolerance, faceting, and semantic similarity. Typesense excels at this hybrid use case.

You value open-source flexibility. You can self-host Typesense on any infrastructure, modify it if needed, and avoid vendor lock-in. You're not dependent on a proprietary service that might change pricing or terms.

You need a simple developer experience. Typesense's API is intuitive. Its documentation is clear. Its defaults are sensible. You're not spending weeks tuning parameters or deciphering error messages.

The consideration: Typesense is still relatively young compared to solutions like OpenSearch. The community is smaller, the ecosystem of integrations is more limited, and you might hit edge cases that aren't well-documented.

Pinecone: The Vector-First Philosophy

Pinecone represents a different philosophy entirely: what if we built a database from the ground up purely for vectors? No text search legacy. No relational table compatibility. Just vectors, optimized relentlessly.

Pinecone offers two architectures, and the difference is fascinating:

Pod-based: This is the traditional approach—you provision pods (clusters) of specific sizes with fixed resources. You're essentially renting dedicated vector search infrastructure. It's predictable, performant, and you control the resources.

Serverless: This is where Pinecone gets interesting. Serverless architecture separates storage from compute. Your vectors are stored in a distributed storage layer. Compute resources scale dynamically based on query load. You pay only for what you use, and you never worry about capacity planning.

Think about what serverless means for vectors: no more index rebuilds that lock your database. No more capacity planning for peak traffic. No more worrying whether your infrastructure can handle a sudden spike in searches.
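A sketch with the current Pinecone Python SDK, creating a serverless index and querying it. The index name, cloud/region, API key, and the vectors themselves are placeholders:

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # hypothetical key

pc.create_index(
    name="docs",
    dimension=768,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # storage and compute managed for you
)

index = pc.Index("docs")
index.upsert(
    vectors=[("doc-1", [0.1] * 768, {"source": "blog"})],  # (id, values, metadata)
    namespace="tenant-a",                                   # per-customer isolation
)
matches = index.query(vector=[0.1] * 768, top_k=5, namespace="tenant-a")
print(matches)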

Why Choose Pinecone

You're building AI-first applications requiring pure vector search at scale. Your core functionality is semantic search, recommendation engines, or similarity matching. You don't need traditional text search features—you need vectors done right.

You need predictable low latency. Pinecone optimizes for consistent performance. You're not seeing the variable query times that plague less specialized solutions. Your P99 latency matters as much as your P50.

You want zero infrastructure management. No clusters to configure, no shards to rebalance, no indexes to rebuild manually. Pinecone handles the operations, letting you focus on building features.

You require advanced multi-tenancy. If you're building a SaaS application where each customer needs isolated vector search, Pinecone provides namespace-level isolation without operational complexity.

The investment: Pinecone is a managed service, and that convenience has a price. You're paying for someone else to solve the operational problems. For many teams, especially smaller ones, that trade-off makes perfect sense. For others with strong DevOps capabilities and cost sensitivity, self-hosting alternatives might be more attractive.

How to Actually Choose: A Decision Framework

After diving deep into these solutions, here's the mental model I've developed for choosing:

Start with the query pattern: What are you searching? Pure semantic similarity? Hybrid text + vector? Metadata filtering combined with vectors? Your query pattern eliminates entire categories of solutions.

Consider your operational reality: Do you have a team that can run and tune complex distributed systems? Or are you three developers shipping fast? Your operational capacity determines how much complexity you can handle.

Think about scale—honestly: That's where the thread I read got it right. The developer knew they were growing fast but hoped pgvector would scale. Hope is not a strategy. If you're expecting rapid growth, choose the solution that handles that growth gracefully, even if it's more complex initially.

Calculate the total cost: A managed service like Pinecone has clear pricing. Self-hosting has server costs, but also engineering time for operations, debugging, optimization. What's your engineering team's time worth? Sometimes the "expensive" managed solution is actually cheaper.

Value developer velocity appropriately: How quickly can your team ship features with each solution? Sometimes a simpler solution that lets you move fast is worth more than a theoretically optimal solution that takes weeks to get right.

The Production Lesson

That thread I read ended with a lesson every experienced developer knows but sometimes needs to relearn: the technology that gets you to market isn't always the technology that gets you to scale. pgvector was the right choice for that team at 10,000 vectors. It stopped being the right choice at 5 million.

The smart move isn't avoiding tools with scaling limits—it's knowing what those limits are and planning for the migration before you hit them. If you're starting with pgvector, great! But have a conversation now about what happens when you hit 100,000 vectors, or a million. What's your migration path? How will you test it? What's the timeline?

Vector search isn't going away. If anything, it's becoming more central to modern applications. Semantic search, recommendation systems, RAG applications, similarity matching—these use cases are multiplying. Understanding how vectors work, how indexes function, and which databases excel at vector operations isn't optional knowledge anymore. It's core infrastructure understanding.

The craft of building production systems means knowing not just what works today, but what will work tomorrow at 10x scale. Vector search is just another domain where that principle applies. Choose wisely, plan for growth, and maybe, just maybe, you'll avoid being the next cautionary tale in someone's thread.

Final Thoughts

All three solutions—OpenSearch, Typesense, and Pinecone—handle vector indexing efficiently. They've solved the fundamental problems that make pgvector struggle at scale. But they optimize for different use cases.

OpenSearch provides the broadest feature set—if you need every possible search capability in one system, it's your choice.

Typesense offers the best developer experience for hybrid workloads—when you need vector search to feel as simple as adding a new field to your search index.

Pinecone delivers specialized performance for vector-centric AI applications—when vectors aren't just a feature, they're the foundation.

The right choice depends on your specific context: your team, your use case, your scale, and your operational capacity. But now, at least, you're making an informed choice instead of discovering limitations the hard way in production.

And that's worth all the research time it took to get here.