Version: 2.31-unstable

Choosing a Document Store

Whether you are developing a chatbot, a RAG system, or an image captioner, at some point your AI application will likely need to compare the input it receives with the information it already knows.

Haystack currently has integrations with seven categories of Document Stores:

Vector Databases — purpose-built for embedding search and semantic retrieval
Search Engines — full-text search engines extended with vector (kNN) capabilities
Relational Databases — SQL databases with vector search via plugins or extensions
Document / NoSQL Databases — flexible document stores with vector search added on top
In-memory Key-Value Stores — ultra-low-latency stores with HNSW vector search
Vector Index Libraries — lightweight in-process vector similarity search, no external service
Multi-model Databases — single engine supporting graph, document, and vector data models

The sections below give an overview of all the integrations currently available: first as tables split by support tier, then grouped by category.

DocumentStore Integrations Available in Haystack

Haystack integrations come in two tiers. Core integrations are built and maintained by the Haystack team — they are tested against every release, follow the same API conventions, and come with full documentation and support. External integrations are contributed and maintained by the community; they extend Haystack's reach but are not covered by the core release cycle.

The tables below list every available integration alongside the key properties you need to choose the right one for your use case: the underlying Engine Type gives you a sense of what the database is optimised for; Open Source tells you whether you can self-host it; Async Support matters for high-throughput pipelines; and Retrievers shows which retrieval strategies are available out of the box — BM25 for keyword search, Embedding for semantic search, Hybrid for both, and specialised options such as Sparse Embedding or SQL where supported.

Core integrations

Integration	Category	Engine Type	Open Source	Async Support	Retrievers
AlloyDB	Relational Database	Managed PostgreSQL-compatible database (Google Cloud) with pgvector extension	No	Yes	Embedding, Keyword
ArcadeDB	Multi-model Database	Multi-model database (graph, document, key-value) with HNSW vector search via HTTP/JSON API	Yes	No	Embedding
Astra	Document / NoSQL Database	Cloud-native managed NoSQL (Apache Cassandra-based) with vector search via DataStax JSON API	No	No	Embedding
Azure AI Search	Search Engine	Managed cloud search service (Microsoft Azure AI Search) with HNSW vector search	No	No	BM25, Embedding, Hybrid
Chroma	Vector Database	Purpose-built vector database	Yes	Yes	Embedding
Elasticsearch	Search Engine	Distributed search & analytics engine with BM25 + vector (kNN) search	Partial	Yes	BM25, Embedding, SQL
FAISS	Vector Index Library	In-memory vector similarity search library (Meta/Facebook) with JSON file for metadata	Yes	No	Embedding
FalkorDB	Multi-model Database	OpenCypher graph database with ANN vector search	Yes (SSPL)	No	Embedding, Cypher
MongoDB Atlas	Document / NoSQL Database	Cloud document database with Atlas Vector Search and full-text search	No	Yes	Embedding, Full-text
OpenSearch	Search Engine	Distributed search engine (AWS fork of Elasticsearch) with BM25 + kNN vector search	Yes	Yes	BM25, Embedding, Hybrid, Metadata, SQL
Oracle	Relational Database	Oracle with native AI Vector Search, HNSW vector index and DBMS_SEARCH full-text keyword index	No	Yes	Embedding, Keyword
PGVector	Relational Database	Relational database (PostgreSQL) with the `pgvector` extension for vector similarity search	Yes	Yes	Embedding, Keyword
Pinecone	Vector Database	Managed cloud vector database	No	Yes	Embedding
Qdrant	Vector Database	Purpose-built vector database with dense + sparse embedding support	Yes	Yes	Embedding, Sparse Embedding, Hybrid
Supabase	Relational Database	Managed cloud Supabase — a wrapper over PgvectorDocumentStore with Supabase-specific defaults	Yes	Yes	Embedding, Keyword
Valkey	In-memory Key-Value Store	In-memory key-value store (Redis fork) with HNSW vector search via `glide` client	Yes	Yes	Embedding
Vespa	Search Engine	Distributed search & serving engine with BM25 lexical + HNSW vector (ANN) search	Yes	No	BM25, Embedding
Weaviate	Vector Database	Purpose-built vector database with hybrid search support	Yes	Yes	BM25, Embedding, Hybrid

External integrations

Integration	Category	Engine Type	Open Source	Async Support	Retrievers
Couchbase	Document / NoSQL Database	Distributed NoSQL document database with vector search via Search Service	Partial	Yes	Embedding, Full-text
LanceDB	Vector Database	Embedded vector database built on the Lance columnar format, optimized for multimodal data	Yes	Yes	Embedding, Full-text, Hybrid
Milvus	Vector Database	Open-source vector database built for scalable similarity search	Yes	No	Embedding
Needle	Search Engine	Managed RAG-as-a-service platform with built-in document storage and vector search	No	Yes	Embedding, Sparse Embedding, Hybrid
Neo4j	Multi-model Database	Graph database with native vector index support for combined graph traversal and similarity search	Partial	No	Embedding
SingleStore	Relational Database	Distributed SQL database with native vector search and full-text search support	No	Yes	Embedding, Full-text, Keyword
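
The properties in these tables can be applied mechanically when shortlisting candidates. The following is a minimal sketch, not part of Haystack: the `Integration` record and the `shortlist` helper are hypothetical names introduced for illustration, and the data is a handful of rows copied from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integration:
    """One table row (hypothetical helper type, not a Haystack class)."""
    name: str
    category: str
    open_source: str        # "Yes", "No" or "Partial", as listed in the table
    async_support: bool
    retrievers: frozenset


# A few rows copied from the tables above; extend as needed.
INTEGRATIONS = [
    Integration("Chroma", "Vector Database", "Yes", True, frozenset({"Embedding"})),
    Integration("Qdrant", "Vector Database", "Yes", True,
                frozenset({"Embedding", "Sparse Embedding", "Hybrid"})),
    Integration("Weaviate", "Vector Database", "Yes", True,
                frozenset({"BM25", "Embedding", "Hybrid"})),
    Integration("Azure AI Search", "Search Engine", "No", False,
                frozenset({"BM25", "Embedding", "Hybrid"})),
    Integration("Pinecone", "Vector Database", "No", True, frozenset({"Embedding"})),
    Integration("Vespa", "Search Engine", "Yes", False, frozenset({"BM25", "Embedding"})),
]


def shortlist(rows, retrievers=(), need_async=False, need_open_source=False):
    """Return the names of the integrations that satisfy every stated requirement."""
    wanted = set(retrievers)
    return [
        row.name
        for row in rows
        if wanted <= row.retrievers
        and (row.async_support or not need_async)
        and (row.open_source == "Yes" or not need_open_source)
    ]


# Open-source stores with async support and a Hybrid retriever.
print(shortlist(INTEGRATIONS, retrievers=["Hybrid"], need_async=True, need_open_source=True))
```

A filter like this only narrows the field; the category descriptions that follow cover the trade-offs the table columns cannot express.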

Vector Databases

Purpose-built for vector and embedding search
Advanced indexing techniques for efficient similarity search
Designed for high scalability and availability with large volumes of high-dimensional data
Most support metadata filtering alongside vector search
Increasingly adding hybrid (vector + keyword) search support
Mostly open source, widely available as managed cloud services

Best for semantic search over large document corpora — e.g. a knowledge base where users search by meaning rather than exact keywords.

Chroma
Pinecone
Qdrant
Weaviate
LanceDB (external integration)
Milvus (external integration)
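
To make "metadata filtering alongside vector search" concrete, here is a brute-force sketch in plain Python. It is an illustration only: the `search` function and the record layout are hypothetical, and a real vector database would use an approximate index (such as HNSW) rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vector, top_k=2, filters=None):
    """Apply exact-match metadata filters, then rank the survivors by similarity."""
    filters = filters or {}
    candidates = [
        record for record in records
        if all(record["meta"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda record: cosine_similarity(record["vector"], query_vector),
        reverse=True,
    )
    return [record["id"] for record in ranked[:top_k]]


records = [
    {"id": "doc-1", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "doc-2", "vector": [0.8, 0.6], "meta": {"lang": "en"}},
    {"id": "doc-3", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "doc-4", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
]

# Without the filter, doc-3 would rank second; the filter removes it first.
print(search(records, [1.0, 0.0], top_k=2, filters={"lang": "en"}))
```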

Search Engines

Originally built for full-text (BM25) search, with vector (kNN) capabilities added later
Excellent support for text data, tokenisation, and language-aware querying
Scale both horizontally and vertically in production environments
Strong foundation for hybrid search combining keyword and semantic retrieval
Battle-tested in enterprise environments with mature tooling and observability

Best for enterprise search or log analytics where both full-text (BM25) and vector search are needed — e.g. an e-commerce product search with filters.

Azure AI Search (AzureAISearchDocumentStore)
Elasticsearch
OpenSearch
Vespa
Needle (external integration)
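
Hybrid search needs a way to merge a keyword ranking with a semantic ranking. One widely used option is reciprocal rank fusion (RRF), sketched below. This is an assumption for illustration: the text above does not say which fusion method any particular integration uses, and the function name and the constant `k=60` (a common default) are choices made here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; each list contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["doc-a", "doc-b", "doc-c"]        # keyword (BM25) results
embedding_ranking = ["doc-b", "doc-d", "doc-a"]   # semantic (embedding) results

print(reciprocal_rank_fusion([bm25_ranking, embedding_ranking]))
```

Because RRF only looks at rank positions, it avoids having to normalise BM25 scores and similarity scores onto a common scale.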

Relational Databases

Standard SQL databases extended with vector search via plugins or extensions
Vectors live alongside relational data, enabling combined vector + SQL queries in a single store
Lower operational overhead when the relational database (for example PostgreSQL or Oracle) is already part of the stack
Vector search performance is lower than purpose-built databases, but sufficient for many use cases
Familiar tooling, transactions, and data integrity guarantees of a relational database

Best for use cases where documents live alongside structured relational data — e.g. a product catalogue where vector search and SQL JOINs are both needed.

AlloyDB
Oracle
PGVector
Supabase
SingleStore (external integration)
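
The idea of combining vector ranking with a SQL JOIN in one query can be sketched with Python's built-in `sqlite3` module. This is only a stand-in under stated assumptions: embeddings are stored as JSON text and compared with a user-defined `cosine_similarity` function registered from Python, whereas extensions such as pgvector provide their own vector type and distance operators. The table and column names are hypothetical.

```python
import json
import math
import sqlite3


def cosine_similarity(a_json, b_json):
    """SQL-callable cosine similarity over two JSON-encoded vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.create_function("cosine_similarity", 2, cosine_similarity)
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, in_stock INTEGER);
    CREATE TABLE documents (product_id INTEGER REFERENCES products(id),
                            content TEXT, embedding TEXT);
    """
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Trail shoe", 1), (2, "Road shoe", 0), (3, "Rain jacket", 1)],
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        (1, "Grippy shoe for muddy trails", json.dumps([0.9, 0.1])),
        (2, "Light shoe for asphalt", json.dumps([1.0, 0.0])),
        (3, "Waterproof shell", json.dumps([0.1, 0.9])),
    ],
)

# One query: relational filter (in stock) plus ranking by vector similarity.
rows = conn.execute(
    """
    SELECT p.name
    FROM documents AS d
    JOIN products AS p ON p.id = d.product_id
    WHERE p.in_stock = 1
    ORDER BY cosine_similarity(d.embedding, ?) DESC
    LIMIT 2
    """,
    (json.dumps([1.0, 0.0]),),
).fetchall()
names = [name for (name,) in rows]
print(names)
```

The closest document overall belongs to an out-of-stock product, so the JOIN and WHERE clause exclude it before ranking.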

Document / NoSQL Databases

General-purpose document stores with vector search added on top
Flexible, schema-less data model suited for heterogeneous document collections
Horizontal scaling and high availability inherited from the underlying NoSQL engine
Good choice when the database is already in use and adding a separate vector store is undesirable
Vector search performance may trail behind purpose-built databases

Best for applications already using MongoDB or Cassandra that want to add RAG capabilities without introducing a new infrastructure component.

Astra (AstraDocumentStore)
MongoDB Atlas
Couchbase (external integration)

In-memory Key-Value Stores

In-memory architecture delivers extremely low read/write latency
Vector search (HNSW) layered on top of an existing caching infrastructure
Ideal when the stack already includes Redis or Valkey as a cache or session store
Data is ephemeral by default; persistence requires explicit configuration
Less suited for large corpora where memory cost becomes significant

Best for low-latency, real-time retrieval, e.g. a chatbot where the retrieval step must add as little delay as possible to each response.

Valkey

Vector Index Libraries

Low-level, in-process vector similarity search — not a full database
No network overhead; runs entirely within the application process
Very efficient use of hardware resources (CPU/GPU)
Limited to vectors only; metadata must be managed separately (e.g. via a JSON file)
No built-in persistence, replication, or multi-client access

Best for local prototyping, research, or small-scale applications where a lightweight in-process solution is preferred over running an external database server.

FAISS
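
The split between a vectors-only index and separately managed metadata can be sketched as follows. This is a toy illustration, not the FAISS API: `FlatIndex` is a hypothetical brute-force index written in plain Python, and the JSON sidecar file shows one possible way to keep metadata next to it.

```python
import json
import math
import os
import tempfile


class FlatIndex:
    """Toy exact index: holds vectors only, addressed by insertion position."""

    def __init__(self):
        self.vectors = []

    def add(self, vector):
        """Store a vector and return its position in the index."""
        self.vectors.append(vector)
        return len(self.vectors) - 1

    def search(self, query, top_k=1):
        """Return the positions of the top_k nearest vectors by Euclidean distance."""
        def distance(vector):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(vector, query)))

        order = sorted(range(len(self.vectors)), key=lambda i: distance(self.vectors[i]))
        return order[:top_k]


index = FlatIndex()
metadata = {}
for content, vector in [("intro to HNSW", [0.1, 0.9]), ("BM25 explained", [0.9, 0.2])]:
    position = index.add(vector)
    metadata[str(position)] = {"content": content}   # the index itself never sees this

# Persist the metadata in a JSON sidecar file, then load it back.
sidecar_path = os.path.join(tempfile.mkdtemp(), "metadata.json")
with open(sidecar_path, "w", encoding="utf-8") as handle:
    json.dump(metadata, handle)
with open(sidecar_path, encoding="utf-8") as handle:
    restored = json.load(handle)

# The index returns positions; the sidecar maps them back to documents.
hits = [restored[str(i)]["content"] for i in index.search([0.8, 0.1], top_k=1)]
print(hits)
```

Keeping the two in sync (for example when deleting a vector) is the application's responsibility, which is part of what a full database would otherwise handle.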

Multi-model Databases

Single engine supporting multiple data models: graph, document, key-value, and vector
Eliminates the need to maintain separate databases for different data representations
Suited for knowledge graphs or applications with complex entity relationships
Vector search (HNSW) available alongside graph traversal and document queries
Smaller community and ecosystem compared to more established categories

Best for applications requiring multiple data models in a single engine — e.g. a knowledge graph where entities are connected by relationships and also need vector similarity search.

ArcadeDB
FalkorDB
Neo4j (external integration)

The In-memory Document Store

Haystack ships with an ephemeral document store that relies on pure Python data structures held in memory, so it doesn't fall into any of the categories above. This special Document Store is ideal for building quick prototypes with small datasets. It requires no special setup and can be used right away without installing additional dependencies.

InMemoryDocumentStore

Final Considerations

It can be very challenging to pick one vector database over another based on performance alone, as even slight differences in benchmark setup can produce a different leaderboard (for example, some benchmarks test the cloud services while others run on a reference machine). Deciding whether to include features such as filtering adds a whole new set of complexities that makes the comparison even harder.

What's important to know is that the Document Store interface adds little overhead, so the relative performance of one vector database over another should stay the same when used within Haystack pipelines.
