Version: 2.30

SupabasePgvectorEmbeddingRetriever

An embedding-based Retriever compatible with the SupabasePgvectorDocumentStore.


Most common position in a pipeline	1. After a Text Embedder and before a `PromptBuilder` in a RAG pipeline 2. The last component in a semantic search pipeline 3. After a Text Embedder and before an `ExtractiveReader` in an extractive QA pipeline
Mandatory init variables	`document_store`: An instance of a SupabasePgvectorDocumentStore
Mandatory run variables	`query_embedding`: A vector representing the query (a list of floats)
Output variables	`documents`: A list of documents
API reference	Supabase
GitHub link	https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/supabase
Package name	`supabase-haystack`

Overview

SupabasePgvectorEmbeddingRetriever is a thin wrapper around PgvectorEmbeddingRetriever, adapted for use with SupabasePgvectorDocumentStore. It compares the query and Document embeddings and fetches the Documents most relevant to the query based on vector similarity.
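
To make "most relevant based on vector similarity" concrete, here is a minimal pure-Python sketch of ranking by cosine similarity. It is only an illustration: the actual comparison happens inside the database, and the helper names (`cosine_similarity`, `rank_by_similarity`) and the three-dimensional toy vectors are assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float],
    embeddings: Dict[str, List[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every stored embedding against the query and keep the best top_k."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings (hypothetical values, not produced by a real model)
toy_embeddings = {
    "languages": [0.9, 0.1, 0.0],
    "elephants": [0.1, 0.9, 0.1],
    "bioluminescence": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.2, 0.0]

top_hits = rank_by_similarity(query_embedding, toy_embeddings, top_k=2)
for name, score in top_hits:
    print(f"{name}: {score:.3f}")
```

The entry whose vector points in nearly the same direction as the query ranks first, which is the behavior the Retriever relies on when it returns Documents.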

When using this Retriever in your pipeline, make sure that both the stored Documents and the query have embeddings, created with the same embedding model. Add a Document Embedder to your indexing pipeline and a Text Embedder to your query pipeline.

In addition to `query_embedding`, the Retriever accepts optional parameters including `top_k` (the maximum number of Documents to retrieve), `filters` to narrow down the search space, and `vector_function` to override the similarity function set on the Document Store.
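
The optional parameters are passed to `run()` alongside the query embedding. The sketch below shows one way to do that with a Haystack-style filter dictionary. The helper `retrieve_filtered` and the metadata field `meta.language` are hypothetical, and `_StubRetriever` is only a stand-in so the example runs without a database; in practice you would pass a real SupabasePgvectorEmbeddingRetriever instead.

```python
from typing import Any, Dict, List


def retrieve_filtered(
    retriever: Any,
    query_embedding: List[float],
    language: str,
    top_k: int = 3,
) -> List[Any]:
    """Run a retriever with a metadata filter and a result limit."""
    # Haystack 2.x filter syntax; "meta.language" is a hypothetical metadata field
    filters: Dict[str, Any] = {
        "field": "meta.language",
        "operator": "==",
        "value": language,
    }
    result = retriever.run(
        query_embedding=query_embedding,
        filters=filters,
        top_k=top_k,
    )
    return result["documents"]


class _StubRetriever:
    """Stand-in for a real retriever: records the arguments it receives."""

    def __init__(self) -> None:
        self.last_call: Dict[str, Any] = {}

    def run(self, **kwargs: Any) -> Dict[str, List[Any]]:
        self.last_call = kwargs
        return {"documents": []}


stub = _StubRetriever()
docs = retrieve_filtered(stub, [0.1] * 768, language="en", top_k=2)
print(stub.last_call["filters"], stub.last_call["top_k"])
```

Keeping the filter construction in one helper makes it easy to reuse the same restriction across queries while varying only `top_k` or the filter value.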

Some relevant parameters that impact embedding retrieval must be defined when the SupabasePgvectorDocumentStore is initialized: `embedding_dimension`, `vector_function`, and `search_strategy` (`"exact_nearest_neighbor"` or `"hnsw"`).
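
The choice of `vector_function` can change which Document ranks first. The pure-Python sketch below compares three common scoring functions on toy two-dimensional vectors. The function names mirror the values the underlying pgvector integration is understood to accept (`"cosine_similarity"`, `"inner_product"`, `"l2_distance"`); treat that list as an assumption and check the API reference for your installed version.

```python
import math
from typing import List


def inner_product(a: List[float], b: List[float]) -> float:
    """Higher is more similar; sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Higher is more similar; ignores vector magnitude."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def l2_distance(a: List[float], b: List[float]) -> float:
    """Lower is more similar; Euclidean distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
candidates = {
    "short_aligned": [0.5, 0.0],  # same direction as the query, small magnitude
    "long_offaxis": [3.0, 3.0],   # 45 degrees off the query, large magnitude
}

for name, vec in candidates.items():
    print(
        f"{name}: cosine={cosine_similarity(query, vec):.3f} "
        f"inner={inner_product(query, vec):.3f} "
        f"l2={l2_distance(query, vec):.3f}"
    )
```

Here cosine similarity and L2 distance favor the aligned vector, while the inner product favors the longer one. With normalized embeddings the three functions tend to agree on the ordering, so the choice matters most when embeddings are not normalized.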

Installation

shell

pip install supabase-haystack

Usage

On its own

This Retriever needs the SupabasePgvectorDocumentStore and indexed Documents to run.

Set the `SUPABASE_DB_URL` environment variable to your Supabase database connection string.

python

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import (
    SupabasePgvectorEmbeddingRetriever,
)

document_store = SupabasePgvectorDocumentStore(embedding_dimension=768)
retriever = SupabasePgvectorEmbeddingRetriever(document_store=document_store)

# Use a fake vector to keep the example simple; its length must match embedding_dimension
result = retriever.run(query_embedding=[0.1] * 768)
print(result["documents"])

In a Pipeline

python

from haystack import Document, Pipeline
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import (
    SupabasePgvectorEmbeddingRetriever,
)

document_store = SupabasePgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

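# The default Sentence Transformers model produces 768-dimensional embeddings,
# which matches embedding_dimension above
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()  # load the model before calling run() outside a pipeline
documents_with_embeddings = document_embedder.run(documents)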

document_store.write_documents(
    documents_with_embeddings.get("documents"),
    policy=DuplicatePolicy.OVERWRITE,
)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    SupabasePgvectorEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])
