Version: 2.30

SupabasePgvectorKeywordRetriever

A keyword-based Retriever that fetches documents matching a query from the SupabasePgvectorDocumentStore.

Most common position in a pipeline:
  1. Before a PromptBuilder in a RAG pipeline
  2. The last component in the semantic search pipeline
  3. Before an ExtractiveReader in an extractive QA pipeline
Mandatory init variables: document_store (an instance of a SupabasePgvectorDocumentStore)
Mandatory run variables: query (a string)
Output variables: documents (a list of documents matching the query)
API reference: Supabase
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/supabase
Package name: supabase-haystack

Overview

SupabasePgvectorKeywordRetriever is a thin wrapper around PgvectorKeywordRetriever, adapted for use with SupabasePgvectorDocumentStore.

It uses PostgreSQL full-text search (to_tsvector / plainto_tsquery) to find Documents and ranks them with the ts_rank_cd function. The ranking takes into account how often the query terms appear in the Document, how close together they are, and the importance of the part of the Document in which they occur. For more details, see the PostgreSQL documentation.
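To make the matching behavior concrete, here is a toy, pure-Python model of how a plainto_tsquery search filters Documents: every query term that survives stop-word removal must appear in the Document, because the terms are combined with AND. It is only an illustration under simplifying assumptions: the stop-word list is a tiny hypothetical one, there is no stemming, and the score (a plain count of query-term occurrences) stands in for ts_rank_cd, which additionally rewards terms that occur close together. The function names lexemes and keyword_search are hypothetical.

```python
import re

# Tiny, hypothetical stop-word list. PostgreSQL's "english" configuration uses a
# much larger list and also stems words (for example, "languages" -> "languag").
STOP_WORDS = {"a", "an", "and", "are", "how", "in", "is", "of", "the", "to"}


def lexemes(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def keyword_search(query: str, docs: list[str], top_k: int = 10) -> list[tuple[str, int]]:
    """Toy plainto_tsquery-style search: a document matches only if it has every query term."""
    terms = set(lexemes(query))
    hits = []
    for doc in docs:
        words = lexemes(doc)
        # A query made only of stop words has no terms and matches nothing.
        if terms and terms.issubset(words):
            # Crude stand-in for ts_rank_cd: count occurrences of the query terms.
            score = sum(words.count(term) for term in terms)
            hits.append((doc, score))
    # Stable sort: documents with equal scores keep their original order.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]


docs = [
    "There are over 7,000 languages spoken around the world today.",
    "Elephants have been observed to behave in a way that indicates a high level "
    "of self-awareness, such as recognizing themselves in mirrors.",
    "In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, "
    "you can witness the phenomenon of bioluminescent waves.",
]

print(keyword_search("languages spoken around the world today", docs))
# "many" appears in no document, so this AND query returns nothing
print(keyword_search("How many languages are spoken?", docs))
```

Under this toy model the first query matches only the first Document, while the second returns an empty list because one of its terms is missing from every Document.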

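Keep in mind that, unlike similar components such as ElasticsearchBM25Retriever, this Retriever does not apply fuzzy search out of the box. Because plainto_tsquery joins the query terms with AND, a Document matches only if it contains every query term that remains after stop-word removal and normalization, so formulate queries carefully to avoid getting zero results.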
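If zero results are a problem for your application, one option is to relax the query in your own code. The sketch below is a hypothetical helper, not part of supabase-haystack: when the full query returns nothing, it retries with each term left out in turn. It assumes only that the retriever's run method accepts query= (plus optional keyword arguments such as top_k=) and returns a dict with a "documents" list; a small fake retriever stands in for the real one so the example runs without a database.

```python
from typing import Any, Callable


def run_with_fallback(
    run: Callable[..., dict[str, Any]], query: str, **kwargs: Any
) -> dict[str, Any]:
    """Hypothetical helper: relax a keyword query that returns zero Documents.

    `run` is any callable shaped like a keyword retriever's run method: it takes
    query= (plus optional keyword arguments) and returns {"documents": [...]}.
    """
    result = run(query=query, **kwargs)
    if result["documents"]:
        return result
    terms = query.split()
    # The strict query found nothing: retry with one term left out at a time and
    # return the first relaxed query that finds something.
    for i in range(len(terms)):
        relaxed = " ".join(terms[:i] + terms[i + 1 :])
        if relaxed:
            result = run(query=relaxed, **kwargs)
            if result["documents"]:
                return result
    return {"documents": []}


# Fake retriever for the demo: AND-matches lowercase words, no stemming.
CORPUS = ["there are over 7,000 languages spoken around the world today"]


def fake_run(query: str, top_k: int = 10) -> dict[str, Any]:
    words = set(query.lower().split())
    return {"documents": [doc for doc in CORPUS if words <= set(doc.split())][:top_k]}


print(run_with_fallback(fake_run, "many languages spoken"))
# With the real component this would presumably be called as:
# run_with_fallback(retriever.run, "my nice query", top_k=5)
```

Each retry is a separate database query, so a long query that matches nothing can cost up to one extra round trip per term.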

The language used to parse the query and the Document content for keyword retrieval is set with the language parameter of SupabasePgvectorDocumentStore (defaults to "english").

In addition to the query, the Retriever accepts optional parameters including top_k (the maximum number of Documents to retrieve) and filters to narrow the search space.
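Filters use Haystack's filter syntax: a comparison is a dict with field, operator, and value, and comparisons can be grouped under logical operators such as AND and OR. The snippet below builds such a filter and, to show what it would select, evaluates it against a few metadata dicts with a small hypothetical matches function. That evaluator only illustrates the semantics (the document store applies filters inside its database query), and the metadata fields category and year are made up for the example.

```python
from typing import Any

# Haystack filter syntax: comparison conditions grouped under a logical operator.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "science"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
# With a real retriever: retriever.run(query="...", top_k=5, filters=filters)

# Ordering comparisons on a missing field (None) are treated as non-matching.
COMPARISONS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a is not None and a > b,
    ">=": lambda a, b: a is not None and a >= b,
    "<": lambda a, b: a is not None and a < b,
    "<=": lambda a, b: a is not None and a <= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def matches(meta: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Simplified, hypothetical evaluator: supports AND/OR and the COMPARISONS above."""
    operator = condition["operator"]
    if operator == "AND":
        return all(matches(meta, c) for c in condition["conditions"])
    if operator == "OR":
        return any(matches(meta, c) for c in condition["conditions"])
    key = condition["field"].removeprefix("meta.")
    return COMPARISONS[operator](meta.get(key), condition["value"])


metas = [
    {"category": "science", "year": 2021},
    {"category": "science", "year": 2018},
    {"category": "travel", "year": 2023},
]
print([m for m in metas if matches(m, filters)])  # [{'category': 'science', 'year': 2021}]
```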

Installation

```shell
pip install supabase-haystack
```

Usage

On its own

This Retriever needs the SupabasePgvectorDocumentStore and indexed Documents to run.

Set the SUPABASE_DB_URL environment variable with your Supabase database connection string.
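A malformed connection string usually surfaces only as a connection error, so it can help to sanity-check it first. The sketch below is a hypothetical helper that uses only the standard library's urllib.parse; the example URL is a placeholder in the shape of a Supabase direct connection string, so copy the real one from your Supabase project's dashboard rather than relying on this format.

```python
import os
from urllib.parse import urlparse


def check_db_url(url: str) -> tuple[str, int, str]:
    """Hypothetical sanity check: return (host, port, database) or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgresql:// URL, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("connection string has no host")
    # PostgreSQL listens on 5432 unless the URL says otherwise.
    return parsed.hostname, parsed.port or 5432, parsed.path.lstrip("/")


# Placeholder values only; not a real project or password.
example_url = "postgresql://postgres:YOUR_PASSWORD@db.your-project-ref.supabase.co:5432/postgres"
print(check_db_url(example_url))

# In your own script, check the variable the document store reads:
# check_db_url(os.environ["SUPABASE_DB_URL"])
```

If your password contains characters such as @ or /, URL-encode it (for example with urllib.parse.quote) before putting it in the connection string.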

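```python
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import (
    SupabasePgvectorKeywordRetriever,
)

# Reads the connection string from the SUPABASE_DB_URL environment variable
document_store = SupabasePgvectorDocumentStore()
retriever = SupabasePgvectorKeywordRetriever(document_store=document_store)

# The matching Documents are returned under the "documents" key
result = retriever.run(query="my nice query")
print(result["documents"])
```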

In a RAG pipeline

The prerequisites for running this code are:

  • Set an environment variable OPENAI_API_KEY with your OpenAI API key.
  • Set an environment variable SUPABASE_DB_URL with the connection string to your Supabase database.
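A small optional check at the top of the script can report both missing variables at once instead of failing on the first one. The helper below is hypothetical and uses only the standard library; it treats an empty value the same as an unset one.

```python
import os
from collections.abc import Mapping

REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "SUPABASE_DB_URL")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```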
```python
from haystack import Document, Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.document_stores.types import DuplicatePolicy

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
from haystack_integrations.components.retrievers.supabase import (
    SupabasePgvectorKeywordRetriever,
)

# One Document per line in the prompt, followed by the question
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\nDocuments:\n"
        "{% for doc in documents %}{{ doc.content }}\n{% endfor %}\n"
        "Question: {{question}}\nAnswer:",
    ),
]

# language sets the text search configuration used for keyword retrieval;
# recreate_table=True recreates the table if it exists, so use it only for demos
document_store = SupabasePgvectorDocumentStore(
    language="english",
    recreate_table=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

document_store.write_documents(documents=documents, policy=DuplicatePolicy.SKIP)

retriever = SupabasePgvectorKeywordRetriever(document_store=document_store)
rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(
    instance=ChatPromptBuilder(
        template=prompt_template,
        required_variables={"question", "documents"},
    ),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIChatGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("retriever", "answer_builder.documents")

# A keyword-style question: every term must appear in a Document to match
question = "languages spoken around the world today"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    },
)

# AnswerBuilder returns a list of GeneratedAnswer objects under "answers"
print(result["answer_builder"]["answers"][0].data)
```