
Support Hugging Face get_top_docs() Style Retrieval for RAG in retrieve_online_documents_v2

Open: ntkathole opened this issue 7 months ago • 1 comment

Is your feature request related to a problem? Please describe.

Extend the retrieve_online_documents_v2() implementation to support integration with Hugging Face’s Transformers-based RagRetriever, which expects a specific method signature and output format via a get_top_docs() method.

https://huggingface.co/docs/transformers/model_doc/rag#transformers.RagRetriever

Add a new method (e.g., get_top_docs) to Feast, or provide a utility class such as FeastIndex that wraps retrieve_online_documents_v2 and returns RAG-compatible results.

get_top_docs(query_vectors, n_docs) is expected to:

Input:

query_vectors: A tensor or list of vectors representing one or more queries (usually from a language model like BERT).
n_docs: The number of top documents (e.g., text passages) to retrieve for each query.

Expected Output:

A tuple of:

doc_scores: List[List[float]]  # similarity scores
doc_ids: List[List[str]]  # string document IDs
docs: List[List[str]]  # raw document text

Describe the solution you'd like

index = FeastIndex(
    vector_store=feast_retriever,
    config=config,
    table=feature_view,
    requested_features=["metadata", "source"],
    text_field="document"
)
scores, ids, texts = index.get_top_docs(query_vectors=[embedding], n_docs=5)
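
To make the proposal concrete, here is a rough sketch of what such a wrapper might look like. It is not an existing Feast class. It assumes the store object exposes retrieve_online_documents_v2(features=..., query=..., top_k=...) and that the response has a to_dict() method returning one list per column; the "distance" score column, the "document_id" field, and the stub store used for the demo are all assumptions made for illustration. The output follows the tuple shape described in this issue.

```python
from typing import Any, List, Sequence, Tuple


class FeastIndex:
    """Hypothetical adapter sketched for this issue; not part of Feast."""

    def __init__(
        self,
        vector_store: Any,
        table: str,
        text_field: str,
        requested_features: Sequence[str] = (),
        id_field: str = "document_id",   # assumed name of the ID feature
        score_field: str = "distance",   # assumed name of the score column
        config: Any = None,              # accepted to mirror the issue; unused here
    ) -> None:
        self.vector_store = vector_store
        self.text_field = text_field
        self.id_field = id_field
        self.score_field = score_field
        names = [id_field, text_field, *requested_features]
        # Feature references in "feature_view:feature" form.
        self.features = [f"{table}:{name}" for name in names]

    def get_top_docs(
        self, query_vectors: Sequence[Sequence[float]], n_docs: int = 5
    ) -> Tuple[List[List[float]], List[List[str]], List[List[str]]]:
        doc_scores: List[List[float]] = []
        doc_ids: List[List[str]] = []
        docs: List[List[str]] = []
        for vector in query_vectors:
            # One store call per query vector (assumed keyword names).
            response = self.vector_store.retrieve_online_documents_v2(
                features=self.features, query=list(vector), top_k=n_docs
            )
            rows = response.to_dict()
            # Scores are passed through unchanged; if the store returns
            # distances, a caller may need to convert them to similarities.
            doc_scores.append([float(s) for s in rows[self.score_field]])
            doc_ids.append([str(i) for i in rows[self.id_field]])
            docs.append([str(t) for t in rows[self.text_field]])
        return doc_scores, doc_ids, docs


class _FakeResponse:
    """Stand-in for the store's response object (demo only)."""

    def __init__(self, data):
        self._data = data

    def to_dict(self):
        return self._data


class _FakeStore:
    """Stand-in for a Feast store so the sketch runs without a backend."""

    def retrieve_online_documents_v2(self, features, query, top_k):
        data = {
            "document_id": [1, 2],
            "document": ["alpha", "beta"],
            "distance": [0.1, 0.4],
        }
        return _FakeResponse({key: values[:top_k] for key, values in data.items()})


index = FeastIndex(vector_store=_FakeStore(), table="docs", text_field="document")
scores, ids, texts = index.get_top_docs(query_vectors=[[0.0, 1.0]], n_docs=2)
```

Here the table is passed as a feature view name rather than a feature view object, which is a simplification of the call shown above. Issuing one store call per query vector keeps the sketch short; a real implementation might batch the lookups.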

ntkathole, May 27 '25 04:05

Probably we should just move retrieve_online_documents_v2 to retrieve_online_documents as well.

franciscojavierarceo, May 29 '25 13:05