
feat: Enable ingestion of precomputed embeddings

Open · franciscojavierarceo opened this pull request 4 months ago · 1 comment

What does this PR do?

Enable ingestion of precomputed embeddings with Chunks.

This PR enhances the Llama Stack vector database APIs, schemas, and documentation to allow users to supply precomputed embedding vectors when inserting chunks. If a chunk includes an embedding, it is used directly; if not, embeddings are computed as before.
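In practice, the difference is a single optional `embedding` field per chunk. A minimal sketch of the two chunk shapes (field names follow the updated `Chunk` schema described in this PR; anything beyond those fields is illustrative):

```python
# Sketch: two chunks destined for a vector_io insert call -- one carrying a
# precomputed embedding, one without. The "embedding" field is the optional
# addition from this PR; the vector values here are placeholders.

chunk_with_embedding = {
    "content": "Llama Stack supports precomputed embeddings.",
    "metadata": {"document_id": "doc-1"},
    # Supplied vector is used as-is; no embedding model is invoked.
    "embedding": [0.1, 0.2, 0.3, 0.4],
}

chunk_without_embedding = {
    "content": "This chunk's embedding is computed at insert time.",
    "metadata": {"document_id": "doc-2"},
    # No "embedding" key: the provider computes one, as before.
}
```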

Summary of changes:

  • docs/source/building_applications/rag.md
    • Added a section and example showing how to insert a Chunk with precomputed embeddings.
  • llama_stack/apis/vector_io/vector_io.py
    • Updated the Chunk model to support an optional embedding field and enhanced docstrings for clarity.
  • llama_stack/providers/utils/memory/vector_store.py
    • Modified the insert_chunks logic to use provided embeddings when available, computing them only for chunks that lack one.
  • llama_stack/providers/inline/tool_runtime/rag/memory.py
    • Updated chunk metadata token handling to use .get() with a default value.
  • tests/integration/vector_io/test_vector_io.py
    • Added an integration test for inserting and retrieving chunks with precomputed embeddings.
  • tests/unit/rag/test_vector_store.py
    • Added unit tests for the Chunk model and for inserting chunks with and without precomputed embeddings in the vector store.
  • docs/_static/llama-stack-spec.html, docs/_static/llama-stack-spec.yaml
    • Extended the Chunk schema to include an optional embedding field and detailed descriptions for content, metadata, and embedding.
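The insert_chunks change above can be sketched as a small standalone function (hypothetical names; the real logic lives in llama_stack/providers/utils/memory/vector_store.py): only chunks arriving without an embedding are sent to the embedding model, and supplied vectors pass through untouched.

```python
from typing import Callable

def resolve_embeddings(
    chunks: list[dict],
    embed_batch: Callable[[list[str]], list[list[float]]],
) -> list[list[float]]:
    """Return one embedding per chunk, preserving input order.

    Chunks that already carry an "embedding" keep it verbatim; the rest
    are batched through embed_batch, mirroring the behavior this PR adds.
    """
    missing = [i for i, c in enumerate(chunks) if c.get("embedding") is None]
    computed = embed_batch([chunks[i]["content"] for i in missing]) if missing else []
    vectors = [c.get("embedding") for c in chunks]
    for i, vec in zip(missing, computed):
        vectors[i] = vec
    return vectors

def fake_embed(texts: list[str]) -> list[list[float]]:
    # Toy embedder for illustration: a 1-D "vector" of each text's length.
    return [[float(len(t))] for t in texts]

chunks = [
    {"content": "has a vector", "embedding": [0.5]},  # used verbatim
    {"content": "needs one"},                         # computed by fake_embed
]
print(resolve_embeddings(chunks, fake_embed))  # [[0.5], [9.0]]
```

Batching only the missing chunks keeps the fast path (all embeddings supplied) free of any model call.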

Test Plan

Added unit and integration tests, and verified the behavior manually with a script.
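The core property the new tests cover can be expressed as a self-contained check (a sketch, not the actual code in tests/unit/rag/test_vector_store.py; `pick_embedding` is a hypothetical stand-in for the insert path):

```python
def pick_embedding(chunk: dict, embed_one) -> list[float]:
    # Prefer the user-supplied vector; fall back to computing one.
    if chunk.get("embedding") is not None:
        return chunk["embedding"]
    return embed_one(chunk["content"])

def test_precomputed_embedding_is_used_verbatim():
    supplied = [0.25, 0.75]
    chunk = {"content": "hello", "embedding": supplied}
    # The exact object passes through; no model call is made.
    assert pick_embedding(chunk, embed_one=lambda _: [0.0]) is supplied

def test_missing_embedding_is_computed():
    chunk = {"content": "hello"}
    assert pick_embedding(chunk, embed_one=lambda t: [float(len(t))]) == [5.0]

test_precomputed_embedding_is_used_verbatim()
test_missing_embedding_is_computed()
```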

Resolves https://github.com/meta-llama/llama-stack/issues/2318

franciscojavierarceo · May 30 '25