feat: Enable ingestion of precomputed embeddings
What does this PR do?
Enable ingestion of precomputed embeddings with Chunks.
This PR enhances the Llama Stack vector database APIs, schemas, and documentation to allow users to supply precomputed embedding vectors when inserting chunks. If a chunk includes an embedding, it is used directly; if not, embeddings are computed as before.
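As a hedged illustration of the intended usage (the exact client call, port, and `vector_db_id` value are assumptions, not lifted from this PR), a chunk carrying its own embedding could be inserted like this:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# A chunk that already carries its embedding vector. When "embedding" is
# present it is stored as-is; otherwise the stack computes one with the
# vector DB's embedding model, as before.
chunk_with_embedding = {
    "content": "Llama Stack unifies inference, memory, and safety APIs.",
    "metadata": {"document_id": "doc-1"},
    "embedding": [0.1] * 384,  # placeholder values; must match the vector DB's embedding dimension
}

client.vector_io.insert(
    vector_db_id="my_documents",
    chunks=[chunk_with_embedding],
)
```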
Summary of changes:
- `docs/source/building_applications/rag.md` - Added a section and example showing how to insert a `Chunk` with precomputed embeddings.
- `llama_stack/apis/vector_io/vector_io.py` - Updated the `Chunk` model to support an optional `embedding` field and enhanced docstrings for clarity (see the model sketch after this list).
- `llama_stack/providers/utils/memory/vector_store.py` - Modified the `insert_chunks` logic to use provided embeddings if available; computes them only for chunks missing embeddings (see the flow sketch after this list).
- `llama_stack/providers/inline/tool_runtime/rag/memory.py` - Updated chunk metadata token handling to use `.get()` with a default value (snippet below).
- `tests/integration/vector_io/test_vector_io.py` - Added an integration test for inserting and retrieving chunks with precomputed embeddings.
- `tests/unit/rag/test_vector_store.py` - Added unit tests for the `Chunk` model and for inserting chunks with and without precomputed embeddings in the vector store (sketch below).
- `docs/_static/llama-stack-spec.html`, `docs/_static/llama-stack-spec.yaml` - Extended the `Chunk` schema to include an optional `embedding` field and detailed descriptions for the content, metadata, and embedding fields.
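For reference, a minimal sketch of the updated `Chunk` model shape: only the optional `embedding` field is the new part, and the exact types and docstrings in `vector_io.py` may differ.

```python
from typing import Any

from pydantic import BaseModel, Field


class Chunk(BaseModel):
    """A chunk of content that can be inserted into a vector database.

    If `embedding` is provided it is used directly; otherwise the server
    computes one with the vector DB's configured embedding model.
    """

    content: Any  # InterleavedContent in the real API; text, image, or a list of both
    metadata: dict[str, Any] = Field(default_factory=dict)
    embedding: list[float] | None = None  # new: optional precomputed embedding vector
```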
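The `insert_chunks` change in `vector_store.py` amounts to partitioning chunks by whether they already carry an embedding. A simplified sketch of that flow, assuming a hypothetical `embed_batch` callable standing in for the provider's embedding API (the real code goes through the inference API):

```python
import numpy as np
from numpy.typing import NDArray


async def embeddings_for_chunks(embed_batch, chunks) -> NDArray:
    """Return one embedding row per chunk, in the original chunk order.

    Only chunks that arrived without an embedding are sent to `embed_batch`
    (hypothetical signature: list of contents in, list of vectors out);
    the rest keep the vector they were supplied with.
    """
    missing = [c for c in chunks if c.embedding is None]
    computed = iter(await embed_batch([c.content for c in missing])) if missing else iter([])

    rows = [
        np.asarray(c.embedding if c.embedding is not None else next(computed), dtype=np.float32)
        for c in chunks
    ]
    return np.stack(rows)
```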
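The `memory.py` tweak is defensive metadata access, roughly as below; the key name and default value are assumptions for illustration.

```python
# Before: direct indexing raised KeyError for chunks that lack the key,
# e.g. user-constructed chunks with precomputed embeddings.
tokens = chunk.metadata.get("token_count", 0)
```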
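A hedged sketch of what the added unit coverage checks; test names are illustrative rather than the actual contents of `test_vector_store.py`, and the import path is assumed from the file layout above.

```python
from llama_stack.apis.vector_io import Chunk


def test_chunk_accepts_precomputed_embedding():
    chunk = Chunk(content="hello world", embedding=[0.1, 0.2, 0.3])
    assert chunk.embedding == [0.1, 0.2, 0.3]


def test_chunk_embedding_defaults_to_none():
    chunk = Chunk(content="hello world")
    assert chunk.embedding is None
```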
Test Plan
Added unit and integration tests.
I also tested this manually with a script to confirm the behavior.