Enable ingestion of precomputed embeddings
🚀 Describe the new functionality needed
Users may want to precompute embeddings outside of the Stack using a variety of other tools (e.g., https://github.com/meta-llama/llama-stack/pull/1563, https://github.com/meta-llama/llama-stack/pull/1866, https://github.com/meta-llama/llama-stack/pull/1290).
Allowing users to pass precomputed embeddings along with each Chunk would provide this functionality without changing the core APIs. A minimal sketch of what that could look like from the client side is shown below.
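This sketch assumes Chunk gains an optional `embedding` field; the field name and the exact `vector_io.insert` payload shape are illustrative, not a final API:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Embeddings computed outside the Stack (e.g., by a batch pipeline).
# The proposed `embedding` field is the new part: when it is present, the
# vector IO provider would store the vector directly instead of calling the
# inference provider to embed the content.
chunks = [
    {
        "content": "Llama Stack standardizes the core building blocks for AI apps.",
        "metadata": {"document_id": "doc-1"},
        "embedding": [0.012, -0.338, 0.041],  # precomputed; dimension is illustrative
    },
]

client.vector_io.insert(vector_db_id="my-vector-db", chunks=chunks)
```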
💡 Why is this needed? What if we don't build it?
Without this, users can't ingest data that has been embedded outside the Stack. This pushes a significant amount of unnecessary compute onto the inference provider in cases where alternative frameworks are better suited to the job (e.g., batch-processing embeddings for large enterprise corpora), as in the sketch below.
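For example, a user might batch-embed a large corpus offline and only hand the results to the Stack at ingestion time. Sentence-transformers is used here purely as an illustration of one such external tool, and the chunk shape matches the hypothetical `embedding` field sketched above:

```python
from sentence_transformers import SentenceTransformer

# Embed a large corpus in batches on dedicated hardware, outside the Stack.
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["first document...", "second document..."]
embeddings = model.encode(texts, batch_size=64, show_progress_bar=True)

# Pair each text with its precomputed vector for later ingestion.
chunks = [
    {"content": text, "metadata": {"document_id": f"doc-{i}"}, "embedding": emb.tolist()}
    for i, (text, emb) in enumerate(zip(texts, embeddings))
]
```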
Other thoughts
No response