
Enable ingestion of precomputed embeddings


🚀 Describe the new functionality needed

Users may want to precompute embeddings outside of the Stack using a variety of other tools (see, e.g., https://github.com/meta-llama/llama-stack/pull/1563, https://github.com/meta-llama/llama-stack/pull/1866, https://github.com/meta-llama/llama-stack/pull/1290).

Allowing users to pass precomputed embeddings along with each Chunk would provide this functionality without changing the core APIs.
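For illustration, here is a minimal sketch of what ingestion with precomputed embeddings could look like. The optional `embedding` field on the chunk, and the use of sentence-transformers as the external embedding tool, are assumptions for this example, not a final API design:

```python
# Hypothetical sketch: pass precomputed embeddings alongside each chunk.
# The "embedding" key is the proposed addition; everything else follows
# the existing vector_io insert path.
from llama_stack_client import LlamaStackClient
from sentence_transformers import SentenceTransformer  # any external embedding tool

client = LlamaStackClient(base_url="http://localhost:8321")

# Embed documents outside of the Stack, e.g. in an offline batch job.
model = SentenceTransformer("all-MiniLM-L6-v2")
documents = [
    "Llama Stack standardizes the building blocks of GenAI apps.",
    "Precomputed embeddings can be ingested without an inference call.",
]
embeddings = model.encode(documents)

# Ingest chunks with their precomputed embeddings; no call to the
# inference provider is needed at insert time.
client.vector_io.insert(
    vector_db_id="my-vector-db",
    chunks=[
        {
            "content": doc,
            "metadata": {"document_id": f"doc-{i}"},
            "embedding": emb.tolist(),  # proposed optional field (assumption)
        }
        for i, (doc, emb) in enumerate(zip(documents, embeddings))
    ],
)
```

If a chunk arrives without an `embedding`, the provider could fall back to the current behavior of computing one via the configured embedding model, keeping the change backward compatible.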

💡 Why is this needed? What if we don't build it?

Without this, users cannot ingest data that was embedded outside of the Stack. That forces a significant amount of unnecessary compute onto the inference provider, even when alternative frameworks are better suited to the task (e.g., batch processing embeddings for large enterprise datasets).

Other thoughts

No response
