Enable ingestion of precomputed embeddings
🚀 Describe the new functionality needed
Users may want to precompute embeddings outside of the Stack using a variety of other tools (e.g., https://github.com/meta-llama/llama-stack/pull/1563, https://github.com/meta-llama/llama-stack/pull/1866, https://github.com/meta-llama/llama-stack/pull/1290).
Allowing users to pass precomputed embeddings along with each Chunk would provide this functionality without changing the core APIs. A minimal sketch of what that could look like from the client side is shown below.
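This sketch assumes Chunk gains an optional `embedding` field; the field name and the exact `vector_io.insert` payload shape are illustrative, not a final API:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Embeddings computed outside the Stack (e.g., by a batch pipeline).
# The proposed `embedding` field is the new part: when it is present, the
# vector IO provider would store the vector directly instead of calling the
# inference provider to embed the content.
chunks = [
    {
        "content": "Llama Stack standardizes the core building blocks for AI apps.",
        "metadata": {"document_id": "doc-1"},
        "embedding": [0.012, -0.338, 0.041],  # precomputed; dimension is illustrative
    },
]

client.vector_io.insert(vector_db_id="my-vector-db", chunks=chunks)
```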
💡 Why is this needed? What if we don't build it?
Without this, users can't ingest data that has been embedded outside the Stack. This pushes a significant amount of unnecessary compute onto the inference provider in cases where alternative frameworks are better suited to the job (e.g., batch-processing embeddings for large enterprise corpora), as in the sketch below.
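For example, a user might batch-embed a large corpus offline and only hand the results to the Stack at ingestion time. Sentence-transformers is used here purely as an illustration of one such external tool, and the chunk shape matches the hypothetical `embedding` field sketched above:

```python
from sentence_transformers import SentenceTransformer

# Embed a large corpus in batches on dedicated hardware, outside the Stack.
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["first document...", "second document..."]
embeddings = model.encode(texts, batch_size=64, show_progress_bar=True)

# Pair each text with its precomputed vector for later ingestion.
chunks = [
    {"content": text, "metadata": {"document_id": f"doc-{i}"}, "embedding": emb.tolist()}
    for i, (text, emb) in enumerate(zip(texts, embeddings))
]
```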
Other thoughts
No response