llama-stack
Support multimodal embedding generation
🚀 Describe the new functionality needed
Needed to support RAG over documents containing images, and to enable searches that use images as queries.
💡 Why is this needed? What if we don't build it?
RAG will only support text, which is not ideal.
Other thoughts
No response
@dineshyv /v1/inference/embeddings supports string, typed text, and typed image (URL and base64) input. Is that sufficient to resolve this issue?
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/inference/inference.py#L433 -> https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/common/content_types.py#L75 -> https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/common/content_types.py#L66 -> https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/common/content_types.py#L42
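For illustration, here is a minimal sketch of what a mixed text-plus-image request body for `/v1/inference/embeddings` might look like. The item shapes (a "text" item plus an "image" item carrying either a URL or base64 data) follow the typed content classes linked above, but the exact field names, and the model id used in the usage example, are assumptions and should be checked against the actual schema:

```python
import base64


def build_embeddings_request(model_id, text, image_bytes=None, image_url=None):
    """Assemble a hypothetical request body mixing typed text and image content.

    Field names here are assumptions modeled on the content types in
    llama_stack/apis/common/content_types.py, not a verified schema.
    """
    contents = [{"type": "text", "text": text}]
    if image_bytes is not None:
        # Inline image as base64 data.
        contents.append({
            "type": "image",
            "image": {"data": base64.b64encode(image_bytes).decode("ascii")},
        })
    elif image_url is not None:
        # Image referenced by URL instead of inline data.
        contents.append({
            "type": "image",
            "image": {"url": {"uri": image_url}},
        })
    return {"model_id": model_id, "contents": contents}


# Usage sketch: "some-embedding-model" is a placeholder, not a real model id.
body = build_embeddings_request(
    "some-embedding-model",
    "a photo of a cat",
    image_url="https://example.com/cat.png",
)
```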
Yes, this is now possible with the pointers @mattf provided 🎉