llama-stack
Support multimodal embedding generation
🚀 Describe the new functionality needed
Needed to support RAG over documents containing images, and to enable searches that use images as queries.
💡 Why is this needed? What if we don't build it?
RAG will only support text, which is not ideal.
Other thoughts
No response
@dineshyv /v1/inference/embeddings supports string, typed text, and typed image (URL and base64) input. Is that sufficient to resolve this issue?
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/inference/inference.py#L433 -> https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/common/content_types.py#L75 -> https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/common/content_types.py#L66 -> https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/common/content_types.py#L42
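For illustration, here is a minimal sketch of what a mixed text-plus-image request body for `/v1/inference/embeddings` might look like. The item shapes (a "text" item plus an "image" item carrying either a URL or base64 data) follow the typed content classes linked above, but the exact field names, and the model id used in the usage example, are assumptions and should be checked against the actual schema:

```python
import base64


def build_embeddings_request(model_id, text, image_bytes=None, image_url=None):
    """Assemble a hypothetical request body mixing typed text and image content.

    Field names here are assumptions modeled on the content types in
    llama_stack/apis/common/content_types.py, not a verified schema.
    """
    contents = [{"type": "text", "text": text}]
    if image_bytes is not None:
        # Inline image as base64 data.
        contents.append({
            "type": "image",
            "image": {"data": base64.b64encode(image_bytes).decode("ascii")},
        })
    elif image_url is not None:
        # Image referenced by URL instead of inline data.
        contents.append({
            "type": "image",
            "image": {"url": {"uri": image_url}},
        })
    return {"model_id": model_id, "contents": contents}


# Usage sketch: "some-embedding-model" is a placeholder, not a real model id.
body = build_embeddings_request(
    "some-embedding-model",
    "a photo of a cat",
    image_url="https://example.com/cat.png",
)
```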
Yes, this is now possible with the pointers @mattf provided 🎉