Support for image embeddings in auto-embeddings mode

Open nepridumalnik opened this issue 3 months ago • 0 comments

Proposal:

Description:

Currently, auto-embeddings in Manticore Search work only with text models from Hugging Face. They cannot be used for image-to-image or text-to-image search.

Proposal:

Extend auto-embeddings to support image embeddings.
Allow CLIP-based models (OpenAI CLIP, OpenCLIP, TinyCLIP) to generate embeddings for both text and images.
Enable multimodal search scenarios:

search by image → find visually similar images
search by text → find matching images
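
Both scenarios rest on the same idea: a CLIP-style model maps text and images into one shared vector space, so a single similarity ranking serves either kind of query. Below is a minimal sketch of that ranking step. The three-dimensional vectors are hand-made stand-ins for real model output, and the file names and the `rank_images` helper are hypothetical; nothing here calls Manticore or an actual CLIP model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vector, image_index, top_k=2):
    """Return the ids of the top_k images most similar to the query vector."""
    scored = [
        (image_id, cosine_similarity(query_vector, vector))
        for image_id, vector in image_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [image_id for image_id, _ in scored[:top_k]]


# Toy vectors standing in for image embeddings (assumption: a real
# CLIP-style model would produce hundreds of dimensions).
IMAGE_INDEX = {
    "gray_coat.jpg": [0.9, 0.1, 0.0],
    "black_coat.jpg": [0.7, 0.3, 0.1],
    "red_sneakers.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a text query.
TEXT_QUERY_VECTOR = [0.85, 0.15, 0.05]
# Stand-in for the embedding of an uploaded query image.
IMAGE_QUERY_VECTOR = [0.9, 0.05, 0.0]

# The same ranking function handles both query types, because both
# queries live in the same vector space as the indexed images.
text_results = rank_images(TEXT_QUERY_VECTOR, IMAGE_INDEX)
image_results = rank_images(IMAGE_QUERY_VECTOR, IMAGE_INDEX)
```

The only difference between the two scenarios is which encoder produces the query vector; the index and the ranking are shared.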

Use case:

A user has a database of product images (e.g., clothing from marketplaces). They want to:

Search for "gray coat" using a text query.
Upload an image of a coat and find visually similar coats.
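
For comparison, a rough sketch of what this use case appears to require without image support in auto-embeddings: the query vector is computed outside Manticore and passed to a KNN search by hand. The `build_knn_query` helper, the `products` table, and the `image_vector` column are hypothetical. The `knn(...)` and `knn_dist()` forms follow the KNN search syntax in the Manticore documentation as understood at the time of writing and should be checked against the version in use.

```python
def build_knn_query(table, vector_column, query_vector, k=5):
    """Format a KNN SELECT statement for a precomputed query vector.

    Illustrative only: table and column names are interpolated as-is,
    so they must come from trusted code, not from user input.
    """
    if not query_vector:
        raise ValueError("query_vector must not be empty")
    # Render the vector as a comma-separated list of float literals.
    vector_literal = ",".join(repr(float(v)) for v in query_vector)
    return (
        f"SELECT id, knn_dist() FROM {table} "
        f"WHERE knn({vector_column}, {int(k)}, ({vector_literal}))"
    )


# Hypothetical vector, as if produced by an external CLIP-style encoder
# from either the text "gray coat" or an uploaded coat photo.
query = build_knn_query("products", "image_vector", [0.9, 0.05, 0.0], k=5)
```

With image support in auto-embeddings, the encoding step would presumably move into Manticore itself, and the client would send the raw text or image instead of a vector.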

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

[ ] Implementation completed
[ ] Tests developed
[ ] Documentation updated
[ ] Documentation reviewed
[x] OpenAPI YAML updated and issue created to rebuild clients

Sep 22 '25 15:09 nepridumalnik