
[Question]: Multimodal embedding retrieval for images in docs

Open dejianchen-x opened this issue 1 week ago • 0 comments

Describe your problem

Is there any support for retrieving images embedded in doc, xls, and similar files? Specifically:

  1. Extract images from xls files automatically.

  2. Encode each image as an embedding and retrieve it with a multimodal embedding model, such as LLM2CLIP (https://arxiv.org/abs/2411.04997) or GME (https://arxiv.org/abs/2412.16855).
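To illustrate step 1, here is a minimal sketch of pulling embedded images out of a spreadsheet. It assumes the newer `.xlsx` format, which is a ZIP archive that stores embedded pictures under `xl/media/`; the legacy binary `.xls` format would need a different approach. The function name `extract_xlsx_images` is hypothetical and is not part of RAGFlow.

```python
import zipfile


def extract_xlsx_images(source):
    """Return {archive_path: raw_bytes} for images embedded in an .xlsx file.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only; not a RAGFlow API.
    """
    images = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Embedded pictures in .xlsx workbooks are stored under xl/media/.
            if name.startswith("xl/media/"):
                images[name] = archive.read(name)
    return images
```

The returned bytes could then be handed to whichever image encoder is used in step 2.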

Currently, I found that images are processed with the img2text API: an image description is extracted and then retrieved like ordinary text. This is NOT real text-image retrieval via a multimodal embedding model.
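For clarity, this is roughly the retrieval flow I have in mind, as a minimal sketch. It assumes a multimodal model that maps both images and text queries into the same vector space; the vectors below are made-up placeholders standing in for real model output, and `ImageEntry` and `retrieve` are hypothetical names, not RAGFlow APIs.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageEntry:
    doc: str             # source document the image was extracted from
    image_id: str        # identifier of the image inside that document
    vector: list         # image embedding (placeholder values, not real model output)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, top_k=2):
    """Rank stored image embeddings against a query embedding."""
    scored = [(cosine(query_vec, entry.vector), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: in practice each vector would come from the image encoder.
index = [
    ImageEntry("report.xlsx", "img_1", [0.9, 0.1, 0.0]),
    ImageEntry("report.xlsx", "img_2", [0.1, 0.9, 0.1]),
    ImageEntry("manual.docx", "img_3", [0.0, 0.2, 0.9]),
]

# Placeholder for the text query embedded by the same model's text encoder.
query_vec = [0.8, 0.2, 0.0]
results = retrieve(query_vec, index, top_k=2)
```

The point is that the image itself is matched against the query in a shared embedding space, with no intermediate text description.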

dejianchen-x, Feb 17 '25 09:02