
[Feature Request] Multi-modal RAG (Retrieval-Augmented Generation)

Open FUYICC opened this issue 1 year ago • 0 comments

Required prerequisites

  • [X] I have searched the Issue Tracker and Discussions to confirm that this hasn't already been reported. (+1 or comment there if it has.)
  • [X] Consider asking first in a Discussion.

Motivation

The current RAG implementation can only retrieve text. It cannot retrieve or extract information from images, audio, or other modalities.

Solution

For images:

  • [ ] Integrate a vision-language embedding model (e.g. CLIP) for image-text matching
  • [ ] Add image retrieval to the RAG function
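
To make the two items above more concrete, here is a minimal sketch of what text-to-image retrieval could look like once a CLIP-style model places text and images in the same embedding space. It is only an illustration, not a proposed API for this project: `ImageRetriever`, `ImageEntry`, `toy_embed_text`, and the three-dimensional vectors are all hypothetical stand-ins, and a real integration would replace the toy embeddings with outputs from an actual vision-language model.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ImageEntry:
    """Hypothetical index record: an image path plus its precomputed embedding."""
    path: str
    embedding: Vector


class ImageRetriever:
    """Hypothetical retriever: ranks indexed images against a text query.

    Assumes `embed_text` maps text into the same space as the image
    embeddings, which is the property a CLIP-style model would provide.
    """

    def __init__(self, embed_text: Callable[[str], Vector]) -> None:
        self._embed_text = embed_text
        self._entries: List[ImageEntry] = []

    def add(self, path: str, embedding: Vector) -> None:
        self._entries.append(ImageEntry(path, embedding))

    def query(self, text: str, top_k: int = 1) -> List[Tuple[str, float]]:
        query_vec = self._embed_text(text)
        scored = [
            (entry.path, cosine_similarity(query_vec, entry.embedding))
            for entry in self._entries
        ]
        # Highest similarity first.
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


# Toy stand-in for a real text encoder (assumption: hand-written vectors).
TOY_TEXT_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
}


def toy_embed_text(text: str) -> Vector:
    return TOY_TEXT_EMBEDDINGS[text]


retriever = ImageRetriever(toy_embed_text)
# Toy image embeddings; a real system would compute these with the image encoder.
retriever.add("cat.jpg", [1.0, 0.0, 0.1])
retriever.add("dog.jpg", [0.0, 1.0, 0.1])
retriever.add("car.jpg", [0.0, 0.1, 1.0])

results = retriever.query("a photo of a cat", top_k=2)
for path, score in results:
    print(f"{path}: {score:.3f}")
```

Passing the text encoder in as a callable keeps the ranking logic independent of the model, so the same retriever could in principle sit behind the existing text RAG interface. The retrieved image paths (or captions derived from them) would then be handed to the generation step.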

For other modalities: to be confirmed

FUYICC, Jan 30 '24 08:01