camel

Published 20 hours ago •


[Feature Request] Multi-modal RAG (Retrieval-Augmented Generation)

Open FUYICC opened this issue 1 year ago • 0 comments

Required prerequisites

[X] I have searched the Issue Tracker and Discussions to confirm that this hasn't already been reported. (+1 or comment there if it has.)
[X] Consider asking first in a Discussion.

Motivation

The current RAG can only retrieve text. It cannot retrieve or extract information from images, audio, or other modalities.

Solution

For images:

[ ] Integrate a VLM embedding model for image-text matching (CLIP)
[ ] Add image retrieval to the RAG function
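
A rough sketch of what the image retrieval step might look like, assuming a CLIP-style model that embeds images and text into one shared vector space. Everything here is hypothetical and for illustration only: `MultiModalIndex`, `IndexedItem`, and the hand-written toy vectors are not part of the existing codebase, and the vectors stand in for real model outputs.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class IndexedItem:
    item_id: str       # e.g. a file path or document ID
    modality: str      # "image" or "text"
    embedding: Vector  # vector from the shared embedding space


class MultiModalIndex:
    """Hypothetical in-memory index over a shared image/text embedding space."""

    def __init__(self) -> None:
        self._items: List[IndexedItem] = []

    def add(self, item_id: str, modality: str, embedding: Vector) -> None:
        self._items.append(IndexedItem(item_id, modality, list(embedding)))

    def search(self, query_embedding: Vector, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k (item_id, score) pairs, highest similarity first."""
        scored = [
            (item.item_id, cosine_similarity(query_embedding, item.embedding))
            for item in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Toy vectors (assumption): in practice these would come from the embedding model.
index = MultiModalIndex()
index.add("cat.png", "image", [0.9, 0.1, 0.0])
index.add("dog.png", "image", [0.1, 0.9, 0.0])
index.add("notes.txt", "text", [0.0, 0.1, 0.9])

# Stand-in for the embedding of a text query such as "a photo of a cat".
query = [1.0, 0.0, 0.0]
results = index.search(query, top_k=2)
```

Because images and text share one space, a text query can rank images directly, and the retrieved item IDs could then be handed to the generation step. A real implementation would likely swap the linear scan for a vector store.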

For other modalities: TBC

Jan 30 '24 08:01 FUYICC