Multimodal in kubeagi

Open bjwswang opened this issue 2 years ago • 0 comments

Instead of focusing on a single data modality, such as text or images, multimodal approaches consider and analyze data from various modalities simultaneously, including text, images, audio, video, and sensor data.

For now, we support only a single data modality, text, which is not good enough. There is a significant trend among users toward multimodal, so we should make this an important Story.

However, we should integrate multimodal capabilities step by step:

step1: text (already supported)
step2: images (v0.3)
step3: videos (v0.4 or v0.5)

Use cases

1. Chat with images/videos

When a user uploads an image or video, our GPT can recognize its content and provide a basic summary to the user, much like our chat_with_docs feature.
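
As a rough illustration of how an upload could be routed by modality before any recognition or summarization happens, here is a minimal sketch. It is not kubeagi code: `ROADMAP`, `detect_modality`, and `route_upload` are hypothetical names, and the version labels simply mirror the steps listed in this issue. It relies only on Python's standard `mimetypes` module to guess the type from the file name.

```python
import mimetypes

# Hypothetical table mirroring the rollout steps in this issue:
# modality -> planned support status.
ROADMAP = {
    "text": "already supported",
    "image": "v0.3",
    "video": "v0.4 or v0.5",
}


def detect_modality(filename: str) -> str:
    """Guess the modality of an uploaded file from its file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unknown"
    # The top-level MIME type ("image", "video", "text") names the modality.
    top_level = mime.split("/", 1)[0]
    return top_level if top_level in ROADMAP else "unknown"


def route_upload(filename: str) -> str:
    """Describe which (hypothetical) pipeline an upload would be sent to."""
    modality = detect_modality(filename)
    if modality == "unknown":
        return f"{filename}: unsupported modality"
    return f"{filename}: {modality} ({ROADMAP[modality]})"


if __name__ == "__main__":
    for name in ("notes.txt", "photo.png", "clip.mp4"):
        print(route_upload(name))
```

Guessing from the file name keeps the sketch small; a real implementation would more likely inspect the uploaded content itself. Document formats such as PDF carry `application/*` MIME types, so they fall through to "unknown" here and would need their own mapping.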

Subtasks for this story

[ ] #626
[ ] #607
[x] #606
[ ] #627

Jan 23 '24 02:01 bjwswang