
GPT-4-Vision and Gemini Vision multimodal model support?

Open youself64github opened this issue 1 year ago • 2 comments

I'd like to add vision chat battle and direct vision chat support. GPT-4 Vision and Gemini Vision are multimodal models, and other multimodal models could be added along with them.

youself64github, Jan 03 '24 19:01

I'd also like to see this feature implemented! Adding vision chat battle and direct vision chat support with cutting-edge multimodal models would be incredibly exciting.

Maybe it would also be possible to support LLaVA-NeXT, MiniCPM-V, CogVLM Chat, QwenVL, InstructBLIP Vicuna 7B, and UForm-Gen2? These powerful models would enable fascinating conversations, collaborations, and insights. The WildVision vision-arena (https://huggingface.co/spaces/WildVision/vision-arena) shows one way this could be implemented.

I'm also curious what hurdles have stood in the way of implementing this so far.

maninthemiddle01, Mar 14 '24 21:03

+1

dirtycomputer, Apr 29 '24 08:04