FastChat
GPT-4-Vision and Gemini Vision multimodal model support?
I'd like to add vision chat battle and direct vision chat support. GPT-4 Vision and Gemini Vision are multimodal models, and other multimodal models could be added alongside them.
I'd also like to see this feature implemented! Adding vision chat battle and direct vision chat support with cutting-edge multimodal models would be incredibly exciting.
Would it also be possible to implement LLaVA-NEXT, MiniCPM-V, CogVLM Chat, QwenVL, InstructBLIP Vicuna 7b, and UForm-Gen2? These powerful models would enable fascinating conversations, collaborations, and insights. The WildVision vision-arena (https://huggingface.co/spaces/WildVision/vision-arena) shows one way this could be implemented.
I'm also curious what hurdles have stood in the way of implementing this so far.
+1