chat-ui Generic Multimodal Support

Adds multimodal support for Anthropic by raising the maximum file size, changing the message.files type to carry a mime type, and removing the assumptions around TGI.

Changed message.files entries from base64 or hash strings (string[]) to { type: 'hash' | 'base64', value: string, mime: string }
Moved TGI-specific image resizing and markdown ![](base64) prompting to the TGI endpoint code
Changed maximum file size from 2 MB to 10 MB
- Thinking of reducing this and adding back the in-browser image resizing
- Likely should be configurable
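
To make the new file shape and the size limit concrete, here is a minimal sketch, not the code from this PR. The `MessageFile` shape and the 10 MB figure come from the description above; the helper names, the default constant, and the assumption that stored hashes are 64-character hex strings are hypothetical and only for illustration.

```typescript
// Shape described in this PR for entries of message.files.
type MessageFile = {
  type: "hash" | "base64";
  value: string;
  mime: string;
};

// Hypothetical default; the PR uses 10 MB and suggests making it configurable.
const DEFAULT_MAX_FILE_BYTES = 10 * 1024 * 1024;

// Assumption: stored hashes are 64-character hex strings (e.g. SHA-256).
const looksLikeHexHash = (s: string): boolean => /^[0-9a-f]{64}$/i.test(s);

// Hypothetical helper: upgrade a legacy string entry to the new object shape.
// The legacy format carried no mime type, so the caller has to supply one.
function upgradeLegacyFile(legacy: string, mime: string): MessageFile {
  return {
    type: looksLikeHexHash(legacy) ? "hash" : "base64",
    value: legacy,
    mime,
  };
}

// Decoded size in bytes of a padded base64 string, computed without decoding it.
function base64ByteLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

// Hypothetical size check with a configurable limit.
function isWithinSizeLimit(
  file: MessageFile,
  maxBytes: number = DEFAULT_MAX_FILE_BYTES
): boolean {
  // Hash entries reference files that are already stored, so only inline payloads are measured.
  if (file.type === "hash") return true;
  return base64ByteLength(file.value) <= maxBytes;
}
```

Passing `maxBytes` as a parameter is one way to keep the limit configurable per deployment or per endpoint instead of hard-coding it.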

I'd like to move the file upload logic out of the UI code and begin uploading immediately upon selecting a file, but that's outside the scope of this PR. However, that should allow for processing files earlier, which could be particularly useful for non-images (e.g. making embeddings for PDFs).

[x] Test the TGI endpoints
[x] Ensure clients receive a useful error message when their files are incompatible (with respect to mime types)
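
As a rough illustration of the second item, an endpoint could check each file's mime type against the types it accepts and fail with a message that names both. This is a sketch under assumptions: the allowlist, the function name, and the error wording are hypothetical and are not taken from this PR.

```typescript
type MessageFile = {
  type: "hash" | "base64";
  value: string;
  mime: string;
};

// Hypothetical allowlist; the types an endpoint really accepts depend on the provider.
const ACCEPTED_IMAGE_MIMES: readonly string[] = [
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/gif",
];

// Hypothetical check: throw an error that tells the client what was sent and what is allowed.
function assertSupportedMime(
  file: MessageFile,
  accepted: readonly string[] = ACCEPTED_IMAGE_MIMES
): void {
  // Compare case-insensitively and ignore parameters such as "; charset=utf-8".
  const mime = file.mime.split(";")[0].trim().toLowerCase();
  if (!accepted.includes(mime)) {
    throw new Error(
      `Unsupported file type "${mime}". Accepted types: ${accepted.join(", ")}.`
    );
  }
}
```

Running the check before any provider-specific processing means the client sees this message rather than an opaque upstream failure.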

Apr 16 '24 22:04 saghen

This is amazing. When this is merged, please ping me. I would like to adapt it for OpenAI + Gemini 1.5 Pro. ✌️

Apr 21 '24 11:04 flexchar

This is amazing! Would be nice to extend this to openai api as well if possible.

Apr 26 '24 21:04 Ichigo3766

Yes, amazing! It would be great to also have OpenAI-like API compatibility, since so many open-source multimodal models are available, like Idefics2, Llava, llama-3-vision, ... :)

May 01 '24 06:05 Extremys

Hey @Saghen, PR looking great from my local testing!

We changed a few things last week since we switched our docker image to a new build process. That probably introduced some conflicts but I don't mind fixing them for you since I created them 😅 If you're ok giving me write-access on the PR then I can just do the merge commit directly.

May 05 '24 22:05 nsarrazin

@nsarrazin that'd be great, thanks! granted you permission

May 05 '24 23:05 saghen

And thanks for exposing the mime type in files 🔥 that's gonna be super handy down the road as we support more modalities

May 07 '24 20:05 nsarrazin

@Ichigo3766 @Extremys @flexchar heads up that it was trivial so I added support for OpenAI in this PR as well

May 13 '24 02:05 saghen

@Saghen I will review it soon. Could you merge main in or rebase onto it so that the merge conflicts are resolved? ❤️

May 14 '24 09:05 mishig25

@flexchar are you still planning on adding support for Gemini Pro? cc @ArthurGoupil

Jul 10 '24 09:07 pocman

@flexchar I would be happy to help if needed!

Jul 10 '24 09:07 ArthurGoupil

related https://github.com/huggingface/chat-ui/pull/1330

Jul 10 '24 11:07 mishig25

Hi Arthur, unfortunately I will not be able to. It was for my personal local "chatgpt" alternative, and I have since discovered Open WebUI, which I run locally in Docker and which provides me with much more.

Worth noting: I've been prototyping with the vercel/ai projects and I think Hugging Face could totally consider using their providers. It is a very beautiful abstraction layer. Alternatively, using the native Google library is just as reasonable.

Hope that's alright! Maybe this will also allow a sooner merge, so the PR doesn't go stale. ✌️

Jul 10 '24 18:07 flexchar
