Generic Multimodal Support

Open saghen opened this issue 1 year ago • 8 comments

Adds multimodal support for Anthropic by increasing the maximum file size, changing the message.files type to carry a MIME type, and removing the assumptions around TGI.

  • Changed message.files from a string[] of base64 or hash values to objects of the form { type: 'hash' | 'base64', value: string, mime: string }
  • Moved the TGI-specific image resizing and markdown ![](base64) prompting into the TGI endpoint code
  • Raised the maximum file size from 2 MB to 10 MB
    • Thinking of reducing this and adding back the in-browser image resizing
    • Likely should be configurable
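
As a rough illustration of the changes listed above, here is a minimal sketch of the new file shape, a size check against the 10 MB limit, and the TGI-side markdown prompting. Only the `{ type, value, mime }` shape and the limit come from the description; the names `MessageFile`, `MAX_FILE_SIZE_BYTES`, `base64ByteLength`, `isWithinSizeLimit`, and `toTgiMarkdownImage` are hypothetical, and the data-URL form inside the markdown image is an assumption about what `![](base64)` expands to, not the code in this PR.

```typescript
// Shape of one entry in message.files, as given in the PR description.
type MessageFile = {
  type: "hash" | "base64";
  value: string;
  mime: string;
};

// Assumed default taken from the description (10 MB); hypothetical constant name.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Decoded size of a padded base64 string, computed without decoding it.
function base64ByteLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

// Hypothetical size check; the limit is a parameter so it can be made configurable.
function isWithinSizeLimit(
  file: MessageFile,
  maxBytes: number = MAX_FILE_SIZE_BYTES
): boolean {
  // A hash only references a stored file, so there is no payload to measure here.
  if (file.type === "hash") return true;
  return base64ByteLength(file.value) <= maxBytes;
}

// TGI-only step (assumed format): inline the image as a markdown data URL.
function toTgiMarkdownImage(file: MessageFile): string {
  if (file.type !== "base64") {
    throw new Error("Expected a base64 file; resolve hashes before prompting.");
  }
  return `![](data:${file.mime};base64,${file.value})`;
}
```

Keeping the markdown step in a TGI-specific helper mirrors the intent of the second bullet: other endpoints can consume `mime` and `value` directly instead of parsing them back out of a prompt string.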

I'd like to move the file upload logic out of the UI code and begin uploading as soon as a file is selected, but that's outside the scope of this PR. It would allow files to be processed earlier, which could be particularly useful for non-images (e.g. making embeddings for PDFs).

  • [x] Test the TGI endpoints
  • [x] Ensure clients receive a useful error message when their files are incompatible (with respect to mime types)
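
For the second checklist item, one possible way to build a useful error message is to compare each file's MIME type against the patterns an endpoint accepts and report both sides. This is a sketch under assumptions: `mimeMatches`, `describeIncompatibleFiles`, the wildcard pattern syntax, and the message wording are all hypothetical and not taken from the PR.

```typescript
// Hypothetical matcher: supports exact types ("image/png") and wildcards ("image/*", "*/*").
function mimeMatches(mime: string, pattern: string): boolean {
  const [type, subtype] = mime.toLowerCase().split("/");
  const [patternType, patternSubtype] = pattern.toLowerCase().split("/");
  return (
    (patternType === "*" || patternType === type) &&
    (patternSubtype === "*" || patternSubtype === subtype)
  );
}

// Returns null when every file is accepted, otherwise a message naming the
// rejected MIME types and the types the endpoint accepts.
function describeIncompatibleFiles(
  files: Array<{ mime: string }>,
  accepted: string[]
): string | null {
  const rejected = files.filter(
    (file) => !accepted.some((pattern) => mimeMatches(file.mime, pattern))
  );
  if (rejected.length === 0) return null;
  // Deduplicate so the same type is not listed once per file.
  const found = Array.from(new Set(rejected.map((file) => file.mime))).join(", ");
  return `Unsupported file type(s): ${found}. This endpoint accepts: ${accepted.join(", ")}.`;
}
```

A caller could return this string in the response body so the client sees which types were rejected and which would have worked.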

saghen avatar Apr 16 '24 22:04 saghen

This is amazing. When this is merged, please ping me. I would like to adapt it for OpenAI + Gemini 1.5 Pro. ✌️

flexchar avatar Apr 21 '24 11:04 flexchar

This is amazing! Would be nice to extend this to openai api as well if possible.

Ichigo3766 avatar Apr 26 '24 21:04 Ichigo3766

Yes, amazing! It would be great to also have OpenAI-like API compatibility; so many open-source multimodal models are available, like Idefics2, Llava, llama-3-vision, ... :)

Extremys avatar May 01 '24 06:05 Extremys

Hey @Saghen, PR looking great from my local testing!

We changed a few things last week since we switched our docker image to a new build process. That probably introduced some conflicts but I don't mind fixing them for you since I created them 😅 If you're ok giving me write-access on the PR then I can just do the merge commit directly.

nsarrazin avatar May 05 '24 22:05 nsarrazin

@nsarrazin that'd be great, thanks! granted you permission

saghen avatar May 05 '24 23:05 saghen

And thanks for exposing the mime type in files 🔥 that's gonna be super handy down the road as we support more modalities

nsarrazin avatar May 07 '24 20:05 nsarrazin

@Ichigo3766 @Extremys @flexchar heads up that it was trivial so I added support for OpenAI in this PR as well

saghen avatar May 13 '24 02:05 saghen

@Saghen I will review it soon. Could you merge/rebase with main so that the merge conflicts are gone? ❤️

mishig25 avatar May 14 '24 09:05 mishig25

@flexchar are you still planning on adding support for Gemini Pro? cc @ArthurGoupil

pocman avatar Jul 10 '24 09:07 pocman

@flexchar I would be happy to help if needed!

ArthurGoupil avatar Jul 10 '24 09:07 ArthurGoupil

Related: https://github.com/huggingface/chat-ui/pull/1330

mishig25 avatar Jul 10 '24 11:07 mishig25

Hi Arthur, unfortunately I will not be able to. It was for my personal local "ChatGPT" alternative, and I have since discovered Open WebUI, which I run locally in Docker and which gives me much more.

Worth noting: I've been prototyping with the vercel/ai project, and I think Hugging Face could consider using its providers. It is a very beautiful abstraction layer. Alternatively, using the native Google library is just as reasonable.

Hope that's alright! Maybe that will also allow a sooner merge, so the PR doesn't go stale. ✌️

flexchar avatar Jul 10 '24 18:07 flexchar