chat-ui
Generic Multimodal Support
Adds multimodal support for Anthropic by increasing the maximum file size, changing the message.files type to carry the mime type, and removing the assumptions around TGI.
- Changed `message.files` from base64 or hash `string[]` to `{ type: 'hash' | 'base64', value: string, mime: string }`
- Moved TGI specific image resizing and markdown prompting to TGI endpoint code
- Changed maximum file size from 2mb -> 10mb
  - Thinking of reducing this and adding back the in-browser image resizing
  - Likely should be configurable
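As a rough illustration of the changes listed above, the new file shape and a size check against the 10mb limit might look like the sketch below. Only the three fields of the file object come from the PR description; the names `MessageFile`, `MAX_FILE_SIZE_BYTES`, `base64ByteLength`, and `isWithinSizeLimit` are hypothetical, and treating 10mb as 10 * 1024 * 1024 bytes is an assumption.

```typescript
// File shape as described in the PR summary.
// The type name `MessageFile` is an assumption made for this sketch.
type MessageFile = {
  type: "hash" | "base64";
  value: string;
  mime: string;
};

// Hypothetical constant: the PR says "10mb"; the exact byte count is assumed.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Hypothetical helper: decoded byte length of a padded base64 string.
function base64ByteLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

// Hypothetical helper: a hash carries no size information, so only
// base64 payloads are measured here.
function isWithinSizeLimit(
  file: MessageFile,
  limit: number = MAX_FILE_SIZE_BYTES
): boolean {
  if (file.type === "hash") return true;
  return base64ByteLength(file.value) <= limit;
}
```

Passing the limit as a parameter is one way to make it configurable, as the last bullet suggests it probably should be.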
I'd like to move the file upload logic out of the UI code and begin uploading immediately upon selecting a file, but that's outside the scope of this PR. It would allow files to be processed earlier, which could be particularly useful for non-images (e.g. making embeddings for PDFs).
- [x] Test the TGI endpoints
- [x] Ensure clients receive a useful error message when their files are incompatible (with respect to mime types)
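A minimal sketch of the mime-type check in the second item, assuming each endpoint declares a list of accepted mime patterns such as `image/*`. The helper names, the pattern syntax, and the error wording are all hypothetical and are not taken from the chat-ui code.

```typescript
// Same file shape as in the PR summary; the type name is an assumption.
type MessageFile = {
  type: "hash" | "base64";
  value: string;
  mime: string;
};

// Hypothetical matcher: supports exact types ("image/png") and
// subtype wildcards ("image/*"). Mime parameters are not handled.
function mimeMatches(mime: string, pattern: string): boolean {
  const [patternType, patternSub] = pattern.split("/");
  const [mimeType, mimeSub] = mime.split("/");
  if (patternType !== "*" && patternType !== mimeType) return false;
  return patternSub === "*" || patternSub === mimeSub;
}

// Hypothetical helper: returns null when every file is accepted,
// otherwise a message naming the rejected and the accepted types.
function describeIncompatibleFiles(
  files: MessageFile[],
  supported: string[]
): string | null {
  const rejected = files
    .filter((file) => !supported.some((p) => mimeMatches(file.mime, p)))
    .map((file) => file.mime)
    .filter((mime, index, all) => all.indexOf(mime) === index); // dedupe

  if (rejected.length === 0) return null;
  return (
    `Unsupported file type(s): ${rejected.join(", ")}. ` +
    `This model accepts: ${supported.join(", ")}.`
  );
}
```

Returning the message rather than throwing leaves it to the caller to decide how to surface the error to the client.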
This is amazing. When this is merged, please ping me. I would like to adapt it for OpenAI + Gemini 1.5 Pro. ✌️
This is amazing! Would be nice to extend this to the OpenAI API as well if possible.
Yes, amazing! It would be great to also have OpenAI-like API compatibility, since so many open-source multimodal models are available, like Idefics2, Llava, llama-3-vision, ... :)
Hey @Saghen, PR looking great from my local testing!
We changed a few things last week since we switched our docker image to a new build process. That probably introduced some conflicts but I don't mind fixing them for you since I created them 😅 If you're ok giving me write-access on the PR then I can just do the merge commit directly.
@nsarrazin that'd be great, thanks! granted you permission
And thanks for exposing the mime type in files 🔥 that's gonna be super handy down the road as we support more modalities
@Ichigo3766 @Extremys @flexchar heads up that it was trivial so I added support for OpenAI in this PR as well
@Saghen I will review it soon. Could you merge or rebase with main so that the merge conflicts are gone ❤️
@flexchar are you still planning on adding support for Gemini Pro? cc @ArthurGoupil
@flexchar I would be happy to help if needed!
related https://github.com/huggingface/chat-ui/pull/1330
Hi Arthur, unfortunately I will not be able to. It was for my personal local "chatgpt" alternative, and I have since discovered Open Web-UI, which I am running locally in Docker and which provides me with much more.
Worth noting: I've been prototyping with the vercel/ai projects, and I think Hugging Face could totally consider using their providers. It is a very beautiful abstraction layer. Alternatively, using the native Google library is just as reasonable.
Hope that's alright! Maybe that will also allow a sooner merge, so the PR doesn't go stale. ✌️