Generic Multimodal Support
Adds multimodal support for Anthropic by increasing the maximum file size, changing the `message.files` type to include a MIME type, and removing the TGI-specific assumptions.
- Changed `message.files` from a base64 or hash `string[]` to `{ type: 'hash' | 'base64', value: string, mime: string }`
- Moved TGI-specific image resizing and markdown prompting to the TGI endpoint code
- Changed maximum file size from 2mb -> 10mb
  - Thinking of reducing this and adding back the in-browser image resizing
  - Likely should be configurable
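For readers skimming the thread, here is a minimal sketch of the new file shape together with a configurable size check. The names `MessageFile`, `DEFAULT_MAX_FILE_SIZE_BYTES`, `base64ByteLength` and `isWithinSizeLimit` are hypothetical and are not taken from the PR's code. Skipping the size check for hash entries is also an assumption: the sketch treats them as references to files that were already stored.

```typescript
// Shape described in this PR for each entry in message.files.
type MessageFile = {
	type: "hash" | "base64";
	value: string;
	mime: string;
};

// Hypothetical default; the PR raises the limit from 2 MB to 10 MB.
const DEFAULT_MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Decoded size of a padded base64 string (assumes no embedded whitespace).
function base64ByteLength(value: string): number {
	const padding = value.endsWith("==") ? 2 : value.endsWith("=") ? 1 : 0;
	return Math.floor((value.length * 3) / 4) - padding;
}

// Assumption: hash entries point at already-stored files, so only inline
// base64 payloads are measured here.
function isWithinSizeLimit(
	file: MessageFile,
	maxBytes: number = DEFAULT_MAX_FILE_SIZE_BYTES
): boolean {
	if (file.type !== "base64") return true;
	return base64ByteLength(file.value) <= maxBytes;
}
```

Passing the limit as a parameter is one way to make it configurable, for example by reading it from an environment variable at startup.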
I'd like to move the file upload logic out of the UI code and begin uploading immediately upon selecting a file, but that's outside the scope of this PR. Doing so would allow files to be processed earlier, which could be particularly useful for non-images (e.g. making embeddings for PDFs).
- [x] Test the TGI endpoints
- [x] Ensure clients receive a useful error message when their files are incompatible (with respect to mime types)
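As an illustration of the second item, a MIME compatibility check that returns a readable error could look roughly like this. `checkMimeCompatibility` and the accepted-pattern list are hypothetical and only sketch the idea. They do not reflect how the endpoints in this PR actually validate files.

```typescript
// Hypothetical result type: either success or a message suitable for the client.
type MimeCheckResult = { ok: true } | { ok: false; error: string };

// Checks a file's MIME type against the patterns an endpoint accepts.
// Patterns may use "*" as a wildcard, e.g. "image/*".
function checkMimeCompatibility(mime: string, accepted: string[]): MimeCheckResult {
	const [type, subtype] = mime.toLowerCase().split("/");
	const matches = accepted.some((pattern) => {
		const [pType, pSubtype] = pattern.toLowerCase().split("/");
		return (pType === "*" || pType === type) && (pSubtype === "*" || pSubtype === subtype);
	});
	if (matches) return { ok: true };
	return {
		ok: false,
		error: `Unsupported file type "${mime}". Accepted types: ${accepted.join(", ")}.`,
	};
}
```

Returning the accepted types in the message tells the client what to upload instead, not only that the upload failed.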
This is amazing. When this is merged, please ping me. I would like to adapt it for OpenAI + Gemini 1.5 Pro. ✌️
This is amazing! Would be nice to extend this to openai api as well if possible.
Yes, amazing! It would be so great to also have OpenAI-like API compatibility, since so many open-source multimodal models are available, like Idefics2, Llava, llama-3-vision, ... :)
Hey @Saghen, PR looking great from my local testing!
We changed a few things last week since we switched our docker image to a new build process. That probably introduced some conflicts but I don't mind fixing them for you since I created them 😅 If you're ok giving me write-access on the PR then I can just do the merge commit directly.
@nsarrazin that'd be great, thanks! granted you permission
And thanks for exposing the mime type in files 🔥 that's gonna be super handy down the road as we support more modalities
@Ichigo3766 @Extremys @flexchar heads up that it was trivial so I added support for OpenAI in this PR as well
@Saghen I will review it soon. Could you merge/rebase with the main so that the merge conflicts are gone ❤️
@flexchar are you still planning on adding support for Gemini pro ? cc @ArthurGoupil
@flexchar I would be happy to help if needed!
related https://github.com/huggingface/chat-ui/pull/1330
Hi Arthur, unfortunately I will not be able to. It was for my personal local "ChatGPT" alternative, and I have since discovered Open WebUI, which I am running locally in Docker and which provides me with much more.
Worth noting: I've been prototyping with vercel/ai projects, and I think Hugging Face could totally consider using their providers. It is a very beautiful abstraction layer. Alternatively, using the native Google library is just as reasonable.
Hope that's alright! Maybe that will also allow a sooner merge, so the PR doesn't go stale. ✌️