Feature Request: Add GPT-4V-style multi-modal model support
Hi there,
llama-cpp-python recently added support for the following multi-modal models:
llava-v1.5-7b
llava-v1.5-13b
bakllava-1-7b
Please see the following URL for more information: https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models
All the best!
Looks like the API allows you to upload base64-encoded images directly in the request. This should work well with our current workflow for TGI, so the implementation should be straightforward. I'll give it a shot!
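For reference, packaging an image as a base64 data URL in an OpenAI-style chat message could look like the sketch below. This is a minimal illustration, not the actual chat-ui implementation; the helper name, the server URL, and the model name in the commented-out request are assumptions to be adjusted for your deployment.

```python
import base64

def build_image_message(image_bytes: bytes, prompt: str) -> dict:
    """Package raw image bytes as a base64 data URL inside a chat message.

    Hypothetical helper for illustration; assumes a PNG image and the
    OpenAI-compatible message schema used by llama-cpp-python's server.
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
            {"type": "text", "text": prompt},
        ],
    }

# Sending the message might then look like this (not executed here;
# base_url and model name are placeholders for your own server):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
# response = client.chat.completions.create(
#     model="llava-v1.5-7b",
#     messages=[build_image_message(png_bytes, "Describe the image.")],
# )
```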
It would be appreciated if you could also support the image_url content type, as in the following code:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png",
                    },
                },
                {"type": "text", "text": "What does the image say. Format your response as a json object with a single 'text' key."},
            ],
        }
    ],
    response_format={"type": "json_object"},
)
Thanks.
Hi there,
Is there any progress on this?
I hope this feature will be added to chat-ui :)