
Feature Request: Add GPT-4V-style multi-modal model support

Open limcheekin opened this issue 2 years ago • 3 comments

Hi there,

llama-cpp-python recently added support for the following multi-modal models: llava-v1.5-7b, llava-v1.5-13b, and bakllava-1-7b.

Please see the following URL for more information: https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models
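For context, that docs page shows the server being started with a multimodal chat handler, roughly as follows (the model file names below are placeholders; you would point them at your downloaded LLaVA GGUF weights and the matching CLIP projector):

```shell
# Launch the llama-cpp-python OpenAI-compatible server with LLaVA support.
# --clip_model_path points at the multimodal projector; --chat_format selects
# the llava-1-5 prompt template.
python3 -m llama_cpp.server \
  --model models/llava-v1.5-7b.Q4_K_M.gguf \
  --clip_model_path models/mmproj-model-f16.gguf \
  --chat_format llava-1-5
```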

All the best!

limcheekin avatar Dec 10 '23 09:12 limcheekin

Looks like the API allows you to upload base64-encoded images directly in the request. This should work well with our current workflow for TGI, so the implementation should be straightforward. I'll give it a shot!

nsarrazin avatar Dec 10 '23 20:12 nsarrazin
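As a minimal sketch of the base64 path mentioned above: encode a local image into a data URI and pass it as an `image_url` in the chat request. The helper below is hypothetical (not part of chat-ui or llama-cpp-python), and the server URL and file name in the commented usage are placeholders:

```python
import base64

def image_to_data_uri(path: str, mime: str = "image/png") -> str:
    """Read a local image file and wrap its bytes in a base64 data URI."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Hypothetical usage against a local llama-cpp-python server:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key")
# response = client.chat.completions.create(
#     model="llava-v1.5-7b",
#     messages=[{
#         "role": "user",
#         "content": [
#             {"type": "image_url",
#              "image_url": {"url": image_to_data_uri("photo.png")}},
#             {"type": "text", "text": "What does the image say?"},
#         ],
#     }],
# )
```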

I'd appreciate it if you could support the image_url content type, as in the following code:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png",
                    },
                },
                {
                    "type": "text",
                    "text": "What does the image say? Format your response as a JSON object with a single 'text' key.",
                },
            ],
        }
    ],
    response_format={"type": "json_object"},
)

Thanks.

limcheekin avatar Dec 11 '23 03:12 limcheekin

Hi there,

Is there any progress?

I hope this feature will be added to chat-ui :)

eagle705 avatar Apr 11 '24 05:04 eagle705