Feature Request: Add GPT-4V-style multi-modal model support
Hi there,
llama-cpp-python recently added support for the following multi-modal models:
llava-v1.5-7b
llava-v1.5-13b
bakllava-1-7b
Please see the following URL for more information: https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models
All the best!
Looks like the API allows you to upload base64-encoded images directly in the request. This should work well with our current workflow for TGI, so the implementation should be straightforward. I'll give it a shot!
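For reference, packaging an image as a base64 data URL in an OpenAI-style chat message could look like the sketch below. This is a minimal illustration, not the actual chat-ui implementation; the helper name, the server URL, and the model name in the commented-out request are assumptions to be adjusted for your deployment.

```python
import base64

def build_image_message(image_bytes: bytes, prompt: str) -> dict:
    """Package raw image bytes as a base64 data URL inside a chat message.

    Hypothetical helper for illustration; assumes a PNG image and the
    OpenAI-compatible message schema used by llama-cpp-python's server.
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
            {"type": "text", "text": prompt},
        ],
    }

# Sending the message might then look like this (not executed here;
# base_url and model name are placeholders for your own server):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
# response = client.chat.completions.create(
#     model="llava-v1.5-7b",
#     messages=[build_image_message(png_bytes, "Describe the image.")],
# )
```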
It would be appreciated if you could also support the image_url content type, as in the following code:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png",
                    },
                },
                {"type": "text", "text": "What does the image say. Format your response as a json object with a single 'text' key."},
            ],
        }
    ],
    response_format={"type": "json_object"},
)
Thanks.
Hi there,
Is there any progress on this?
I hope this feature will be added to chat-ui :)