
[Bug]: Different Behavior with Image Input on GROQ/Llama 3.2 Vision Model vs Qwen

Open NEWbie0709 opened this issue 1 year ago • 2 comments

What happened?

I've encountered an issue while using LiteLLM with the groq/Llama 3.2 Vision model and Qwen. The problem arises specifically when providing an image input:

- groq/Llama 3.2 Vision: works as expected with image inputs.
- qwen-vl-max-latest: produces an error when processing the same image input.

Code used:

import requests

# LiteLLM proxy endpoint and key
url = 'http://localhost:4000/chat/completions'

headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk-1234'
}

# OpenAI-style multimodal request: one text part plus one image_url part
data = {
    "model": "model",  # your model_name from the proxy config
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                }
            ]
        }
    ]
}

# json= serializes the body and is equivalent to data=json.dumps(data) here
response = requests.post(url, headers=headers, json=data)

print(response.status_code)
print(response.json())
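As a possible workaround to test: some vision backends that fail on a remote URL will accept the image inlined as a base64 `data:` URL instead. A minimal sketch of building the same OpenAI-style payload that way (the helper names are mine, and whether qwen-vl-max-latest accepts this through the proxy is an assumption to verify):

import base64

def image_part_from_bytes(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    # Hypothetical helper: wrap raw image bytes as an OpenAI-style
    # image_url content part using a base64 data URL.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

def vision_message(text: str, image_part: dict) -> dict:
    # Hypothetical helper: one user message combining text and one image.
    return {"role": "user", "content": [{"type": "text", "text": text}, image_part]}

# Placeholder JPEG magic bytes stand in for a real downloaded image
part = image_part_from_bytes(b"\xff\xd8\xff")
msg = vision_message("What's in this image?", part)
print(msg["content"][1]["image_url"]["url"])  # data:image/jpeg;base64,/9j/

The resulting `msg` drops into the `"messages"` list of the request body above unchanged; only the `image_url.url` value differs from the remote-URL version.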

Output from groq: (screenshot attached)

Output from qwen: (screenshot attached)

Relevant log output

No response

Twitter / LinkedIn details

No response

NEWbie0709 — Nov 26 '24