Issue with Sending Image URL to GPT-4o in Unity

Open Zaf01 opened this issue 1 year ago • 0 comments

Hi,

I am trying to use GPT-4o's vision capabilities in Unity with this package. I send an image URL to the model as follows:

    public async void SendImageUrlToGPT4(string imageurl)
    {
        var userMessage = new ChatMessage
        {
            Role = "user",
            Content = "[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"" + imageurl + "\"}]"
        };


        messages.Add(userMessage);

        var request = new CreateChatCompletionRequest
        {
            Messages = messages,
            Model = "gpt-4o",
            MaxTokens = 300
        };

        var response = await openAI.CreateChatCompletion(request);


        if (response.Choices != null && response.Choices.Count > 0)
        {
            var chatResponse = response.Choices[0].Message;
          
            Debug.Log(chatResponse.Content);

            OnResponse.Invoke(chatResponse.Content);

            Debug.Log("Response Finished");
        }
        else
        {
            Debug.LogError("No response from GPT-4 Vision.");
        }
    }

However, the model always returns an incorrect description of the image, which makes me suspect a problem with how the request is sent to the model from Unity.

When I pass the same URL using the Python code snippet provided by OpenAI, the model describes the image accurately. Here is the Python code that I tested:


import openai
import json

# Set your API key
openai.api_key = ""

response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response['choices'][0]['message']['content'])

Here is the JSON dump of the request payload in Unity:

{"Role":"user","Content":"[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129\"}]"}

The JSON dump in Python:

{
      "model": "gpt-4o",
      "messages": [
            {
                  "role": "user",
                  "content": [
                        {
                              "type": "text",
                              "text": "What\u2019s in this image?"
                        },
                        {
                              "type": "image_url",
                              "image_url": {
                                    "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129"
                              }
                        }
                  ]
            }
      ],
      "max_tokens": 300
}
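
To make the difference between the two dumps easier to see, here is a minimal Python sketch that rebuilds both message shapes and reports how each one carries the image. The helper name describe_content and the placeholder URL are made up for illustration and are not part of the package or of the OpenAI library. It only inspects the structures locally and does not call the API.

```python
import json

# Placeholder URL (an assumption for illustration, not the real image URL).
IMAGE_URL = "https://example.com/frame2.jpg"

# Shape of the Unity dump: "Content" is one JSON-encoded string,
# and the image part has a top-level "url" key.
unity_message = {
    "Role": "user",
    "Content": json.dumps([
        {"type": "text", "text": "What do you see in this image?"},
        {"type": "image_url", "url": IMAGE_URL},
    ]),
}

# Shape of the Python dump: "content" is a real list of parts,
# and the URL is nested under "image_url".
python_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": IMAGE_URL}},
    ],
}


def describe_content(content):
    """Hypothetical helper: return (kind, sorted keys of the image part)."""
    if isinstance(content, str):
        kind = "string"
        try:
            parts = json.loads(content)
        except json.JSONDecodeError:
            return kind, []
    else:
        kind = "list"
        parts = content
    if not isinstance(parts, list):
        return kind, []
    image_keys = sorted(
        key
        for part in parts
        if isinstance(part, dict) and part.get("type") == "image_url"
        for key in part
    )
    return kind, image_keys


unity_summary = describe_content(unity_message["Content"])
python_summary = describe_content(python_message["content"])
print("Unity: ", unity_summary)
print("Python:", python_summary)
```

The sketch points at two differences: in the Unity dump the content is a plain string rather than an array of parts, and its image part has a top-level "url" key instead of a nested "image_url" object. Either one might explain why the model seems to answer without actually seeing the image, but I have not confirmed this.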

Could you please let me know how I can correctly send the image to the model with this package and get an accurate description back? I am not sure what is causing this issue, and any insights would be greatly appreciated.

Jul 22 '24 06:07 Zaf01