Image upload for GPT-4V? (Feature request)

Open mkammes opened this issue 1 year ago • 7 comments

GPT Plus members can use the upload function for media files, such as images.

https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images

As an example, it would be great to have an automation that takes a picture of your bar (or refrigerator) and provides recipe suggestions based on the ingredients, along with a custom prompt.

Here is an example: https://m.facebook.com/groups/HomeAssistant/permalink/3611503665787644/?mibextid=Nif5oz .

I've tried doing this via Python and pyscript with little success.
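For reference, the linked guide sends local images as base64 data URLs. A minimal sketch of that encoding step, assuming the image bytes are already in memory (the helper name `to_data_url` is hypothetical, not part of any library):

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Return a base64 data URL for raw image bytes (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Tiny stand-in payload instead of a real photo
sample_url = to_data_url(b"abc", "image/png")
```

The resulting string can be used wherever an image URL is expected in the request.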

Thanks!

mkammes avatar Dec 09 '23 17:12 mkammes

Thanks for the suggestion.

I just read the post on Facebook, and it is a really interesting feature. There are several things I have to check before I'm certain that it is possible.

Details

1. The content type should be changed

Currently, messages are formatted like this:

[
    {'role': 'system', 'content': "..."},
    {'role': 'user', 'content': 'turn on bedroom light'},
    {'role': 'function', 'name': 'execute_services', 'content': '[True]'}
]

In the OpenAI guide you referenced, the message content has to change from a string to a list:

[
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'What is in this image?'},
            {'type': 'image_url', 'image_url': 'https://...'}
        ]
    }
]
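A minimal sketch of building such a list-valued message in Python. The helper name `build_vision_message` is hypothetical, and the nested `{"url": ...}` object is an assumption based on the OpenAI guide's examples, where `image_url` is an object rather than the bare string abbreviated above:

```python
from typing import Any


def build_vision_message(text: str, image_url: str) -> dict[str, Any]:
    """Build a user message whose content is a list of text and image parts."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Assumption: image_url is passed as an object with a "url" key
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = build_vision_message("What is in this image?", "https://example.com/a.jpg")
```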

2. How to attach image_url

Since it's hard for the component (extended_openai_conversation) to attach "image_url" to a user-role message, the only way I can think of is to provide a function that attaches "image_url" via the function response.

I hope a format like the one below works, but there is no example that uses both a function and an image.

[
    {
        'role': 'function',
        'content': [
            {/* function response (don't know how this object will be formatted) */},
            {'type': 'image_url', 'image_url': 'https://...'}
        ]
    }
]

I will look into this when I have time and GPT Plus is available again.

jekalmin avatar Dec 10 '23 15:12 jekalmin

Outstanding! I can confirm it works (I've replicated it in my own Python script, based on the OpenAI example).

I'm happy to create an API key for you to test with.

Thanks!

mkammes avatar Dec 12 '23 20:12 mkammes

Thank you! I will try it myself first, and then ask for help if needed!

jekalmin avatar Dec 14 '23 12:12 jekalmin

I just upgraded and tried the gpt-4-vision-preview model. Unfortunately, it seems that this model doesn't support functions. I got the error below.

Code

import openai  # legacy SDK (openai<1.0), which provides openai.ChatCompletion

functions = [
    {
        "name": "get_image",
        "description": "Get image",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The image url",
                }
            },
            "required": ["url"],
        },
    }
]


response = openai.ChatCompletion.create(
  model="gpt-4-vision-preview",
  functions=functions,
  function_call="auto",
  messages=[
    {
      "role": "user",
      "content": "What’s in an image?"
    }
  ],
  max_tokens=300,
)

print(response)

Request

DEBUG:openai:api_version=None data='{"model": "gpt-4-vision-preview", "functions": [{"name": "get_image", "description": "Get image", "parameters": {"type": "object", "properties": {"url": {"type": "string", "description": "The image url"}}, "required": ["url"]}}], "function_call": "auto", "messages": [{"role": "user", "content": "What\\u2019s in an image?"}], "max_tokens": 300}' message='Post details'

Response

openai.error.InvalidRequestError: 2 validation errors for Request
body -> function_call
  extra fields not permitted (type=value_error.extra)
body -> functions
  extra fields not permitted (type=value_error.extra)

Maybe I will add a service so that you can hook into it via functions.
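A rough sketch of that idea, under the assumption that a function-capable model calls a function and the function itself queries the vision model. All names here (`make_get_image_handler`, `fake_query_vision`) are hypothetical, and the vision call is replaced by a stand-in that makes no network request:

```python
from typing import Callable


def make_get_image_handler(
    query_vision: Callable[[str, str], str], prompt: str
) -> Callable[[dict], str]:
    """Wrap a vision query so it can be invoked with function-call arguments."""

    def handle(arguments: dict) -> str:
        # `arguments` mirrors the "url" parameter of the get_image spec above
        return query_vision(prompt, arguments["url"])

    return handle


def fake_query_vision(prompt: str, url: str) -> str:
    """Stand-in for a real vision model call; returns canned text."""
    return f"[description of {url} for prompt: {prompt}]"


handler = make_get_image_handler(fake_query_vision, "What is in this image?")
result = handler({"url": "https://example.com/fridge.jpg"})
```

The text returned by the handler would then be passed back to the function-capable model as the function response, so the vision model never needs to accept `functions` itself.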

jekalmin avatar Dec 24 '23 14:12 jekalmin

I have added a "query_image" service in https://github.com/jekalmin/extended_openai_conversation/pull/60.

You can try adding a function like the one below.

Function

- spec:
    name: get_refrigerator_items
    description: Get description of items in refrigerator
    parameters:
      type: object
      properties:
        url:
          type: string
          description: image url of refrigerator
          enum:
            - https://i.pinimg.com/originals/8b/cc/f1/8bccf14daf77ce887fc162934335cb21.jpg # needs to change
      required:
      - url
  function:
    type: composite
    sequence:
      - type: script
        sequence:
          - service: extended_openai_conversation.query_image
            data:
              prompt: What alcohol and brands do you see in this picture?
              images:
                - url: "{{url}}"
              max_tokens: 300
              config_entry: YOUR_CONFIG_ENTRY_KEY # needs to change
            response_variable: _function_result
        response_variable: image_result
      - type: template
        value_template: "{{image_result.choices[0].message.content}}"

Then ask "What's in the refrigerator?"
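For readers less familiar with templates, the final `value_template` step amounts to the lookup sketched below in Python. The response dictionary is a mock, assumed to have the same shape the template expects; the real service response may contain more fields:

```python
def extract_content(image_result: dict) -> str:
    """Mirror of the template {{image_result.choices[0].message.content}}."""
    return image_result["choices"][0]["message"]["content"]


# Mock response with only the fields the template reads (assumed shape)
mock_result = {
    "choices": [
        {"message": {"role": "assistant", "content": "Milk, eggs, and butter."}}
    ]
}

description = extract_content(mock_result)
```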

jekalmin avatar Dec 24 '23 17:12 jekalmin

Outstanding! Great work. I look forward to testing this out!

mkammes avatar Dec 24 '23 17:12 mkammes

Released this in 1.0.1-beta2

jekalmin avatar Jan 17 '24 14:01 jekalmin