
How to send image to vision model

engdante opened this issue

Please tell me the correct way to send an image to the vision model.

This is my function:

def generate_image_description(image_path):
    prompt = f"Describe the content of this image: {image_path}."
    response = client.chat(
        model='llava-phi3:3.8b',
        messages=[
            {
                'role': 'user',
                'content': prompt,
            },
        ],
    )
    return response['message']['content']

engdante avatar Sep 17 '24 13:09 engdante

Please refer to the definition of a chat message in the Python code (the `Message` typed dict).

The image can be passed in using the "images" key in your message dictionary. The "images" key is a sequence of "bytes" or "path-like str".

Here is an example:

import ollama

response = ollama.chat(
    model="moondream",
    messages=[
        {"role": "user", "content": "Describe the image", "images": ["./cat.jpeg"]}
    ],
)

print(response["message"]["role"])
print(response["message"]["content"])
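Since the "images" key also accepts raw bytes (not just a path-like str), a minimal sketch of the bytes variant might look like this; the helper name `build_image_message` is my own, and the commented-out call assumes a local Ollama server with the moondream model pulled:

```python
# Sketch: build a chat message carrying raw image bytes instead of a path.
# The "images" value may be raw bytes or a path-like str.

def build_image_message(prompt: str, image_bytes: bytes) -> dict:
    return {"role": "user", "content": prompt, "images": [image_bytes]}

# Usage against a running Ollama server (moondream pulled locally):
#   import ollama
#   with open("./cat.jpeg", "rb") as f:
#       msg = build_image_message("Describe the image", f.read())
#   print(ollama.chat(model="moondream", messages=[msg])["message"]["content"])
```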

karanravindra avatar Nov 05 '24 20:11 karanravindra

Just my two cents for future searches: for gemma3, this is what worked for me using the Python ollama lib:

import base64
import ollama

for img_path in paths:
  try:
    # Read the image file as binary data
    with open(img_path, 'rb') as img_file:
        img_data = img_file.read()

    # Convert the image to base64 for Ollama
    img_base64 = base64.b64encode(img_data).decode('utf-8')

    # Pass the base64-encoded image data at the top level of generate()
    response = ollama.generate(
        model='gemma3:12b',
        prompt="What's this? Provide a description without leading or trailing text.",
        images=[img_base64],
        options={"temperature": 0.1}  # Lower temperature for more consistent output
    )

    # Extract the caption from the response
    caption = response['response'].strip()
  except (OSError, ollama.ResponseError) as exc:
    print(f"Failed to caption {img_path}: {exc}")

pnmartinez avatar Mar 29 '25 10:03 pnmartinez
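The snippet above does nothing more exotic than encode the file bytes as base64 text before handing them to `ollama.generate`; that step can be checked in isolation without a running server. A small sketch (the helper name `encode_image` is my own):

```python
import base64

def encode_image(data: bytes) -> str:
    # Same transformation as in the snippet above: raw bytes -> base64 text.
    return base64.b64encode(data).decode("utf-8")

# Round-trip check: decoding the base64 string recovers the original bytes.
sample = b"\xff\xd8\xff\xe0fake-jpeg-bytes"
assert base64.b64decode(encode_image(sample)) == sample
```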

This works for me: https://medium.com/p/bb4696663701

myyim avatar Apr 01 '25 14:04 myyim

@pnmartinez @myyim does the example https://github.com/ollama/ollama-python/blob/main/examples/multimodal-chat.py not work for you guys?

ParthSareen avatar Apr 01 '25 16:04 ParthSareen

Just found this out in my own testing: if you want to call it against the Ollama object's chat function, you need to pass the image as an "ImageDocument":

llm = Ollama(model="gemma3", request_timeout=360.0)
response = llm.chat([
    ChatMessage(
        "What's this? Provide a description without leading or trailing text.",
        additional_kwargs={"images": [ImageDocument(image_path=image_path)]},
    )
])

garricklw avatar Apr 07 '25 20:04 garricklw