ollama-python
How to send an image to a vision model
Please tell me the correct way to send an image to the vision model.
This is my function:
def generate_image_description(image_path):
    prompt = f"Describe the content of this image: {image_path}."
    response = client.chat(model='llava-phi3:3.8b', messages=[
        {
            'role': 'user',
            'content': prompt,
        },
    ])
    return response['message']['content']
Please refer to the definition of a chat message in the Python code: the Message TypedDict.
The image can be passed in using the "images" key in your message dictionary; its value is a sequence of bytes or path-like str.
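For reference, a simplified sketch of that type is below; check ollama/_types.py in your installed version for the exact definition, which varies across releases:

from typing import Any, Sequence

from typing_extensions import NotRequired, TypedDict


# Simplified sketch of the Message TypedDict from ollama/_types.py.
class Message(TypedDict):
    role: str  # 'user', 'assistant', or 'system'
    content: str  # the text of the message
    images: NotRequired[Sequence[Any]]  # image bytes or path-like strings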
Here is an example:
import ollama

response = ollama.chat(
    model="moondream",
    messages=[
        {"role": "user", "content": "Describe the image", "images": ["./cat.jpeg"]}
    ],
)
print(response["message"]["role"])
print(response["message"]["content"])
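The same call works with raw bytes instead of a path; a minimal sketch, reusing the cat.jpeg file from above:

import ollama

# Read the image ourselves and pass the raw bytes in "images".
with open("./cat.jpeg", "rb") as f:
    image_bytes = f.read()

response = ollama.chat(
    model="moondream",
    messages=[
        {"role": "user", "content": "Describe the image", "images": [image_bytes]}
    ],
)
print(response["message"]["content"])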
Just my two cents for future searches: for gemma3, this is what worked for me using the Python ollama lib:
import base64

import ollama

for img_path in paths:
    try:
        # Read the image file as binary data
        with open(img_path, 'rb') as img_file:
            img_data = img_file.read()
        # Convert image to base64 for Ollama
        img_base64 = base64.b64encode(img_data).decode('utf-8')
        # Pass images at the top level of generate(), per the Gemma 3 documentation
        response = ollama.generate(
            model='gemma3:12b',
            prompt="What's this? Provide a description without leading or trailing text.",
            images=[img_base64],  # Pass base64-encoded image data at top level
            options={"temperature": 0.1}  # Lower temperature for more consistent output
        )
        # Extract the caption from the response
        caption = response['response'].strip()
    except (OSError, ollama.ResponseError) as e:
        # The original snippet's try block was truncated; handle read/API errors here
        print(f"Failed to caption {img_path}: {e}")
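Note that manual base64 encoding shouldn't be strictly necessary here: the client also accepts file paths and raw bytes in "images" and encodes them itself. A shorter sketch of the same loop, assuming the same paths list:

import ollama

for img_path in paths:
    response = ollama.generate(
        model='gemma3:12b',
        prompt="What's this? Provide a description without leading or trailing text.",
        images=[img_path],  # a file path works; the client base64-encodes it internally
        options={"temperature": 0.1},
    )
    print(response['response'].strip())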
This works for me: https://medium.com/p/bb4696663701
@pnmartinez @myyim does the example https://github.com/ollama/ollama-python/blob/main/examples/multimodal-chat.py not work for you guys?
Just found out in my own testing: if you want to call it against the Ollama object's chat function (note that this is LlamaIndex's Ollama wrapper, not the ollama-python client), you need to pass the image as an "ImageDocument":
# Imports for the LlamaIndex integration; exact module paths may vary by version.
from llama_index.core.llms import ChatMessage
from llama_index.core.schema import ImageDocument
from llama_index.llms.ollama import Ollama

llm = Ollama(model="gemma3", request_timeout=360.0)
response = llm.chat([
    ChatMessage(
        "What's this? Provide a description without leading or trailing text.",
        additional_kwargs={"images": [ImageDocument(image_path=image_path)]},
    )
])