[Bug] Ollama requests fail when including an Image
What happened?
Hi! Very interesting project.
I'm trying to use dspy to do image classification. As a first step, I'd like to just generate a description of an image, using minicpm-v model on ollama.
The following example results in an error:
import dspy
class Describe(dspy.Signature):
    """Describe the image in detail. Respond only in English."""
    image: dspy.Image = dspy.InputField(desc="A photo")
    description: str = dspy.OutputField(desc="Detailed description of the image.")
image_path="/tmp/9221487.jpg"
minicpm = dspy.LM('ollama/minicpm-v:latest', api_base='http://localhost:11434', api_key='')
p = dspy.Predict(Describe)
p.set_lm(minicpm)
result = p(image=dspy.Image.from_url(image_path))
print(result.description)
Output:
2025/04/13 13:22:40 WARNING dspy.adapters.json_adapter: Failed to use structured output format. Falling back to JSON mode. Error: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\nInputs will have the following structure:\n\n[[ ## image ## ]]\n{image}\n\nOutputs will be a JSON object with the following fields.\n\n[[ ## description ## ]]\n{description}\nIn adhering to this structure, your objective is: \n Describe the image in detail. Respond only in English.'}
Traceback (most recent call last):
File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 41, in __call__
return super().__call__(lm, lm_kwargs, signature, demos, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/adapters/base.py", line 33, in __call__
outputs = lm(messages=inputs, **lm_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
return fn(instance, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/base_lm.py", line 52, in __call__
response = self.forward(prompt=prompt, messages=messages, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
return fn(instance, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 112, in forward
results = completion(
^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 268, in wrapper
output = func_cached(key, request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/cachetools/_decorators.py", line 94, in wrapper
v = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 257, in func_cached
return func(request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 282, in cached_litellm_completion
return litellm_completion(
^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 301, in litellm_completion
return litellm.completion(
^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1213, in wrapper
raise e
File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1091, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 3093, in completion
raise exception_type(
File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 2815, in completion
response = base_llm_http_handler.completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/llms/custom_httpx/llm_http_handler.py", line 239, in completion
data = provider_config.transform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/llms/ollama/completion/transformation.py", line 315, in transform_request
modified_prompt = ollama_pt(model=model, messages=messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/litellm_core_utils/prompt_templates/factory.py", line 265, in ollama_pt
raise litellm.BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\n[[ ## image ## ]]\n{image}\n\n[[ ## description ## ]]\n{description}\n\n[[ ## completed ## ]]\nIn adhering to this structure, your objective is: \n Describe the image in detail. Respond only in English.'}
.
.
.
Traceback (most recent call last):
File "~/.local/lib/python3.12/site-packages/dspy/adapters/json_adapter.py", line 67, in __call__
return super().__call__(lm, lm_kwargs, signature, demos, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 49, in __call__
raise e
File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 41, in __call__
return super().__call__(lm, lm_kwargs, signature, demos, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/adapters/base.py", line 33, in __call__
outputs = lm(messages=inputs, **lm_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
return fn(instance, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/base_lm.py", line 52, in __call__
response = self.forward(prompt=prompt, messages=messages, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
return fn(instance, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 112, in forward
results = completion(
^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 268, in wrapper
output = func_cached(key, request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/cachetools/_decorators.py", line 94, in wrapper
v = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 257, in func_cached
return func(request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 282, in cached_litellm_completion
return litellm_completion(
^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 301, in litellm_completion
return litellm.completion(
^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1213, in wrapper
raise e
File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1091, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 3093, in completion
raise exception_type(
File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 2815, in completion
response = base_llm_http_handler.completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/llms/custom_httpx/llm_http_handler.py", line 239, in completion
data = provider_config.transform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/llms/ollama/completion/transformation.py", line 315, in transform_request
modified_prompt = ollama_pt(model=model, messages=messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/litellm/litellm_core_utils/prompt_templates/factory.py", line 265, in ollama_pt
raise litellm.BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\nInputs will have the following structure:\n\n[[ ## image ## ]]\n{image}\n\nOutputs will be a JSON object with the following fields.\n\n[[ ## description ## ]]\n{description}\nIn adhering to this structure, your objective is: \n Describe the image in detail. Respond only in English.'}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "~/projects/1central/image_classifier/image.py", line 32, in <module>
result = p(image=dspy.Image.from_url(image_path))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
return fn(instance, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/predict/predict.py", line 77, in __call__
return self.forward(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/predict/predict.py", line 107, in forward
completions = adapter(
^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 50, in __call__
return JSONAdapter()(lm, lm_kwargs, signature, demos, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.local/lib/python3.12/site-packages/dspy/adapters/json_adapter.py", line 69, in __call__
raise RuntimeError(
RuntimeError: Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\nInputs will have the following structure:\n\n[[ ## image ## ]]\n{image}\n\nOutputs will be a JSON object with the following fields.\n\n[[ ## description ## ]]\n{description}\nIn adhering to this structure, your objective is: \n Describe the image in detail. Respond only in English.'}
I've also tried the same code, replacing the model provider with ollama_chat:
minicpm = dspy.LM('ollama_chat/minicpm-v:latest', api_base='http://localhost:11434', api_key='')
This results in the same error.
Questions
- Are there any known limitations to dspy when using local LLMs with Ollama?
- Are there additional configurations, or alternate strategies I should try?
- Any tips or directions you can point me in for debugging this?
Additional Info
ollama version: 0.6.5
dspy version: 2.6.17
Steps to reproduce
Copy and run the example code above, changing image_path to point to a real image on your hard drive.
DSPy version
2.6.17
Try dspy.LM('ollama_chat/...') ?
Hi @okhat , I tried that. That's the second error output that I added above.
I actually thought they were different errors, but upon running it again, they appear to be the same. I'll edit my post to reflect that.
hey @kmeehl , do you get this error for all your datapoints, or does it only happen on some?
I believe most multimodal LLMs aren't reliably good at structured JSON output (which our newly-updated JsonAdapter asks for), so that's what's triggering the error. (I've been noticing this on meta-llama/Llama-3.2-11B-Vision-Instruct as well.)
You can bypass this to ensure runs aren't halted by setting max_errors in dspy.Evaluate or any of the optimizer initialization configs to a high value. Ideally, applying optimizers will take care of this or at least minimize how many examples fail.
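For example, something like this (a rough sketch; my_metric, devset, and my_program are placeholders, not from this thread):

import dspy

# Raise max_errors so a handful of failing image examples don't halt the whole run.
# my_metric, devset, and my_program are assumed to be defined elsewhere.
evaluate = dspy.Evaluate(devset=devset, metric=my_metric, num_threads=4, max_errors=100)
score = evaluate(my_program)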
lmk if this helps!
Hey @kmeehl I also had similar issues, for me a working solution is the following:
import dspy
class Describe(dspy.Signature):
    """Describe the image in detail. Respond only in English."""
    image: dspy.Image = dspy.InputField(desc="A photo")
    description: str = dspy.OutputField(desc="Detailed description of the image.")
image_path="image.png"
minicpm = dspy.LM('openai/gemma3:12b-it-qat', base_url='http://localhost:11454/v1', api_key='ollama', cache=False)
p = dspy.Predict(Describe)
p.set_lm(minicpm)
result = p(image=dspy.Image.from_url(image_path))
print(result.description)
Hope this works—don’t forget to swap in your own image path, model name, and port!
Thanks for the responses!
Hey @okhat , 'ollama_chat/...' results in the following error:
Client error '400 Bad Request' for url 'http://localhost:11434/api/chat'
Hey @carvalho28, I haven't tried a ton of different LLMs, but I have yet to see it work on any of the ones that I have tried.
Hey @carvalho28, I gave your solution a try, but it results in the same error:
litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image`...'
I have been able to get dspy talking to my LLM via ollama by bypassing the "standard" dspy way of doing it:
minicpm = dspy.LM('ollama_chat/minicpm-v:latest', api_base='http://localhost:11434', api_key='')
img = Util.image_base64_uri(image_base_path)
image_detail_prompt = "Describe the image in detail. Respond only in English."
messages = [{"role": "user", "content": [{"type": "text", "text": image_detail_prompt}, {"type": "image_url", "image_url": {"url": img}} ]}]
detail = minicpm(messages=messages)
print(detail)
I believe this works because messages is formatted differently. Specifically, content is not a string, but an array of json objects. The error I'm getting from dspy shows that dspy is constructing content as just a string.
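For reference, that helper just needs to return a base64 data URI; a minimal sketch (the actual Util.image_base64_uri isn't shown in this thread, so this is an assumption about what it does):

import base64
import mimetypes

def image_base64_uri(path: str) -> str:
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    # OpenAI-style image_url entries accept data URIs in this form.
    return f"data:{mime};base64,{encoded}"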
An easy way to debug is turning on MLflow tracing: https://dspy.ai/tutorials/observability/.
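Roughly, following that tutorial (assumes a recent mlflow with DSPy autologging and a local tracking server):

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # local MLflow server, started separately
mlflow.set_experiment("dspy-ollama-image-debug")  # any experiment name works
mlflow.dspy.autolog()  # traces each DSPy call, including the exact messages sent to the LM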
> The error I'm getting from dspy shows that dspy is constructing content as just a string.
In our latest code, we format image content as a list. Since you have made the code work, this is completely optional, but would you mind checking whether installing the latest DSPy still produces the error?
@carvalho28's solution above works better. I used it for this image extraction task: https://github.com/adsharma/kuzu-demo-dspy/commit/fc30bfd1d7803676eb75611317f5d3dbfb7a4507
Things that could be better:
- If I use gemma3n:e4b, the model hallucinates and gives me random answers not in the image.
- dspy could detect this and fail early, instead of burning tokens and time by passing a base64-encoded image to a model that doesn't support it.
- For this image, qwen2.5vl:7b fails to get all the rows in column 1. The extracted table contains only "Heart Rhythm Problems", but it does contain all the drugs and their side effects (even though they're unnecessarily duplicated).
- Given this partially correct table, dspy goes into an infinite loop trying to extract List[ConditionAndDrug]. Perhaps I have to switch to a stronger non-vision model for that task.
dspy + qwen2.5vl are also sensitive to the exact prompt ("Describe the image in detail. Respond only in English"). Editing it leads to unpredictable results.
Problems to fix:
- Support image extraction via openai as well as ollama_chat model configs
- Detect models that don't support image extraction
- Prompt compilation tweaks to make the extraction more robust
To summarize: when we specify ollama_chat/<model> or ollama/<model>, the messages content JSON is not understood by the Ollama REST API. A workaround is to use the OpenAI mode and point to the Ollama server, as shown below, because the Ollama OpenAI compatibility API understands this format:
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "[[ ## image ## ]]\n"
    },
    {
      "type": "image_url",
      "image_url": { "url": "..." }
    }
  ]
}
...
Here is a way to reproduce:
# Setup
import dspy
from PIL import Image
image = Image.open("/path/to/your/image.jpg") # Update with your image path
image.thumbnail((1024, 1024), Image.LANCZOS)
# Using ollama_chat will fail
lm = dspy.LM('ollama_chat/gemma3:12b', api_base='http://localhost:11434', api_key='')
dspy.configure(lm=lm)
p = dspy.Predict("image: dspy.Image -> description: str")(image=dspy.Image.from_PIL(image))
# Using openai will succeed
lm = dspy.LM('openai/gemma3:12b', api_base='http://localhost:11434/v1', api_key='')
dspy.configure(lm=lm)
p = dspy.Predict("image: dspy.Image -> description: str")(image=dspy.Image.from_PIL(image))
The Ollama backend expects a simplified format, something like this:
{ "role": "user", "content": "Describe this image: " }
So this code works:
vllm = dspy.LM('ollama_chat/qwen2.5vl:7b', api_base='http://localhost:11434', api_key='')
image_prompt = "Describe in detail this image? "
with dspy.context(lm=vllm):
    mm = dspy.Predict("image_prompt:str -> answer:str")
    print(mm(image_prompt=image_prompt))
So this could be a bug in the way dspy formats the input (OpenAI style) for Ollama.
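For reference, hitting the native endpoint directly with that simplified format (a sketch, assuming the requests library, a local Ollama, and the model already pulled) looks like this:

import base64
import requests

with open("/path/to/image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "qwen2.5vl:7b",
    "messages": [
        # Ollama's native /api/chat takes plain-string content plus a separate "images" list.
        {"role": "user", "content": "Describe this image:", "images": [image_b64]}
    ],
    "stream": False,
}
r = requests.post("http://localhost:11434/api/chat", json=payload)
print(r.json()["message"]["content"])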
Recipe to Make It Work
I've checked whether this could be fixed in DSPy, and it doesn't seem like a good idea.
What seems appropriate instead is to check the api_base and append /v1 automatically, i.e. use Ollama's OpenAI-compatible endpoint.
However, this currently doesn't work because of LiteLLM (see details below).
I'm going to look for a relevant issue in the LiteLLM GitHub repo or create a new one.
My test case
To isolate the image issue, I'm not using structured output: the output is a simple string, not a pydantic model.
import dspy
from PIL import Image

def test_image_description() -> None:
    pil_image = Image.open("tests/img/image.jpg")
    dspy_image = dspy.Image.from_PIL(pil_image)
    lm = dspy.LM(
        "ollama_chat/llama3.2-vision:11b",
        api_base="http://localhost:11434/v1",
        api_key="",
    )
    dspy.configure(lm=lm)
    predict = dspy.Predict("image -> description: str")
    result = predict(image=dspy_image)
    # The prediction's description field should be a non-empty string.
    assert len(result.description) > 0
Root cause
The Ollama native API uses a different message format:
messages = [
    {
        "role": "user",
        "content": "Describe this image:",
        "images": ["<image base64>"]
    }
]
Whereas OpenAI uses the following format:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<image base64>"}}
        ],
    }
]
The OpenAI format is used everywhere in DSPy and LiteLLM.
Adapter on DSPy side
I tried to implement an adapter in dspy/clients/lm.py:litellm_completion
to convert OpenAI messages into the format expected by Ollama.
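Roughly, the idea was a conversion like the sketch below (illustrative only, not the actual patch; it only handles text and image_url parts):

def openai_to_ollama(messages: list[dict]) -> list[dict]:
    # Flatten OpenAI-style content lists into Ollama's plain-string content
    # plus a separate "images" list of base64 payloads.
    converted = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            converted.append(msg)
            continue
        text_parts, images = [], []
        for part in content:
            if part.get("type") == "text":
                text_parts.append(part["text"])
            elif part.get("type") == "image_url":
                url = part["image_url"]["url"]
                # Strip any "data:image/...;base64," prefix so only the raw base64 remains.
                images.append(url.split("base64,", 1)[-1])
        converted.append({"role": msg["role"], "content": "".join(text_parts), "images": images})
    return converted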
However, LiteLLM fails to count prompt tokens here:
https://github.com/BerriAI/litellm/blob/33510120fdbc4a47122d615dc3b95426e12d54df/litellm/llms/ollama/chat/transformation.py#L378
litellm.token_counter expects the OpenAI message format.
So, I see no option to fix this in DSPy alone. Specifying Ollama's OpenAI compatibility mode looks like the best solution to me.
Using Ollama OpenAI Compatibility Mode
lm = dspy.LM(
"ollama_chat/llama3.2-vision:11b",
api_base="http://localhost:11434/v1",
api_key="",
)
Unfortunately, this also doesn't work. This happens because of LiteLLM. DSPy delegates all LLM calls to this framework (llm_http_handler.py).
And LiteLLM builds an incorrect URL: https://github.com/BerriAI/litellm/blob/main/litellm/llms/ollama/chat/transformation.py#L250
It should be
http://localhost:11434/v1/chat/completions
instead of
http://localhost:11434/v1/api/chat
And here is the fix:

 if api_base is None:
     api_base = "http://localhost:11434"
 if api_base.endswith("/api/chat"):
     url = api_base
+elif api_base.endswith("/v1"):
+    url = f"{api_base}/chat/completions"
 else:
     url = f"{api_base}/api/chat"
Also, one modification is needed in transform_response due to the different output format.
And with that, I made it work.
What is the remaining change to transform_response needed to make this approach work? @dmittov
I found a workaround: use the OpenAI-compatible endpoints by using the openai/ prefix instead of ollama_chat/:
lm = dspy.LM("openai/qwen2.5vl:7b", api_base="http://localhost:11434/v1", api_key="not-needed-but-cannot-be-empty")
test_image = dspy.Image.from_file("data/test_image.png")
with dspy.context(lm=lm):
    predict = dspy.Predict("image -> description: str")
    d = predict(image=test_image)
I have not checked if the LM usage is accurate or whether structured outputs will work, but this seems to be a simple temporary solution.
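If you want to sanity-check what was actually sent and what usage was reported, one quick option (standard DSPy calls, nothing specific to this workaround):

# Prints the most recent prompt/response exchange, including the message structure.
dspy.inspect_history(n=1)

# The raw request/response record (kwargs, outputs, usage) is also kept on the LM object.
print(lm.history[-1])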