
[Bug] Ollama requests fail when including an Image

Open kmeehl opened this issue 8 months ago • 6 comments

What happened?

Hi! Very interesting project.

I'm trying to use dspy to do image classification. As a first step, I'd like to just generate a description of an image, using the minicpm-v model on Ollama.

The following example results in an error:

import dspy

class Describe(dspy.Signature):
    """Describe the image in detail. Respond only in English."""

    image: dspy.Image = dspy.InputField(desc="A photo")
    description: str = dspy.OutputField(desc="Detailed description of the image.")

image_path="/tmp/9221487.jpg"
minicpm = dspy.LM('ollama/minicpm-v:latest', api_base='http://localhost:11434', api_key='')

p = dspy.Predict(Describe)
p.set_lm(minicpm)
result = p(image=dspy.Image.from_url(image_path))
print(result.description)

Output:

2025/04/13 13:22:40 WARNING dspy.adapters.json_adapter: Failed to use structured output format. Falling back to JSON mode. Error: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\nInputs will have the following structure:\n\n[[ ## image ## ]]\n{image}\n\nOutputs will be a JSON object with the following fields.\n\n[[ ## description ## ]]\n{description}\nIn adhering to this structure, your objective is: \n        Describe the image in detail. Respond only in English.'}
Traceback (most recent call last):
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 41, in __call__
    return super().__call__(lm, lm_kwargs, signature, demos, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/base.py", line 33, in __call__
    outputs = lm(messages=inputs, **lm_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
    return fn(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/base_lm.py", line 52, in __call__
    response = self.forward(prompt=prompt, messages=messages, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
    return fn(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 112, in forward
    results = completion(
              ^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 268, in wrapper
    output = func_cached(key, request, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/cachetools/_decorators.py", line 94, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 257, in func_cached
    return func(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 282, in cached_litellm_completion
    return litellm_completion(
           ^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 301, in litellm_completion
    return litellm.completion(
           ^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1213, in wrapper
    raise e
  File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1091, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 3093, in completion
    raise exception_type(
  File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 2815, in completion
    response = base_llm_http_handler.completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/llms/custom_httpx/llm_http_handler.py", line 239, in completion
    data = provider_config.transform_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/llms/ollama/completion/transformation.py", line 315, in transform_request
    modified_prompt = ollama_pt(model=model, messages=messages)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/litellm_core_utils/prompt_templates/factory.py", line 265, in ollama_pt
    raise litellm.BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\n[[ ## image ## ]]\n{image}\n\n[[ ## description ## ]]\n{description}\n\n[[ ## completed ## ]]\nIn adhering to this structure, your objective is: \n        Describe the image in detail. Respond only in English.'}


.
.
.

Traceback (most recent call last):
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/json_adapter.py", line 67, in __call__
    return super().__call__(lm, lm_kwargs, signature, demos, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 49, in __call__
    raise e
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 41, in __call__
    return super().__call__(lm, lm_kwargs, signature, demos, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/base.py", line 33, in __call__
    outputs = lm(messages=inputs, **lm_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
    return fn(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/base_lm.py", line 52, in __call__
    response = self.forward(prompt=prompt, messages=messages, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
    return fn(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 112, in forward
    results = completion(
              ^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 268, in wrapper
    output = func_cached(key, request, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/cachetools/_decorators.py", line 94, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 257, in func_cached
    return func(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 282, in cached_litellm_completion
    return litellm_completion(
           ^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/clients/lm.py", line 301, in litellm_completion
    return litellm.completion(
           ^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1213, in wrapper
    raise e
  File "~/.local/lib/python3.12/site-packages/litellm/utils.py", line 1091, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 3093, in completion
    raise exception_type(
  File "~/.local/lib/python3.12/site-packages/litellm/main.py", line 2815, in completion
    response = base_llm_http_handler.completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/llms/custom_httpx/llm_http_handler.py", line 239, in completion
    data = provider_config.transform_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/llms/ollama/completion/transformation.py", line 315, in transform_request
    modified_prompt = ollama_pt(model=model, messages=messages)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/litellm/litellm_core_utils/prompt_templates/factory.py", line 265, in ollama_pt
    raise litellm.BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\nInputs will have the following structure:\n\n[[ ## image ## ]]\n{image}\n\nOutputs will be a JSON object with the following fields.\n\n[[ ## description ## ]]\n{description}\nIn adhering to this structure, your objective is: \n        Describe the image in detail. Respond only in English.'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/projects/1central/image_classifier/image.py", line 32, in <module>
    result = p(image=dspy.Image.from_url(image_path))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/utils/callback.py", line 266, in wrapper
    return fn(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/predict/predict.py", line 77, in __call__
    return self.forward(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/predict/predict.py", line 107, in forward
    completions = adapter(
                  ^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/chat_adapter.py", line 50, in __call__
    return JSONAdapter()(lm, lm_kwargs, signature, demos, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/dspy/adapters/json_adapter.py", line 69, in __call__
    raise RuntimeError(
RuntimeError: Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image` (Image): A photo\nYour output fields are:\n1. `description` (str): Detailed description of the image.\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\nInputs will have the following structure:\n\n[[ ## image ## ]]\n{image}\n\nOutputs will be a JSON object with the following fields.\n\n[[ ## description ## ]]\n{description}\nIn adhering to this structure, your objective is: \n        Describe the image in detail. Respond only in English.'}

I've also tried the same code, replacing the model provider with ollama_chat: minicpm = dspy.LM('ollama_chat/minicpm-v:latest', api_base='http://localhost:11434', api_key='')

This results in the same error.

Questions

  • Are there any known limitations to dspy when using local LLMs with Ollama?
  • Are there additional configurations, or alternate strategies I should try?
  • Any tips or directions you can point me in for debugging this?

Additional Info

  • ollama version: 0.6.5
  • dspy version: 2.6.17

Steps to reproduce

Copy and run the example code above, changing image_path to point to a real image on your hard drive.

DSPy version

2.6.17

kmeehl avatar Apr 13 '25 17:04 kmeehl

Try dspy.LM('ollama_chat/...') ?

okhat avatar Apr 14 '25 14:04 okhat

Hi @okhat , I tried that. That's the second error output that I added above.

I actually thought they were different errors, but upon running it again, they appear to be the same. I'll edit my post to reflect that.

kmeehl avatar Apr 14 '25 17:04 kmeehl

Hey @kmeehl, do you get this error for all your datapoints, or does it only happen on some?

I believe most multimodal LLMs are not well adapted to producing structured JSON output (from our newly updated JsonAdapter) all the time, so that's what's triggering the error. (I've been noticing this with meta-llama/Llama-3.2-11B-Vision-Instruct as well.)

You can bypass this to ensure runs aren't halted by setting max_errors in dspy.Evaluate or any of the optimizer initialization configs to a high value. Ideally, applying optimizers will take care of this or at least minimize how many examples fail.
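For example, a minimal sketch of that setting (devset, my_metric, and the program p are placeholders for your own objects):

import dspy

# Hypothetical devset/metric; max_errors is set high so individual failures don't halt the run
evaluator = dspy.Evaluate(devset=devset, metric=my_metric, num_threads=4, max_errors=1000)
evaluator(p)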

lmk if this helps!

arnavsinghvi11 avatar Apr 22 '25 17:04 arnavsinghvi11

Hey @kmeehl I also had similar issues, for me a working solution is the following:

import dspy

class Describe(dspy.Signature):
    """Describe the image in detail. Respond only in English."""

    image: dspy.Image = dspy.InputField(desc="A photo")
    description: str = dspy.OutputField(desc="Detailed description of the image.")

image_path="image.png"
minicpm = dspy.LM('openai/gemma3:12b-it-qat', base_url='http://localhost:11454/v1', api_key='ollama', cache=False)

p = dspy.Predict(Describe)
p.set_lm(minicpm)
result = p(image=dspy.Image.from_url(image_path))
print(result.description)

Hope this works—don’t forget to swap in your own image path, model name, and port!

carvalho28 avatar Apr 22 '25 23:04 carvalho28

Thanks for the responses!

Hey @okhat, 'ollama_chat/...' results in the following error: Client error '400 Bad Request' for url 'http://localhost:11434/api/chat'

Hey @carvalho28, I haven't tried a ton of different LLMs, but I have yet to see it work on any of the ones that I have tried.

Hey @carvalho28, I gave your solution a try, but it results in the same error: litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': 'Your input fields are:\n1. `image`...'

I have been able to get dspy talking to my LLM via ollama by bypassing the "standard" dspy way of doing it:

import dspy

minicpm = dspy.LM('ollama_chat/minicpm-v:latest', api_base='http://localhost:11434', api_key='')
img = Util.image_base64_uri(image_base_path)  # Util.image_base64_uri is my own helper; it returns a base64 data URI for the image
image_detail_prompt = "Describe the image in detail. Respond only in English."
messages = [{"role": "user", "content": [{"type": "text", "text": image_detail_prompt}, {"type": "image_url", "image_url": {"url": img}}]}]
detail = minicpm(messages=messages)
print(detail)

I believe this works because messages is formatted differently: content is not a plain string but an array of JSON objects. The error I'm getting from dspy shows that dspy is constructing content as just a string.

kmeehl avatar Jun 05 '25 20:06 kmeehl

An easy way to debug is turning on MLflow tracing: https://dspy.ai/tutorials/observability/.

The error I'm getting from dspy shows that dspy is constructing content as just a string.

In our latest code, we format image content as a list. Since you have made the code work this is completely optional, but would you mind checking whether installing the latest DSPy still produces the error?
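For reference, the tracing setup from that tutorial is roughly as follows (assuming a recent MLflow with DSPy autologging support; the experiment name is just an example):

import mlflow

mlflow.dspy.autolog()                       # trace every DSPy call, including the exact messages sent to the LM
mlflow.set_experiment("dspy-ollama-image")  # hypothetical experiment name
# then run the failing Predict call as usual and inspect the trace in the MLflow UI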

chenmoneygithub avatar Jun 11 '25 02:06 chenmoneygithub

@carvalho28 solution above works better. I used it for this image extraction task: https://github.com/adsharma/kuzu-demo-dspy/commit/fc30bfd1d7803676eb75611317f5d3dbfb7a4507

Things that could be better:

  • If I use gemma3n:e4b, the model hallucinates and gives me random answers not in the image.
  • dspy could detect this and fail early instead of burning tokens and time by passing base64 encoded image to a model that doesn't support it.
  • For this image, qwen2.5vl:7b fails to get all the rows in column 1. The extracted table contains only "Heart Rhythm Problems", but it does contain all the drugs and their side effects (even though they're unnecessarily duplicated).
  • Given this partially correct table, dspy goes into an infinite loop trying to extract List[ConditionAndDrug]. Perhaps I have to switch to a stronger non-vision model for that task.

dspy + qwen2.5vl are also sensitive to the exact prompt ("Describe the image in detail. Respond only in English"). Editing it leads to unpredictable results.

Problems to fix:

  • Support image extraction via openai as well as ollama_chat model configs
  • Detect models that don't support image extraction (see the sketch after this list)
  • Prompt compilation tweaks to make the extraction more robust.
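For the detection item, LiteLLM already exposes a capability lookup that could serve as an early guard. A rough sketch (local Ollama models may be missing from LiteLLM's capability map, so a negative answer is only a hint, not a guarantee):

import litellm

model = "ollama_chat/qwen2.5vl:7b"
# supports_vision() consults LiteLLM's model capability map; unknown local models simply return False
if not litellm.supports_vision(model=model):
    print(f"Warning: {model} is not known to accept image inputs; outputs may be hallucinated")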

adsharma avatar Jun 30 '25 20:06 adsharma

To summarize: when we specify ollama_chat/<model> or ollama/<model>, the messages content JSON is not understood by the Ollama REST API. A workaround is to use the OpenAI mode and point to the Ollama server, as shown below, because the Ollama OpenAI compatibility API understands this format:

      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "[[ ## image ## ]]\\n"
        },
        {
          "type": "image_url",
          "image_url": { "url": "..." }
        }
...

Here is a way to reproduce:

# Setup
import dspy
from PIL import Image

image = Image.open("/path/to/your/image.jpg")  # Update with your image path
image.thumbnail((1024, 1024), Image.LANCZOS)

# Using ollama_chat will fail
lm = dspy.LM('ollama_chat/gemma3:12b', api_base='http://localhost:11434', api_key='')
dspy.configure(lm=lm)
p = dspy.Predict("image: dspy.Image -> description: str")(image=dspy.Image.from_PIL(image))

# Using openai will succeed
lm = dspy.LM('openai/gemma3:12b', api_base='http://localhost:11434/v1', api_key='')
dspy.configure(lm=lm)
p = dspy.Predict("image: dspy.Image -> description: str")(image=dspy.Image.from_PIL(image))

jeffg-dev avatar Jul 17 '25 17:07 jeffg-dev

The Ollama backend expects a simplified format, something like this: { "role": "user", "content": "Describe this image: ![image](path_or_url)" }

So this code works:

vllm = dspy.LM('ollama_chat/qwen2.5vl:7b', api_base='http://localhost:11434', api_key='')
image_prompt = f"Describe in detail this image? ![image]({local_file_path})"  # local_file_path points to an image on disk

with dspy.context(lm=vllm):
    mm = dspy.Predict("image_prompt:str -> answer:str")
    print(mm(image_prompt=image_prompt))

So this could be a bug in the way dspy formats the input (OpenAI style) for Ollama.

sujeetv avatar Jul 23 '25 01:07 sujeetv

Recipe to Make It Work

I've looked into whether this could be fixed in DSPy itself, and it doesn't seem like a good idea.

What seems appropriate is to check the api_base and append /v1 automatically. However, this currently doesn't work because of LiteLLM (see details below).
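For illustration only, this is the kind of normalization meant here (a hypothetical helper, not DSPy code; as noted, the resulting request currently still fails inside LiteLLM):

def normalize_ollama_api_base(api_base: str) -> str:
    # Route requests through Ollama's OpenAI-compatible endpoint by ensuring the base ends in /v1
    api_base = api_base.rstrip("/")
    if not api_base.endswith("/v1"):
        api_base = f"{api_base}/v1"
    return api_base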

I'm going to look for a relevant issue in the LiteLLM GitHub repo or create a new one.

My test case

I'm not using structured output, in order to isolate the image issue: the output is a simple string, not a Pydantic model.

import dspy
from PIL import Image


def test_image_description() -> None:
    pil_image = Image.open("tests/img/image.jpg")
    dspy_image = dspy.Image.from_PIL(pil_image)

    lm = dspy.LM(
        "ollama_chat/llama3.2-vision:11b",
        api_base="http://localhost:11434/v1",
        api_key="",
    )
    dspy.configure(lm=lm)

    predict = dspy.Predict("image -> description: str")
    description = predict(image=dspy_image)
    assert len(description.description) > 0

Root cause

The Ollama native API uses a different message format:

messages=[
    {
        "role": "user",
        "content": "Describe this image:",
        "images": ["<image base64>"]
    }
]

Whereas OpenAI uses the following format:

messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<image base64>"}},
        ],
    }
]

The OpenAI format is used everywhere in DSPy and LiteLLM.

Adapter on DSPy side

I tried to implement an adapter in dspy/clients/lm.py:litellm_completion to convert OpenAI messages into the format expected by Ollama.
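As a rough illustration of that adapter idea (a sketch, not DSPy's or LiteLLM's actual code): text parts are concatenated into content, and image parts are collected into Ollama's images list.

def openai_to_ollama(message: dict) -> dict:
    # Convert one OpenAI-style chat message into Ollama's native format
    content = message["content"]
    if isinstance(content, str):
        return message
    texts, images = [], []
    for part in content:
        if part["type"] == "text":
            texts.append(part["text"])
        elif part["type"] == "image_url":
            url = part["image_url"]["url"]
            # Keep only the raw base64 payload, dropping any data-URI prefix
            images.append(url.split("base64,", 1)[-1])
    return {"role": message["role"], "content": "".join(texts), "images": images}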

However, LiteLLM then fails to count prompt tokens here: https://github.com/BerriAI/litellm/blob/33510120fdbc4a47122d615dc3b95426e12d54df/litellm/llms/ollama/chat/transformation.py#L378

litellm.token_counter expects OpenAI message format.

So I see no option to fix this in DSPy alone. Using Ollama's OpenAI compatibility mode looks like the best solution to me.

Using Ollama OpenAI Compatibility Mode

lm = dspy.LM(
    "ollama_chat/llama3.2-vision:11b", 
    api_base="http://localhost:11434/v1",
    api_key="",
)

Unfortunately, this also doesn't work, again because of LiteLLM: DSPy delegates all LLM calls to that library (llm_http_handler.py).

And LiteLLM builds an incorrect URL: https://github.com/BerriAI/litellm/blob/main/litellm/llms/ollama/chat/transformation.py#L250

It should be http://localhost:11434/v1/chat/completions instead of http://localhost:11434/v1/api/chat.

And here is the fix:

if api_base is None:
    api_base = "http://localhost:11434"
if api_base.endswith("/api/chat"):
    url = api_base
+elif api_base.endswith("/v1"):
+    url = f"{api_base}/chat/completions"
else:
    url = f"{api_base}/api/chat"

Also, one modification is needed in transform_response due to the different output format.

And with that, I made it work.

dmittov avatar Jul 29 '25 18:07 dmittov

transform_response

What is the remaining change to be made to transform_response to make this approach work? @dmittov

Mahgoobi avatar Aug 19 '25 16:08 Mahgoobi

I found a workaround: use the OpenAI-compatible endpoints by prefixing the model with openai/ instead of ollama_chat/:

lm = dspy.LM("openai/qwen2.5vl:7b", api_base="http://localhost:11434/v1", api_key="not-needed-but-cannot-be-empty")
test_image = dspy.Image.from_file("data/test_image.png")
with dspy.context(lm=lm):
  predict = dspy.Predict("image -> description: str")
  d = predict(image=test_image)

I have not checked if the LM usage is accurate or whether structured outputs will work, but this seems to be a simple temporary solution.
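To spot-check usage afterwards, recent DSPy versions keep a per-LM call history that records whatever usage the endpoint reports; a small sketch continuing the snippet above (assuming that field is populated for your model):

# After the predict call above
print(lm.history[-1].get("usage"))  # raw token usage reported by the endpoint, if any
dspy.inspect_history(n=1)           # pretty-print the last prompt/response pair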

sontanon avatar Aug 23 '25 19:08 sontanon