Ollama deepseek-r1 or qwen3 models do not work
What happened?
How to solve the issue of models not found in litellm's model list?
holmes version 0.16.0
ollama list
NAME             ID              SIZE      MODIFIED
deepseek-r1:8b   6995872bfe4c    5.2 GB    28 minutes ago
qwen3:4b         359d7dd4bcda    2.5 GB    About an hour ago
llama3:8b        365c0bd3c000    4.7 GB    2 months ago
holmes ask "what pods are failing?" --model='ollama/deepseek-r1:8b'
Loaded models: ['ollama/deepseek-r1:8b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
Using model: ollama/deepseek-r1:8b (200,000 total tokens, 40,000 output tokens)
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
What did you expect to happen?
How to solve this issue
How can we reproduce it (as minimally and precisely as possible)?
How to solve this issue
Anything else we need to know?
No response
Hi @skb888, thanks for the report. We will take a look and get back to you.
Hi @skb888 can you try making a file called model_list.yaml with the contents
deepseek-r1:
  api_base: PUT_API_BASE_HERE
  api_key: PUT_API_KEY_HERE  # if there is none, just put ""
  model: ollama/deepseek-r1:8b
then run
export MODEL_LIST_FILE_LOCATION=/PATH/TO/LIST/model_list.yaml
holmes ask "what pods are failing?" --model='ollama/deepseek-r1:8b'
And share the output?
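For reference, a filled-in version for a default local Ollama install might look like the snippet below (the api_base value is an assumption based on Ollama's standard port 11434, which appears later in this thread; adjust it if your setup differs):

```yaml
deepseek-r1:
  api_base: http://localhost:11434
  api_key: ""  # Ollama does not require a key
  model: ollama/deepseek-r1:8b
```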
Also, from my experience, not all deepseek-r1 models support tool calling/function calling; it's worth verifying that this model supports that feature too.
Thanks for the suggestion. I have tried creating model_list.yaml and then ran the command below. It still does not work. Meanwhile, I have tried the qwen3:4b model and faced the same issues.
holmes ask "what pods are failing?" --model='ollama/deepseek-r1:8b'
Loaded models: ['deepseek-r1', 'ollama/deepseek-r1:8b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
Using model: ollama/deepseek-r1:8b (200,000 total tokens, 40,000 output tokens)
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
Meanwhile, I tried the llama3:8b model, which also fails, due to that model's limited context window size. Any suggestion on how to limit the system_prompt and user_prompt size? I could not find how to define these ENV variables in the docs.
holmes ask "what pods are failing?" --model='ollama/llama3:8b'
Loaded models: ['ollama/llama3:8b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset helm/core
✅ Toolset kubernetes/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Using model: ollama/llama3:8b (8,192 total tokens, 1,638 output tokens)
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
The combined size of system_prompt and user_prompt (8848 tokens) exceeds the model's context window for input.
An error occurred during interactive mode:
Traceback (most recent call last):
File "holmes/interactive.py", line 1222, in run_interactive_loop
File "sentry_sdk/tracing_utils.py", line 851, in sync_wrapper
File "holmes/core/tool_calling_llm.py", line 329, in call
File "sentry_sdk/tracing_utils.py", line 851, in sync_wrapper
File "holmes/core/truncation/input_context_window_limiter.py", line 196, in limit_input_context_window
File "holmes/core/truncation/input_context_window_limiter.py", line 91, in truncate_messages_to_fit_context
Exception: The combined size of system_prompt and user_prompt (8848 tokens) exceeds the maximum context size of 6554 tokens available for input.
Error: The combined size of system_prompt and user_prompt (8848 tokens) exceeds the maximum context size of 6554 tokens available for input.
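The numbers in that error are consistent: with llama3:8b's 8,192-token window and 1,638 tokens reserved for output, only 6,554 tokens remain for input, and the 8,848-token prompt exceeds that budget. A quick shell check of the arithmetic:

```shell
context_window=8192   # llama3:8b total context reported by holmes
max_output=1638       # tokens reserved for the model's reply
prompt_tokens=8848    # combined system_prompt + user_prompt from the error

# Tokens left over for input after reserving the output budget
available=$((context_window - max_output))
echo "available for input: $available tokens"

# How far the prompt overshoots that budget
if [ "$prompt_tokens" -gt "$available" ]; then
  echo "prompt exceeds input budget by $((prompt_tokens - available)) tokens"
fi
```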
Hi @skb888
can you try to curl your deepseek to verify the model supports tool calling?
curl http://DEEPSEEK_URL/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:8b",
"prompt": "You are a tool router. If user asks for weather, output ONLY this JSON: {\"tool\":\"get_weather\",\"args\":{\"city\":\"<city>\"}}. Otherwise output {\"tool\":\"none\",\"args\":{}}.\nUser: weather in New York?",
"format": {
"type": "object",
"properties": {
"tool": { "type": "string", "enum": ["get_weather","none"] },
"args": { "type": "object",
"properties": { "city": { "type": "string" } },
"required": [],
"additionalProperties": false
}
},
"required": ["tool","args"],
"additionalProperties": false
},
"stream": false,
"options": {
"temperature": 0,
"num_predict": 128
}
}'
Can you also share where this command gets stuck when run with this additional flag?
holmes ask "what pods are failing?" --model='ollama/llama3:8b' -vvv
It looks like the llama3 model's context window is too small to run Holmes. Do you have access to llama3.2 with tool calling?
Hi, I think deepseek-r1:8b does not support function calls. I have tested qwen3:4b and llama3.2:3b, which do support function calls. I have tried both the OpenAI-compatible gateway (--model="openai/
Please check the logs:
holmes ask "what pods are failing?" --model="ollama/llama3.2:3b"
Loaded models: ['ollama/llama3.2:3b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
Using model: ollama/llama3.2:3b (200,000 total tokens, 40,000 output tokens)
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
holmes ask "what pods are failing?" --model="openai/llama3.2:3b"
Loaded models: ['openai/llama3.2:3b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
Using model: openai/llama3.2:3b (200,000 total tokens, 40,000 output tokens)
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
In addition, I have tried llama3:8b, which hits the context window size issue; I attach the log here: llama3-8b.log
Could you please share the verbose output using the -vvv flag?
For example:
holmes ask "what pods are failing?" --model="openai/llama3.2:3b" --no-interactive -vvv
It might be best to email it to me at [email protected], just in case any accidental secrets are included.
Also, could you confirm the URL path you're using?
For an OpenAI-compatible endpoint in Ollama, it should end with /v1, like this:
export OPENAI_API_BASE=http://127.0.0.1:11434/v1
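As a sanity check (assuming a default local Ollama install), Ollama's OpenAI-compatible API also exposes a model listing under that base URL, which should include the pulled models:

```shell
export OPENAI_API_BASE="http://127.0.0.1:11434/v1"
echo "$OPENAI_API_BASE"

# With the Ollama daemon running, this should return a JSON "data" array
# listing the local models (e.g. llama3.2:3b):
# curl -s "$OPENAI_API_BASE/models"
```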
Thanks for the quick response. I have double-checked that OPENAI_API_BASE is configured correctly.
echo $OPENAI_API_BASE
http://127.0.0.1:11434/v1
ollama list
NAME           ID              SIZE      MODIFIED
llama3.2:3b    a80c4f17acd5    2.0 GB    29 hours ago
After running the shared command, I still see "Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b)".
Here is the detailed log: llama3.2-3b.log
I was able to get this working for qwen3:4b. The steps I did were as follows:
Pulled the model:
ollama pull qwen3:4b
Set up variables to suppress some warnings (it makes the output a bit more readable):
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="fake-key"
export OVERRIDE_MAX_CONTENT_SIZE="200000"
export OVERRIDE_MAX_OUTPUT_TOKEN="40000"
Asked holmes a simple question:
holmes ask "what pods are failing in the default namespace?" --model="openai/qwen3:4b"
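The steps above can be collected into one shell snippet. The commands that need a running Ollama daemon and a configured cluster are left as comments; the endpoint and override values are taken from this thread:

```shell
# 1. Pull the model (requires a running Ollama daemon):
#    ollama pull qwen3:4b

# 2. Point HolmesGPT at Ollama's OpenAI-compatible endpoint;
#    Ollama ignores the API key, but one must be set.
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="fake-key"

# 3. Override the token limits litellm could not look up for this model.
export OVERRIDE_MAX_CONTENT_SIZE="200000"
export OVERRIDE_MAX_OUTPUT_TOKEN="40000"

# 4. Ask a question via the OpenAI-compatible route:
#    holmes ask "what pods are failing in the default namespace?" --model="openai/qwen3:4b"
```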
...some time later...
@skb888 does @peter-edb 's fix work for you?
Thanks, it works for me now.