Ollama deepseek-r1 or qwen3 models do not work
What happened?
How to solve the issue of models not found in litellm's model list?
holmes version 0.16.0
ollama list
NAME             ID              SIZE      MODIFIED
deepseek-r1:8b   6995872bfe4c    5.2 GB    28 minutes ago
qwen3:4b         359d7dd4bcda    2.5 GB    About an hour ago
llama3:8b        365c0bd3c000    4.7 GB    2 months ago
holmes ask "what pods are failing?" --model='ollama/deepseek-r1:8b'
Loaded models: ['ollama/deepseek-r1:8b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
Using model: ollama/deepseek-r1:8b (200,000 total tokens, 40,000 output tokens)
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
What did you expect to happen?
How to solve this issue
How can we reproduce it (as minimally and precisely as possible)?
How to solve this issue
Anything else we need to know?
No response
Hi @skb888, thanks for the report. We will take a look and get back to you.
Hi @skb888 can you try making a file called model_list.yaml with the contents
deepseek-r1:
  api_base: PUT_API_BASE_HERE
  api_key: PUT_API_KEY_HERE  # if there is none, just put ""
  model: ollama/deepseek-r1:8b
then run
export MODEL_LIST_FILE_LOCATION=/PATH/TO/LIST/model_list.yaml
holmes ask "what pods are failing?" --model='ollama/deepseek-r1:8b'
And share the output?
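For reference, a filled-in version for a default local Ollama install might look like the snippet below (the api_base value is an assumption based on Ollama's standard port 11434, which appears later in this thread; adjust it if your setup differs):

```yaml
deepseek-r1:
  api_base: http://localhost:11434
  api_key: ""  # Ollama does not require a key
  model: ollama/deepseek-r1:8b
```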
Also, from my experience, not all deepseek-r1 models support tool calling/function calling; it's worth verifying that this model supports that feature too.
Thanks for the suggestion. I have tried creating model_list.yaml and then ran the command below. It still does not work. Meanwhile, I have tried the qwen3:4b model and faced the same issues.
holmes ask "what pods are failing?" --model='ollama/deepseek-r1:8b'
Loaded models: ['deepseek-r1', 'ollama/deepseek-r1:8b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
Using model: ollama/deepseek-r1:8b (200,000 total tokens, 40,000 output tokens)
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/deepseek-r1:8b in litellm's model list (tried: ollama/deepseek-r1:8b, deepseek-r1:8b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN
environment variable to the correct value for your model.
Meanwhile, I tried the llama3:8b model, which also fails, due to that model's limited context window size. Any suggestion on how to limit the system_prompt and user_prompt size? I could not find how to define these ENV variables in the docs.
holmes ask "what pods are failing?" --model='ollama/llama3:8b'
Loaded models: ['ollama/llama3:8b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset helm/core
✅ Toolset kubernetes/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Using model: ollama/llama3:8b (8,192 total tokens, 1,638 output tokens)
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
The combined size of system_prompt and user_prompt (8848 tokens) exceeds the model's context window for input.
An error occurred during interactive mode:
Traceback (most recent call last):
File "holmes/interactive.py", line 1222, in run_interactive_loop
File "sentry_sdk/tracing_utils.py", line 851, in sync_wrapper
File "holmes/core/tool_calling_llm.py", line 329, in call
File "sentry_sdk/tracing_utils.py", line 851, in sync_wrapper
File "holmes/core/truncation/input_context_window_limiter.py", line 196, in limit_input_context_window
File "holmes/core/truncation/input_context_window_limiter.py", line 91, in truncate_messages_to_fit_context
Exception: The combined size of system_prompt and user_prompt (8848 tokens) exceeds the maximum context size of 6554 tokens available for input.
Error: The combined size of system_prompt and user_prompt (8848 tokens) exceeds the maximum context size of 6554 tokens available for input.
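The numbers in that error are consistent: with llama3:8b's 8,192-token window and 1,638 tokens reserved for output, only 6,554 tokens remain for input, and the 8,848-token prompt exceeds that budget. A quick shell check of the arithmetic:

```shell
context_window=8192   # llama3:8b total context reported by holmes
max_output=1638       # tokens reserved for the model's reply
prompt_tokens=8848    # combined system_prompt + user_prompt from the error

# Tokens left over for input after reserving the output budget
available=$((context_window - max_output))
echo "available for input: $available tokens"

# How far the prompt overshoots that budget
if [ "$prompt_tokens" -gt "$available" ]; then
  echo "prompt exceeds input budget by $((prompt_tokens - available)) tokens"
fi
```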
Hi @skb888
can you try to curl your deepseek to verify the model supports tool calling?
curl http://DEEPSEEK_URL/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:8b",
"prompt": "You are a tool router. If user asks for weather, output ONLY this JSON: {\"tool\":\"get_weather\",\"args\":{\"city\":\"<city>\"}}. Otherwise output {\"tool\":\"none\",\"args\":{}}.\nUser: weather in New York?",
"format": {
"type": "object",
"properties": {
"tool": { "type": "string", "enum": ["get_weather","none"] },
"args": { "type": "object",
"properties": { "city": { "type": "string" } },
"required": [],
"additionalProperties": false
}
},
"required": ["tool","args"],
"additionalProperties": false
},
"stream": false,
"options": {
"temperature": 0,
"num_predict": 128
}
}'
Can you also share where this command gets stuck when run with this additional flag?
holmes ask "what pods are failing?" --model='ollama/llama3:8b' -vvv
It looks like the llama3 model's context window is too small to run Holmes. Do you have access to llama3.2 with tool calling?
Hi, I think deepseek-r1:8b does not support function calls. I have tested qwen3:4b and llama3.2:3b, which do support function calls. I have tried both the OpenAI-compatible gateway (--model="openai/
Please check the logs:
holmes ask "what pods are failing?" --model="ollama/llama3.2:3b"
Loaded models: ['ollama/llama3.2:3b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
Using model: ollama/llama3.2:3b (200,000 total tokens, 40,000 output tokens)
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model ollama/llama3.2:3b in litellm's model list (tried: ollama/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
holmes ask "what pods are failing?" --model="openai/llama3.2:3b"
Loaded models: ['openai/llama3.2:3b']
✅ Toolset kubernetes/kube-prometheus-stack
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset datadog/rds
✅ Toolset bash
✅ Toolset runbook
✅ Toolset kubernetes/logs
✅ Toolset kubernetes/core
✅ Toolset helm/core
✅ Toolset kubernetes/live-metrics
Using 43 datasources (toolsets). To refresh: use flag --refresh-toolsets
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
Using model: openai/llama3.2:3b (200,000 total tokens, 40,000 output tokens)
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: what pods are failing?
Thinking...
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE
environment variable to the correct value for your model.
Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment
variable to the correct value for your model.
In addition, I have tried llama3:8b, which hits the context window size issue; I attach the log here: llama3-8b.log
Could you please share the verbose output using the -vvv flag?
For example:
holmes ask "what pods are failing?" --model="openai/llama3.2:3b" --no-interactive -vvv
It might be best to email it to me at [email protected], just in case any accidental secrets are included.
Also, could you confirm the URL path you're using?
For an OpenAI-compatible endpoint in Ollama, it should end with /v1, like this:
export OPENAI_API_BASE=http://127.0.0.1:11434/v1
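As a sanity check (assuming a default local Ollama install), Ollama's OpenAI-compatible API also exposes a model listing under that base URL, which should include the pulled models:

```shell
export OPENAI_API_BASE="http://127.0.0.1:11434/v1"
echo "$OPENAI_API_BASE"

# With the Ollama daemon running, this should return a JSON "data" array
# listing the local models (e.g. llama3.2:3b):
# curl -s "$OPENAI_API_BASE/models"
```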
Thanks for the quick response. I have double-checked that OPENAI_API_BASE is configured correctly.
echo $OPENAI_API_BASE
http://127.0.0.1:11434/v1
ollama list
NAME           ID              SIZE      MODIFIED
llama3.2:3b    a80c4f17acd5    2.0 GB    29 hours ago
After running the shared command, I still see "Couldn't find model openai/llama3.2:3b in litellm's model list (tried: openai/llama3.2:3b, llama3.2:3b)".
Here is the detailed log: llama3.2-3b.log
I was able to get this working for qwen3:4b. The steps I did were as follows:
Pulled the model:
ollama pull qwen3:4b
Set up variables to suppress some warnings (it makes the output a bit more readable):
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="fake-key"
export OVERRIDE_MAX_CONTENT_SIZE="200000"
export OVERRIDE_MAX_OUTPUT_TOKEN="40000"
Asked holmes a simple question:
holmes ask "what pods are failing in the default namespace?" --model="openai/qwen3:4b"
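The steps above can be collected into one shell snippet. The commands that need a running Ollama daemon and a configured cluster are left as comments; the endpoint and override values are taken from this thread:

```shell
# 1. Pull the model (requires a running Ollama daemon):
#    ollama pull qwen3:4b

# 2. Point HolmesGPT at Ollama's OpenAI-compatible endpoint;
#    Ollama ignores the API key, but one must be set.
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="fake-key"

# 3. Override the token limits litellm could not look up for this model.
export OVERRIDE_MAX_CONTENT_SIZE="200000"
export OVERRIDE_MAX_OUTPUT_TOKEN="40000"

# 4. Ask a question via the OpenAI-compatible route:
#    holmes ask "what pods are failing in the default namespace?" --model="openai/qwen3:4b"
```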
...some time later...
@skb888 does @peter-edb 's fix work for you?
Thanks, it works for me now.