
Ollama + Qwen3-coder Error: does not support thinking

Open jiangsutx opened this issue 1 month ago • 3 comments

I use Ollama on a MacBook Pro and have tried qwen2.5-coder:1.5b as well as modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest.

The content in ~/.claude-code-router/config.json:

{
  "PORT": 3456,
  "Providers": [
    {
      "name": "ollama",
      "api_base_url": "http://localhost:11434/v1/chat/completions",
      "api_key": "ollama",
      "models": ["qwen2.5-coder:1.5b", "modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest"]
    }
  ],
  "Router": {
    "default": "ollama,modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest",
    "background": "ollama,modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest",
    "think": "ollama,modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest",
    "longContext": "ollama,modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest",
    "longContextThreshold": 60000,
    "webSearch": "ollama,modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest"
  }
}
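Each Router value above pairs a provider name with a model as "provider,model". A minimal sketch of how such a route string can be split (assuming the separator is the first comma, which is safe here since Ollama model tags contain slashes and colons but no commas; the function name is illustrative, not claude-code-router's actual internals):

```python
def parse_route(route: str) -> tuple[str, str]:
    """Split a "provider,model" route string into its two parts.

    Split on the first comma only, so model tags such as
    "modelscope.cn/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:latest",
    which contain slashes and colons, stay intact.
    """
    provider, model = route.split(",", 1)
    return provider, model
```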

When I start with ccr code and type my instructions, I get:

API Error: 400 {"error":{"message":"Error from provider(ollama,qwen2.5-coder:1.5b: 400):
    {\"error\":{\"message\":\"\\\"qwen2.5-coder:1.5b\\\" does not support
    thinking\",\"type\":\"api_error\",\"param\":null,\"code\":null}}\nError: Error from provider(ollama,qwen2.5-coder:1.5b:
    400): {\"error\":{\"message\":\"\\\"qwen2.5-coder:1.5b\\\" does not support
    thinking\",\"type\":\"api_error\",\"param\":null,\"code\":null}}\n\n    at nt
    (/opt/homebrew/lib/node_modules/@musistudio/claude-code-router/dist/cli.js:79940:11)\n    at h0
    (/opt/homebrew/lib/node_modules/@musistudio/claude-code-router/dist/cli.js:79998:11)\n    at
    process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async l0 (/opt/homebrew/lib/node_modul
    es/@musistudio/claude-code-router/dist/cli.js:79965:96)","type":"api_error","code":"provider_response_error"}}
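The 400 comes from Ollama rejecting a request that carries a thinking parameter for a model that does not support it. One conceivable workaround (a sketch only, not claude-code-router's actual transformer API; the field names "thinking", "think", and "reasoning" are assumptions covering the Anthropic-style block Claude Code sends and Ollama's native flag) is to strip those fields from the request body before it reaches the provider:

```python
def strip_thinking(payload: dict) -> dict:
    """Return a copy of a request body without thinking-related fields.

    The field names are illustrative: Claude Code sends an
    Anthropic-style "thinking" block, while Ollama's native API uses a
    "think" flag; a real transformer would match whichever field its
    provider actually rejects.
    """
    return {
        k: v
        for k, v in payload.items()
        if k not in ("thinking", "think", "reasoning")
    }
```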

My questions are:

  1. I already know that I can press Tab to disable thinking mode, and that works.
  2. It is very slow on my M3 MacBook Pro, even for qwen2.5-coder:1.5b. Does Ollama restart for each Claude Code instruction?
  3. What is the best practice, and which model is best suited for an M3 MacBook Pro with Claude Code?

Thanks very much.

jiangsutx avatar Nov 02 '25 17:11 jiangsutx

Same issue here. It worked for me for a couple of prompts, but then it got stuck midway and started failing with the same error. I tried other models as well (like GLM 4.5 Air and Devstral) with the same result. When I disabled thinking, I got similar errors, but for tools ("does not support tools").

zimdin12 avatar Nov 05 '25 13:11 zimdin12

Hmm, I got it working with a non-HF model, like qwen3 from the Ollama library (non-GGUF).

zimdin12 avatar Nov 05 '25 13:11 zimdin12

Okay, it seems to be more complicated. I have tried different models: qwen-coder works, GLM 4.5 Air works. None of the REAP versions worked, and many finetunes did not work. Soon I shall test UD quants and the Qwen3 x DeepSeek distill.

NB! I started using Ollama library models (via search you can find more custom ones too; I didn't know that before). I have had more luck with those.

it seems to be quite hard to find working models.

It is almost working :D I wish I had more resources so I could test more :/ GLM 4.5 Air at Q2 and Q3 was too slow for me and the REAP version did not work, so I cannot use that, nor shall I test any higher-parameter models.

zimdin12 avatar Nov 07 '25 09:11 zimdin12

I switched to LocalAI for now but am still having issues (different ones, though). I shall probably try out LM Studio as well (even though I do not like that it is closed source).

zimdin12 avatar Dec 05 '25 11:12 zimdin12