Misc. bug: Qwen 3.0 "enable_thinking" parameter not working
Name and Version
```
llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5199 (ced44be3)
built with MSVC 19.41.34120.0 for x64
```
Operating systems
Windows 11
Which llama.cpp modules do you know to be affected?
llama-server
Command line
```
llama-server Qwen3-14B-Q5_K_M.gguf
```
Problem description & steps to reproduce
The `enable_thinking: false` parameter has no effect at all when sent in a request to llama-server (despite appearing in Alibaba's examples).

SGLang and vLLM support this via `chat_template_kwargs`:

https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes
https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
```shell
curl http://localhost:30000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'
```
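For reference, the same request body can be built in Python. This is only a sketch of the payload from the curl example above; `chat_template_kwargs` is honored by vLLM/SGLang, while llama-server ignores it (which is the subject of this issue), and the endpoint URL is the one from the Qwen docs.

```python
import json

# Build the OpenAI-compatible request body from the curl example.
payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [
        {
            "role": "user",
            "content": "Give me a short introduction to large language models.",
        }
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "max_tokens": 8192,
    "presence_penalty": 1.5,
    # Extra kwargs passed through to the Jinja chat template as variables
    # (supported by vLLM and SGLang; not by llama-server at this version).
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
# POST `body` to http://localhost:30000/v1/chat/completions with
# Content-Type: application/json (e.g. via urllib.request).
```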
First Bad Commit
No response
Relevant log output
I would also like to disable thinking in Qwen3.

You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.
## Second input with /no_think
user_input_2 = "Then, how many r's in blueberries? /no_think"
(PS: /no_think is working as well)
## Third input with /think
user_input_3 = "Really? /think"
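The "soft switch" inputs above can be sketched as a small helper that appends the tag to the user turn. The `/think` and `/no_think` tags come from the Qwen3 docs; the helper itself is hypothetical.

```python
def with_thinking(text: str, enabled: bool) -> str:
    """Append Qwen3's soft-switch tag to a user message."""
    return f"{text} {'/think' if enabled else '/no_think'}"

# Equivalent to user_input_2 above: thinking disabled for this turn.
messages = [
    {
        "role": "user",
        "content": with_thinking("Then, how many r's in blueberries?", False),
    }
]
```

Unlike `enable_thinking`, this works with llama-server today because the switch lives in the prompt text itself rather than in a template variable.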
> You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.

Right, but it affects the output format.
As a workaround, add the following to the beginning of your assistant message:

```
<think>

</think>
```

This is how `enable_thinking` is implemented in the Jinja template.
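Roughly, the template gates an empty think block on the variable. This is a paraphrased sketch, not the exact Qwen3 template text:

```jinja
{#- Sketch: when enable_thinking is passed as false via
    chat_template_kwargs, emit an empty think block so the
    model skips its reasoning phase. -#}
{%- if enable_thinking is defined and enable_thinking is false %}
    {{- '<think>\n\n</think>\n\n' }}
{%- endif %}
```

Prefilling the assistant turn with the same empty block reproduces that effect by hand on servers that don't support `chat_template_kwargs`.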
I hope `"chat_template_kwargs": {"enable_thinking": false}` gets implemented in llama.cpp too.
+1 It would be awesome to merge #13196
This issue was closed because it has been inactive for 14 days since being marked as stale.