
Misc. bug: Qwen3 "enable_thinking" parameter not working

Open celsowm opened this issue 8 months ago • 7 comments

Name and Version

llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5199 (ced44be3)
built with MSVC 19.41.34120.0 for x64

Operating systems

Windows 11

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server Qwen3-14B-Q5_K_M.gguf

Problem description & steps to reproduce

The `enable_thinking: false` parameter has no effect at all when sent in a request to llama-server (despite appearing in Alibaba's examples).

SGLang and vLLM support this via `chat_template_kwargs`:

https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes
https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes

curl http://localhost:30000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'
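For comparison, the same request body can be built programmatically. This is a minimal sketch in Python using only the standard library; the endpoint URL and model name are taken from the SGLang example above:

```python
import json
import urllib.request

# Request body mirroring the curl example above. SGLang/vLLM forward
# chat_template_kwargs to the Jinja chat-template renderer.
payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [
        {"role": "user", "content": "Give me a short introduction to large language models."}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "max_tokens": 8192,
    "presence_penalty": 1.5,
    "chat_template_kwargs": {"enable_thinking": False},
}

req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment when a server is actually running
```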

First Bad Commit

No response

Relevant log output


celsowm avatar Apr 29 '25 01:04 celsowm

As well, I would like to disable thinking in Qwen3

gnusupport avatar Apr 29 '25 05:04 gnusupport

You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.

zhouxihong1 avatar Apr 29 '25 07:04 zhouxihong1
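A minimal sketch of that prompt-level soft switch (the helper function here is hypothetical; Qwen3 simply reads the marker from the end of the user turn):

```python
def with_thinking(content: str, enabled: bool) -> str:
    """Append Qwen3's soft-switch marker to a user message."""
    return f"{content} /think" if enabled else f"{content} /no_think"

messages = [
    {"role": "user", "content": with_thinking("How many r's in strawberry?", enabled=False)}
]
```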

# Second input with /no_think
user_input_2 = "Then, how many r's in blueberries? /no_think"

(PS: "/no think" works as well.)

# Third input with /think
user_input_3 = "Really? /think"

WayneJin88888 avatar Apr 29 '25 07:04 WayneJin88888

> You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.

Right, but it affects the output format.

woshitoutouge avatar Apr 29 '25 08:04 woshitoutouge

As a workaround, add the following to the beginning of your assistant message:

<think>

</think>

This is how `enable_thinking` is implemented in the Jinja template.
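A sketch of that workaround as a chat request body, assuming the server honors an assistant-message prefill; the empty think block mimics what the template emits when thinking is disabled:

```python
# An empty reasoning block, matching what the Qwen3 Jinja template
# renders when enable_thinking is false.
EMPTY_THINK = "<think>\n\n</think>\n\n"

messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."},
    # Prefill the assistant turn so the model skips generating
    # its own <think> section and continues from here.
    {"role": "assistant", "content": EMPTY_THINK},
]
```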

snichols avatar Apr 29 '25 09:04 snichols

I hope `"chat_template_kwargs": {"enable_thinking": false}` gets implemented in llama.cpp too.

celsowm avatar Apr 29 '25 14:04 celsowm

+1 It would be awesome to merge #13196

createthis avatar May 08 '25 16:05 createthis

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Jun 22 '25 01:06 github-actions[bot]