
support qwen3 /think & /no_think & enable_thinking parameter

Open · BUJIDAOVS opened this issue 6 months ago · 1 comment

  1. Add support for parsing the "/think" and "/no_think" commands, with "/no_think" mode as the default.
  2. When the model is not told to think, append an empty think block ("<think>\n\n</think>\n\n") to the prompt so the thinking step is skipped.
  3. Allow the qwen3 model to set the thinking mode through the "enable_thinking" parameter or the "/think" command in the chat_completions_v1 API (see the example after this list).
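For example, a request could switch a single turn into think mode with the "/think" command (a sketch based on the description above; the port and model name are placeholders, and the exact placement of the command inside the user message is an assumption):

# append "/think" to the user message to enable thinking for this request
curl http://localhost:5656/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models. /think"}
  ]
}'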

Related issue: https://github.com/InternLM/lmdeploy/issues/3511

BUJIDAOVS · May 15 '25 14:05

Hi, @BUJIDAOVS. Thank you very much for your contribution to LMDeploy. There are some linting errors. Please fix them as follows:

pip install pre-commit==3.8.0  # make sure the python version < 3.11
cd lmdeploy # the root directory of lmdeploy repo
pre-commit install
pre-commit run --all-files

lvhan028 · May 16 '25 08:05

Thanks again for your dedicated contributions.

As I was testing the functionality, how are we expected to use this feature? Currently, I use the following commands after launching the API server.

  • Disable thinking

curl http://localhost:5656/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "/nvme4/huggingface_hub/hub/models--Qwen--Qwen3-8B/snapshots/a80f5e57cce20e57b65145f4213844dec1a80834",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "max_tokens": 1024,
  "enable_thinking": false
}'

  • Enable thinking

curl http://localhost:5656/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "models--Qwen--Qwen3-8B/snapshots/a80f5e57cce20e57b65145f4213844dec1a80834",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "max_tokens": 1024,
  "enable_thinking": true
}'

With "enable_thinking": false, the output contents still have the thinking process. Is there anything wrong with my test commands?

[screenshot: the response still includes the thinking content]

CUHKSZzxy · May 20 '25 07:05

Specify "--chat-template qwen3" when starting the service. In this template, the default model is in No-Think Mode. Add "enable_thinking": true switches to Think mode.

BUJIDAOVS · May 20 '25 10:05