willcb/Qwen3-8B ignores enable_thinking=False

Open satoshi-koide opened this issue 1 month ago • 0 comments

I tried willcb/Qwen3-8B with enable_thinking=False but found that it still thinks.

If I understand the willcb/Qwen series correctly, the only difference from the official Qwen models is the tokenizer. From what I observed, my guess is that the Qwen3 tokenizer adds an empty <think></think> tag before generation when enable_thinking=False, whereas the Qwen2.5 tokenizer does not, which leads to wrong (or at least unintuitive) results (see below for details). I'm not sure how to avoid this issue.
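A minimal sketch of how this guess could be checked without running the server: render the prompt with the flag on and off and compare. The helper names (`make_renderer`, `flag_changes_prompt`, `ends_with_empty_think`) are hypothetical, and the sketch assumes the `transformers` library is installed and that the chat template receives `enable_thinking` as an extra keyword argument of `apply_chat_template`.

```python
from typing import Callable

MESSAGES = [{"role": "user", "content": "Hello, how are you?"}]


def make_renderer(model_id: str) -> Callable[[bool], str]:
    """Build a function that renders the chat prompt for a given enable_thinking value."""
    # Imported lazily; loading the tokenizer needs network access or a local cache.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    def render(enable_thinking: bool) -> str:
        # Extra keyword arguments are forwarded to the chat template.
        return tokenizer.apply_chat_template(
            MESSAGES,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=enable_thinking,
        )

    return render


def flag_changes_prompt(render: Callable[[bool], str]) -> bool:
    """True if the rendered prompt differs between enable_thinking=True and False."""
    return render(True) != render(False)


def ends_with_empty_think(prompt: str) -> bool:
    """True if the prompt ends with an empty <think></think> block, ignoring whitespace."""
    return "".join(prompt.split()).endswith("<think></think>")
```

Calling `flag_changes_prompt(make_renderer("willcb/Qwen3-8B"))` and the same for `Qwen/Qwen3-8B` would show whether each template reacts to the flag at all. If the first call returns False, the template is ignoring `enable_thinking`.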

Details

What I did was as follows:

Running vllm server:

model=willcb/Qwen3-8B
uv run python -m vllm.entrypoints.openai.api_server \
    --model ${model} \
    --port 8000

Chatting with enable_thinking=False:

import asyncio

from openai import AsyncOpenAI

# vLLM server running at localhost:8000
client = AsyncOpenAI(api_key="dummy", base_url="http://localhost:8000/v1")

model = "willcb/Qwen3-8B"


async def main() -> None:
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        # This works for Qwen/Qwen3-8B, but not for willcb/Qwen3-8B.
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    print(response.choices[0].message.content)


asyncio.run(main())

This returned:

<think>
Okay, the user greeted me with "Hello, how are you?" I need to respond in a friendly and helpful manner. Let me start by acknowledging their greeting. I should mention that I'm a large language model developed by Alibaba Cloud, which gives context about my origin. Then, I should express that I'm here to assist with any questions or tasks they might have. It's important to keep the tone positive and open-ended to encourage them to ask for help. I should also make sure the response is concise but covers the necessary points. Let me check if there's anything else I need to include. Maybe a friendly emoji to keep it approachable. Alright, that should cover it.
</think>

Hello! I'm Qwen, a large language model developed by Alibaba Cloud. I'm here to help with any questions or tasks you might have. How can I assist you today? 😊

I tried the same code with Qwen/Qwen3-8B and got:

Hello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help! How can I assist you today? 😊
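A possible client-side stopgap, sketched under the assumption that the reasoning always arrives wrapped in a single leading `<think>...</think>` block as in the output above: strip that block from the returned content. The function name `strip_thinking` is hypothetical. This only hides the reasoning. The model still generates it, so the latency and token cost remain.

```python
import re

# Non-greedy match of a <think>...</think> block plus trailing whitespace.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(content: str) -> str:
    """Remove <think>...</think> blocks from a chat completion's text content."""
    return _THINK_BLOCK.sub("", content).lstrip()
```

In the reproduction script this would be applied as `strip_thinking(response.choices[0].message.content)`.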

Thank you!

Nov 06 '25 03:11 satoshi-koide
