verifiers
willcb/Qwen3-8B ignores enable_thinking=False
I tried willcb/Qwen3-8B with enable_thinking=False but found that it still thinks.
If I understand the willcb/Qwen series correctly, the only difference from the original models is the tokenizer. From what I observed, my guess is that the Qwen3 tokenizer adds an empty <think></think> tag before generation when enable_thinking=False, while the Qwen2.5 tokenizer does not, which leads to wrong (or at least unintuitive) results (see below for details). I'm not sure how to avoid this issue.
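One way to check this guess without starting a server is to render the prompt with each tokenizer and look at how it ends. The following is a minimal sketch, not a confirmed diagnosis: the helper names (`ends_with_empty_think`, `render_prompt`, `compare_models`) are made up for illustration, and it assumes `transformers` is installed and both tokenizers can be downloaded from the Hub.

```python
import re

# Matches a prompt whose tail is an empty think block,
# e.g. "...assistant\n<think>\n\n</think>\n\n".
EMPTY_THINK_TAIL = re.compile(r"<think>\s*</think>\s*$")


def ends_with_empty_think(prompt: str) -> bool:
    """Return True if the rendered prompt ends with an empty <think></think> block."""
    return EMPTY_THINK_TAIL.search(prompt) is not None


def render_prompt(model_name: str) -> str:
    """Render the chat prompt as text with enable_thinking=False.

    Needs `transformers` and network access, so the import is kept local.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    messages = [{"role": "user", "content": "Hello, how are you?"}]
    # Extra keyword arguments are forwarded to the chat template.
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )


def compare_models(names=("Qwen/Qwen3-8B", "willcb/Qwen3-8B")) -> dict:
    """Map each model name to whether its prompt ends with an empty think block."""
    return {name: ends_with_empty_think(render_prompt(name)) for name in names}
```

Calling `compare_models()` and printing the result should show whether the two tokenizers differ here. If they do, the template shipped with willcb/Qwen3-8B would be the likely place to look.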
Details
What I did was as follows:
- Running vllm server:
model=willcb/Qwen3-8B
uv run python -m vllm.entrypoints.openai.api_server \
    --model "${model}" \
    --port 8000
- Chatting with enable_thinking=False:
import asyncio

from openai import AsyncOpenAI

# vLLM server running at localhost:8000
client = AsyncOpenAI(api_key="dummy", base_url="http://localhost:8000/v1")
model = "willcb/Qwen3-8B"


async def main():
    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "Hello, how are you?"}
        ],
        # This works for Qwen/Qwen3-8B, but not willcb/Qwen3-8B.
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )
    print(response.choices[0].message.content)


asyncio.run(main())
This returned:
<think>
Okay, the user greeted me with "Hello, how are you?" I need to respond in a friendly and helpful manner. Let me start by acknowledging their greeting. I should mention that I'm a large language model developed by Alibaba Cloud, which gives context about my origin. Then, I should express that I'm here to assist with any questions or tasks they might have. It's important to keep the tone positive and open-ended to encourage them to ask for help. I should also make sure the response is concise but covers the necessary points. Let me check if there's anything else I need to include. Maybe a friendly emoji to keep it approachable. Alright, that should cover it.
</think>
Hello! I'm Qwen, a large language model developed by Alibaba Cloud. I'm here to help with any questions or tasks you might have. How can I assist you today? 😊
I tried the same code with Qwen/Qwen3-8B and got:
Hello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help! How can I assist you today? 😊
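To compare the two outputs programmatically, something like the sketch below could separate the reasoning from the answer. It is only a client-side check, not a fix for the underlying behavior, and `split_thinking` is a hypothetical helper name, not part of verifiers, vLLM, or the OpenAI client.

```python
import re

# A leading <think>...</think> block followed by the visible answer.
THINK_BLOCK = re.compile(r"\s*<think>(.*?)</think>\s*(.*)", re.DOTALL)


def split_thinking(content: str):
    """Split a response into (reasoning, answer).

    reasoning is None when the response does not start with a think block.
    """
    match = THINK_BLOCK.match(content)
    if match is None:
        return None, content.strip()
    return match.group(1).strip(), match.group(2).strip()
```

With the outputs above, `split_thinking(response.choices[0].message.content)` would return a non-None reasoning string for willcb/Qwen3-8B and `None` for Qwen/Qwen3-8B.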
Thank you!
Thanks for opening this!