The rendering of `<think>\n\n</think>\n\n` in the chat_template seems to prevent the ReasoningParser from detecting the `</think>` reasoning end token, causing it to mistakenly remain in the reasoning stage. The current DeepSeekR1ReasoningParser appears...
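For reference, here is a minimal sketch (not the actual vLLM parser code) of the kind of end-token check being described; the `</think>` marker and the split logic are assumptions based on the behaviour above:

```
def split_reasoning(model_output: str, end_token: str = "</think>"):
    """Simplified sketch of an end-token based reasoning parser."""
    if end_token not in model_output:
        # No end token seen: the whole output is treated as reasoning,
        # so the parser never leaves the reasoning stage.
        return model_output, ""  # (reasoning_content, content)
    reasoning, _, content = model_output.partition(end_token)
    return reasoning, content
```

If the chat template has already emitted the empty think block in the prompt, the generation itself contains no `</think>`, so everything lands in reasoning_content and content stays empty.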
Having noticed PR https://github.com/vllm-project/vllm/pull/17369, I'd like to point out that while the Qwen3ReasoningParser can already handle most cases, there is still one scenario it doesn't resolve. Consider this situation: When...
> Calling qwen3-32b directly on the Alibaba Bailian platform, the `"chat_template_kwargs": {"enable_thinking": false}` parameter also behaves incorrectly, and differently from vLLM: when calling the platform directly, the parameter simply has no effect, and the workaround above can be used there as well. To stress again, these are all temporary workarounds. (Originally written in Chinese, since anyone using Alibaba Bailian can presumably read it.)

Unofficial, personal guess: when `"chat_template_kwargs": {"enable_thinking": false}` takes effect, the output contains no `<think>` content at all, and the deepseek_r1 parser then mistakenly identifies everything as reasoning_content. When using the /nothink mode, the output contains an empty block like `<think>\n\n</think>`, so the content is not mistakenly identified as reasoning_content.

The logic can be found in `tokenizer_config.json`:

```
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking...
```
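To make that guess concrete, a small illustration follows; the literal strings are assumptions based on the template excerpt above, not captured output:

```
# Case 1: enable_thinking=false via chat_template_kwargs.
# The chat template itself appends the empty think block to the prompt,
# so the *generated* text never contains "</think>", and an end-token
# based parser treats the whole generation as reasoning_content.
prompt_tail = "<|im_start|>assistant\n<think>\n\n</think>\n\n"
generation = "The capital of France is Paris."  # no "</think>" here

# Case 2: the /nothink soft switch.
# The model itself emits an empty think block, so "</think>" appears in
# the generation and the parser returns the answer as content.
generation_nothink = "<think>\n\n</think>\n\nThe capital of France is Paris."
```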
Encountered the same problem. But I'm using `vllm serve` to deploy `DeepSeek-R1-AWQ`.

**Environment**

- Image: cuda:12.6.0-cudnn-devel-ubuntu22.04
- GPUs: A800 x 8
- Python 3.10
- vLLM 0.7.2
- torch 2.5.1...
After reading the source code, I have the following findings.

**V0 Implementation**

In the function `get_token_bin_counts_and_mask` (`vllm/model_executor/layers/utils.py`, line 8), the shape of `bin_counts` is `[num_seqs, vocab_size + 1]`, where `num_seqs` is `logits.shape[0]`. The `tokens`...
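For context, here is a paraphrased sketch of what that function does, reconstructed from the shapes described above rather than copied verbatim from vLLM:

```
import torch

def get_token_bin_counts_and_mask(
    tokens: torch.Tensor,   # [num_seqs, max_len], padded with vocab_size
    vocab_size: int,
    num_seqs: int,
) -> tuple[torch.Tensor, torch.Tensor]:
    # The extra column gives padding ids (== vocab_size) somewhere to land.
    bin_counts = torch.zeros((num_seqs, vocab_size + 1),
                             dtype=torch.long,
                             device=tokens.device)
    # Count how many times each token id appears in each sequence.
    bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
    # Drop the padding column; the mask marks ids that appeared at least once.
    bin_counts = bin_counts[:, :vocab_size]
    mask = bin_counts > 0
    return bin_counts, mask
```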
I can apply a temporary fix by directly modifying the code to address this error. However, I am uncertain about the correctness of this approach, so this suggestion is provided...
Using https://huggingface.co/ModelCloud/GLM-4.6-REAP-268B-A32B-GPTQMODEL-W4A16 with sglang v0.5.5.post1 on A800 x 4 (`--tp 4`), I got the same issue.

EDIT: I tried `Qwen/Qwen3-30B-A3B-GPTQ-Int4` on A800:

- deployed on a single A800 (no TP): runs successfully.
- ...
After testing, I've made a hotfix by suppressing the error-causing operation:

```
import contextlib  # needed if not already imported in the module

with contextlib.suppress(Exception):
    # If narrowing the shard fails, keep the original weight instead of raising.
    loaded_weight = loaded_weight.narrow(
        shard_dim, shard_size * tp_rank, shard_size
    )
```

This allows normal weight loading under...
I tried downgrading to SGLang v0.5.3, and re-running the command succeeded.

```
sglang==0.5.3
sgl-kernel==0.3.14.post1
```

`--tp 2` and `--tp 2 --quantization moe_wna16` both work without issues. `--tp 2 --ep 2`...
It looks like your vLLM is out of date. Could you upgrade to `vllm==0.6.4.post1` and try generating again?