
Does the example output in the README match what you expect?

> My example output: (missing the leading ``)

@qibinlin The leading `` lives in the chat template and belongs to the prompt, so it will not appear in the response.

See here for the positional encoding: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py#L1509

> > See here for the positional encoding: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py#L1509
>
> [@wulipc](https://github.com/wulipc) Dynamic FPS (frames per second) training was introduced; where is this dynamic frame rate reflected?

@cqray1990 FPS is the frame-sampling rate of the video; it can be set dynamically during both training and inference. https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py#L142
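To make "dynamic FPS" concrete, here is a standalone sketch (not the actual Qwen2.5-VL processor code, which is linked above) of what resampling a video to a per-call target FPS looks like: given the clip's native frame rate, pick which frame indices survive.

```python
# Hypothetical illustration of dynamic FPS sampling: the target fps is a
# runtime argument, so each training or inference call can sample the same
# video at a different rate.

def sample_frame_indices(total_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """Return the indices of frames kept when resampling a clip to target_fps."""
    step = video_fps / target_fps          # keep one frame every `step` source frames
    n_samples = int(total_frames / step)   # number of frames after resampling
    return [min(int(i * step), total_frames - 1) for i in range(n_samples)]

# A 30 fps clip with 90 frames, sampled at 2 fps, keeps 6 frames:
print(sample_frame_indices(90, 30.0, 2.0))  # -> [0, 15, 30, 45, 60, 75]
```

The real processor handles more (min/max frame caps, temporal position ids), but the core idea is the same: FPS is a sampling parameter, not a property baked into the model.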

We haven't thoroughly tested `VLLM_USE_FLASHINFER_MOE_FP16` internally, so it is not set as the default configuration. For more optimization techniques related to Expert Parallelism (EP), please refer to the community documentation:...

vLLM is a good choice. Please refer to the [Deploy](https://github.com/QwenLM/Qwen2.5-VL?tab=readme-ov-file#deployment) section of the README.

Hi, this log does not show the core error. The script has so far only been tested on Qwen3 235A22; please post the full log and I will take a look. If the model is too large for you to run, keep an eye out for the smaller models we will be releasing soon. Thanks for supporting Qwen. Also, for the Qwen2.5-VL web demo you can follow the old setup instructions: https://github.com/QwenLM/Qwen3-VL/blob/d2240f11656bfe404b9ba56db4e51cd09f522ff1/web_demo_mm.py

@XYZ-916 @nneowvoincee Limiting the thinking length is not currently supported, nor is a `thinking_budget` parameter. One idea is to use a logits processor that forces the model to close its thinking once the preset length is reached, but vLLM does not yet support per-request logits processors, so this cannot be implemented in vLLM for now.
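The logits-processor idea above could be sketched as follows. This is a framework-agnostic toy (plain NumPy, not vLLM or real Qwen code); the end-of-thinking token id and the budget are made-up placeholders.

```python
import numpy as np

class ThinkingBudgetProcessor:
    """Once `budget` tokens have been generated, mask the logits so that only
    the end-of-thinking token can be emitted; afterwards generation proceeds
    normally. Token ids here are hypothetical placeholders."""

    def __init__(self, prompt_len: int, budget: int, end_think_id: int):
        self.prompt_len = prompt_len
        self.budget = budget
        self.end_think_id = end_think_id
        self.done = False  # only force the closing token once

    def __call__(self, input_ids: list[int], scores: np.ndarray) -> np.ndarray:
        generated = len(input_ids) - self.prompt_len
        if not self.done and generated >= self.budget:
            forced = np.full_like(scores, -np.inf)  # forbid every token...
            forced[self.end_think_id] = 0.0         # ...except end-of-thinking
            self.done = True
            return forced
        return scores

# Usage with a toy vocab of 8 ids and a 2-token thinking budget:
proc = ThinkingBudgetProcessor(prompt_len=3, budget=2, end_think_id=5)
masked = proc(list(range(5)), np.zeros(8))  # 2 tokens generated -> budget hit
# `masked` now allows only token id 5
```

Because the processor carries per-request state (`prompt_len`, `done`), it has to be instantiated per request, which is exactly why vLLM's current global logits-processor mechanism cannot host it.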

@elliotgao @Jun-Howie We identified that the issue is caused by the `get_input_positions` function. In pure text mode, it produces `position_ids` with shape `[n]`, while the correct shape should be `[3,...
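The shape mismatch above can be illustrated with a small NumPy sketch (hypothetical, not the actual vLLM fix): Qwen2.5-VL's M-RoPE expects position ids with a leading axis of 3 (temporal / height / width). In pure-text mode all three components are the same `0..n-1` ramp, so a `[n]` vector must be expanded to `[3, n]`.

```python
import numpy as np

def text_only_mrope_positions(n_tokens: int) -> np.ndarray:
    """Build text-only M-RoPE position ids: three identical 0..n-1 rows."""
    pos = np.arange(n_tokens)     # shape [n] -- the buggy shape
    return np.tile(pos, (3, 1))   # shape [3, n] -- the shape M-RoPE expects

print(text_only_mrope_positions(4).shape)  # (3, 4)
```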

@elliotgao @Jun-Howie The fix has been merged into vLLM's main branch. Please try it out and see whether it resolves your problem. Thanks again for your feedback! https://github.com/vllm-project/vllm/pull/18526