
We recommend passing the video URL and controlling the sampling frequency via the fps parameter. For details, please refer to: https://github.com/QwenLM/Qwen3-VL/pull/1644.
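As a hedged sketch, a request along those lines might look like the following. The field and parameter names (`video_url`, `fps`, `mm_processor_kwargs`) are assumptions based on common Qwen-VL / vLLM conventions; check the linked PR and the docs for the exact schema.

```python
# Hedged sketch: pass a video by URL and control the sampling rate.
# Field names are assumed from Qwen-VL / vLLM conventions, not definitive.
video_message = {
    "role": "user",
    "content": [
        {"type": "video_url", "video_url": {"url": "https://example.com/demo.mp4"}},
        {"type": "text", "text": "Describe this video."},
    ],
}

# Sampling frequency, passed as an extra processor argument alongside
# the request (e.g. via `extra_body` with an OpenAI-compatible client).
extra_body = {"mm_processor_kwargs": {"fps": 2.0}}

print(video_message["content"][0]["type"])
```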

We have recently updated the documentation for video requests; please take a look. @LinaZhangCoding If your issue has been resolved, please close it.

> Hi, [@nutsintheshell](https://github.com/nutsintheshell),
>
> I have conducted the evaluation but also observed a score close to 0 (0.001, actually).
>
> [@wulipc](https://github.com/wulipc), would you please help check...

@1994 Hi, thank you for your interest in Qwen! In fact, the image processing pipeline is vLLM → transformers → torchvision.transforms.resize. If you're launching an online server, it does not...
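For intuition, here is a hedged, pure-Python sketch of the dimension-rounding step that precedes the actual `torchvision.transforms.resize` call. The factor of 28 and the pixel budgets are assumptions modeled on the Qwen-VL image preprocessors in transformers; the real logic lives there, and this is only an illustration of the rounding behavior.

```python
import math

def smart_resize_sketch(height: int, width: int, factor: int = 28,
                        min_pixels: int = 56 * 56,
                        max_pixels: int = 14 * 14 * 4 * 1280) -> tuple[int, int]:
    """Illustrative sketch (constants are assumptions): snap each side to a
    multiple of `factor`, then rescale if the resulting area falls outside
    the [min_pixels, max_pixels] budget."""
    h = round(height / factor) * factor
    w = round(width / factor) * factor
    if h * w > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w

# A 1080p frame gets scaled down until it fits the pixel budget,
# with both sides still multiples of the patch factor.
print(smart_resize_sketch(1080, 1920))
```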

@Cristhine Thanks for your interest in Qwen. The documentation below describes how to run the benchmark; you can try it yourself: https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3-VL.html

I tested on an H100 machine. With `request-rate` set to 10, the output is as follows; the average time to first token is around 434.92 ms. Note that the `request-rate` value strongly affects TTFT: if it is set too high, internal queueing and scheduling add extra overhead.

```
============ Serving Benchmark Result ============
Successful requests:...
```
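For reference, the kind of invocation behind numbers like these can be sketched as below. The flag names are assumptions based on recent vLLM releases (verify with `vllm bench serve --help`), and the model name is a placeholder.

```python
import shlex

# Hedged sketch of a serving-benchmark invocation. Flag names assumed
# from recent vLLM releases; the model name is a placeholder.
bench_cmd = shlex.split(
    "vllm bench serve"
    " --backend openai-chat"
    " --model <served-model-name>"
    " --request-rate 10"
    " --num-prompts 200"
)
print(" ".join(bench_cmd))
```

Lowering `--request-rate` reduces queueing inside the engine, which is usually the first knob to turn when TTFT looks inflated.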

This is normal; just wait a moment.

Data is only one part of it; for more details, please watch for our upcoming technical report.

@yss0729 Loading a 7B model into 16 GB of VRAM is indeed difficult, and installing flash attention will not solve it. As @JJJYmmm said, you can try the quantized 7B version or the 3B model; or, if you have multiple GPUs, try enabling tensor parallelism (TP).
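The two fallbacks above can be sketched as launch commands. Model IDs are placeholders, and the flag name follows current vLLM releases; treat this as a sketch rather than a definitive recipe.

```python
import shlex

# Hedged sketch of the two fallbacks: serve a smaller or quantized
# model, or split the 7B across two GPUs with tensor parallelism.
# Model IDs are placeholders.
small_cmd = shlex.split("vllm serve <quantized-7b-or-3b-model-id>")
tp_cmd = shlex.split("vllm serve <7b-model-id> --tensor-parallel-size 2")
print(" ".join(tp_cmd))
```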

Please check your installation of transformers. Good luck.

@zjx-ERROR Please provide the original video so I can reproduce the issue. You can send it to: [email protected]