@hsiangchun4 @mkamranr `No available shared memory broadcast block found in 60 seconds.` This is just normal output: vLLM may be performing time-consuming operations such as CUDA graph compilation. You can...
@whwangovo According to the [community guide](https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3-VL.html#qwen3-vl-235b-a22b-instruct), you can try the following configuration:

```
vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct \
  --tensor-parallel-size 8 \
  --max-model-len 128000 \
  --async-scheduling
```
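For reference, here is a minimal sketch of querying the resulting endpoint with the OpenAI Python client. The base URL assumes vLLM's default port 8000 and the default permissive `api_key` (both are assumptions, not part of the guide above):

```python
# Minimal sketch: query the `vllm serve` endpoint started above.
# base_url/api_key are assumptions (vLLM defaults to port 8000 and
# accepts any key unless --api-key is set).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```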
Yes, you can pass the local video file by `file:// + your local absolute path`, for example:

```python
video_url_for_local = "file:///your/local/path/to/v_3l7quTy4c2s.mp4"  # file:// + your local absolute path
video_url_for_remote = ...
```
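For a fuller picture, a minimal sketch of how such a `file://` URL might be consumed, assuming `qwen_vl_utils` is the downstream consumer (the path, prompt, and message layout here are placeholders, not from the original comment):

```python
# Minimal sketch (assumption: qwen_vl_utils is installed and the path exists):
# a file:// URL works anywhere the examples accept a remote video URL.
from qwen_vl_utils import process_vision_info

video_url_for_local = "file:///your/local/path/to/v_3l7quTy4c2s.mp4"

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": video_url_for_local},
        {"type": "text", "text": "Describe this video."},
    ],
}]

# process_vision_info resolves the file:// URL and loads the frames locally.
image_inputs, video_inputs = process_vision_info(messages)
```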
> Hi, could you share your Python, PyTorch, and CUDA versions? I'm also deploying vLLM but ran into a version mismatch.
>
> > python -m xformers.info
> > WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
> > PyTorch 2.5.1 with CUDA 1201 (you have 2.5.1+cu121)...
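A minimal sketch for printing the two versions that warning compares, so the installed PyTorch/CUDA build can be checked against what xFormers was built for:

```python
# Minimal sketch: print the installed PyTorch version and its CUDA build,
# which must match the build xFormers expects.
import torch

print("torch:", torch.__version__)        # e.g. 2.5.1+cu121
print("CUDA build:", torch.version.cuda)  # e.g. 12.1
```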
> [@wulipc](https://github.com/wulipc) [@XyWzzZ](https://github.com/XyWzzZ) Hello, how did you solve this problem? Could you upload an example of an OpenAI-style JSON request body? In the docs I only see hand-written code, rather than requests made directly through Postman or curl.

Hi, please see the example in the README. The `hand-written code` you mention is exactly the process of generating the OpenAI-style JSON request body. Best regards.
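To make that concrete, a minimal sketch of what such a generated request body looks like; the endpoint, model name, and image URL are placeholders, and printing the payload gives the raw JSON you could paste into Postman or send with curl:

```python
# Minimal sketch: build and print an OpenAI-style JSON request body, then
# POST it to a vLLM server. All names and URLs here are placeholders.
import json
import requests

payload = {
    "model": "Qwen2.5-VL-7B-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpeg"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
}

print(json.dumps(payload, indent=2))  # the raw body usable in Postman/curl
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.json())
```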
> Hello, this outputs True, but the xformers issue above persists.

@hweidream It looks like your GPU does not support flash_attn; the `outputs True` result is not a reliable signal.
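One way to check the card directly, as a minimal sketch, assuming FlashAttention-2 requires an Ampere GPU or newer (compute capability >= 8.0):

```python
# Minimal sketch: check whether the GPU meets FlashAttention-2's assumed
# minimum compute capability (8.0, i.e. Ampere or newer).
import torch

major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
print("FlashAttention-2 supported:", (major, minor) >= (8, 0))
```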
@618-github Refer to this example from the docs; the video needs to be base64-encoded first and cannot be passed as a raw path:

```python
import base64
import numpy as np
from PIL import Image
from io import BytesIO
from openai import OpenAI
from qwen_vl_utils import process_vision_info

# Set OpenAI's...
```
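Since that example is cut off, here is a minimal sketch of the base64 flow it starts: read the video bytes, encode them, and embed them as a data URL in an OpenAI-style request. The file path, model name, and server URL are placeholders:

```python
# Minimal sketch: base64-encode a local video and send it as a data URL.
# Paths, model name, and base_url are assumptions, not from the original docs.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("/your/local/path/to/video.mp4", "rb") as f:
    video_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "video_url",
             "video_url": {"url": f"data:video/mp4;base64,{video_b64}"}},
            {"type": "text", "text": "Describe this video."},
        ],
    }],
)
print(response.choices[0].message.content)
```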
Thank you for your feedback. The Qwen2.5-VL series models are well supported by vLLM. However, our Docker image is designed for various scenarios (including deployment of vLLM services), so it may...