lmingze comments

Results: 3 comments of lmingze

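[Bug]: After deploying qwen2.5_vl_72b with vllm, has anyone else seen this: right after deployment every call is fine at 3-5 seconds per request, but after some time in use it gets slower and slower, up to 60 s per request?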

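> Yes, I have run into this; it is an intermittent slowdown. A single request suddenly becomes very slow, dropping from the usual 16 t/s to 4 t/s. GPU utilization reaches 100% at that point, but once that request finishes generating and returns, everything goes back to normal. In my case, right after deployment the A800 usage was 50%; after some time in use, GPU memory usage climbed above 93% and calls kept hanging. Very strange.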

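[Bug]: After deploying qwen2.5_vl_72b with vllm, has anyone else seen this: right after deployment every call is fine at 3-5 seconds per request, but after some time in use it gets slower and slower, up to 60 s per request?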

> > ### Your current environment
> >
> > This is the deployment launch command: CUDA_VISIBLE_DEVICES=2,34,5 vllm serve /mnt/cfs/ljc/ckpt/Qwen/Qwen2___5-VL-72B-Instruct --tensor-parallel-size 4 --gpu-memory-utilization 0.8 --port 20772 --limit-mm-per-prompt image=5 > qwen2.5_vl_72B_instruct_20772_new.log 2>&1 &
> >
> > ### 🐛 Describe...

[Bug]: Why does deployment hang and stop making progress when deploying qwen2.5-vl-32b-instruct?

> I noticed that you aren't using GPUs with consecutive indices. From my experience, this may cause an NCCL hang with tensor parallelism. I adjusted the order of the GPU indices,...
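
For anyone comparing their own launch command against the two points raised above (the device list versus `--tensor-parallel-size`, and whether the GPU indices are consecutive), here is a minimal sketch of a pre-launch sanity check. The helper name `check_visible_devices` is hypothetical and not part of vllm; it assumes `CUDA_VISIBLE_DEVICES` holds plain integer indices (not GPU UUIDs), and it only reports possible problems rather than proving a cause.

```python
def check_visible_devices(visible_devices: str, tensor_parallel_size: int) -> list[str]:
    """Return human-readable warnings for a CUDA_VISIBLE_DEVICES value.

    Hypothetical helper for illustration only. Assumes the value is a
    comma-separated list of integer GPU indices.
    """
    try:
        indices = [int(part) for part in visible_devices.split(",")]
    except ValueError:
        return [f"non-integer entry in {visible_devices!r}"]

    problems = []
    # Tensor parallelism needs one visible GPU per rank.
    if len(indices) != tensor_parallel_size:
        problems.append(
            f"{len(indices)} device(s) listed, but tensor parallel size is {tensor_parallel_size}"
        )
    # Consecutive means each index is exactly one more than the previous one.
    if any(b - a != 1 for a, b in zip(indices, indices[1:])):
        problems.append(f"indices {indices} are not consecutive")
    return problems


if __name__ == "__main__":
    # The value from the quoted launch command, checked against --tensor-parallel-size 4.
    for warning in check_visible_devices("2,34,5", 4):
        print("warning:", warning)
```

Run as a script, this flags both a device-count mismatch and non-consecutive indices for the quoted value; whether either one is related to the slowdown or the hang is not established in this thread.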