Fanyn

Results: 16 comments of Fanyn

> [@FanYaning](https://github.com/FanYaning) you can try it set HOST_IP or VLLM_HOST_IP environ. I tried VLLM_HOST_IP and it had no effect. That variable is for distributed vLLM deployments across different physical servers.

This issue is reproducible: `(VllmWorker rank=0 pid=2111841) INFO 04-25 10:00:12 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_53195711'), local_subscribe_addr='ipc:///tmp/f592e4cd-9550-4429-a8d4-bde3c63a442b', remote_subscribe_addr=None, remote_addr_ipv6=False)` Find the `VllmWorker rank=0` line in the log; that worker's PID is 2111841. Then run `netstat -naop | grep 2111841` to see the ports it is listening on externally. ![Image](https://github.com/user-attachments/assets/120be428-be92-449e-aa83-128384adf8be)
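The PID lookup above can be scripted. A minimal sketch, assuming a log line of the shape shown (the `LOG_LINE` sample is copied from the excerpt; the `netstat` step is left commented because it needs a live process):

```shell
# Sample vLLM worker log line (abbreviated from the excerpt above).
LOG_LINE="(VllmWorker rank=0 pid=2111841) INFO 04-25 10:00:12 [shm_broadcast.py:264] vLLM message queue communication handle: ..."

# Extract the worker PID from the "pid=NNNN)" field.
PID=$(echo "$LOG_LINE" | sed -n 's/.*pid=\([0-9]*\)).*/\1/p')
echo "$PID"

# Then inspect that worker's sockets (requires net-tools; `ss -tulpn` is an alternative):
# netstat -naop | grep "$PID"
```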

Reported in https://github.com/vllm-project/vllm/security/advisories/GHSA-7gpx-v3qr-jhpf

**Increase gpu_memory_utilization.** vLLM pre-allocates the GPU KV cache using gpu_memory_utilization% of GPU memory; increasing this value provides more KV cache space. **Decrease max_num_seqs or max_num_batched_tokens.** This can reduce...
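As a sketch, these knobs map to vLLM engine arguments on the `vllm serve` command line (the model name and the specific values below are placeholders, not recommendations):

```shell
# Give the KV cache more headroom (default gpu-memory-utilization is 0.9)
# while lowering batch limits to reduce peak memory per step.
vllm serve <your-model> \
  --gpu-memory-utilization 0.95 \
  --max-num-seqs 64 \
  --max-num-batched-tokens 4096
```

The same parameters can be passed to the Python `LLM(...)` constructor with identical names (underscored).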

Try this as a reference: https://www.modelscope.cn/models/tclf90/deepseek-r1-distill-qwen-32b-gptq-int4 ![Image](https://github.com/user-attachments/assets/52c4c764-343f-42ad-8c1a-c5fa539ef1fe)

Set the log level with VLLM_LOGGING_LEVEL=DEBUG and observe the loading process.
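Concretely, the variable is exported before launching the server (VLLM_LOGGING_LEVEL is a real vLLM environment variable; the serve command is just an example):

```shell
# Enable verbose vLLM logging; the variable is read at process startup.
export VLLM_LOGGING_LEVEL=DEBUG
echo "$VLLM_LOGGING_LEVEL"

# Then start the server and watch the model-loading log output, e.g.:
# vllm serve <your-model>
```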