Fanyn

Results: 16 comments of Fanyn

> [@FanYaning](https://github.com/FanYaning) you can try it set HOST_IP or VLLM_HOST_IP environ. I tried VLLM_HOST_IP and it had no effect. That variable is for distributed vLLM deployments across different physical servers.

This issue is reproducible: `(VllmWorker rank=0 pid=2111841) INFO 04-25 10:00:12 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_53195711'), local_subscribe_addr='ipc:///tmp/f592e4cd-9550-4429-a8d4-bde3c63a442b', remote_subscribe_addr=None, remote_addr_ipv6=False)` Find the `VllmWorker rank=0` line in the log; that worker's PID is 2111841. Then run `netstat -naop | grep 2111841` to see the ports it is listening on externally. ![Image](https://github.com/user-attachments/assets/120be428-be92-449e-aa83-128384adf8be)
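The PID lookup above can be scripted. A minimal sketch, assuming a log line of the shape shown (the `LOG_LINE` sample is copied from the excerpt; the `netstat` step is left commented because it needs a live process):

```shell
# Sample vLLM worker log line (abbreviated from the excerpt above).
LOG_LINE="(VllmWorker rank=0 pid=2111841) INFO 04-25 10:00:12 [shm_broadcast.py:264] vLLM message queue communication handle: ..."

# Extract the worker PID from the "pid=NNNN)" field.
PID=$(echo "$LOG_LINE" | sed -n 's/.*pid=\([0-9]*\)).*/\1/p')
echo "$PID"

# Then inspect that worker's sockets (requires net-tools; `ss -tulpn` is an alternative):
# netstat -naop | grep "$PID"
```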

Reported in https://github.com/vllm-project/vllm/security/advisories/GHSA-7gpx-v3qr-jhpf

**Increase gpu_memory_utilization.** vLLM pre-allocates the GPU KV cache using gpu_memory_utilization% of GPU memory; increasing this value provides more KV cache space. **Decrease max_num_seqs or max_num_batched_tokens.** This can reduce...
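As a sketch, these knobs map to vLLM engine arguments on the `vllm serve` command line (the model name and the specific values below are placeholders, not recommendations):

```shell
# Give the KV cache more headroom (default gpu-memory-utilization is 0.9)
# while lowering batch limits to reduce peak memory per step.
vllm serve <your-model> \
  --gpu-memory-utilization 0.95 \
  --max-num-seqs 64 \
  --max-num-batched-tokens 4096
```

The same parameters can be passed to the Python `LLM(...)` constructor with identical names (underscored).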

Try this as a reference: https://www.modelscope.cn/models/tclf90/deepseek-r1-distill-qwen-32b-gptq-int4 ![Image](https://github.com/user-attachments/assets/52c4c764-343f-42ad-8c1a-c5fa539ef1fe)

Set the log level with VLLM_LOGGING_LEVEL=DEBUG and observe the loading process.
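Concretely, the variable is exported before launching the server (VLLM_LOGGING_LEVEL is a real vLLM environment variable; the serve command is just an example):

```shell
# Enable verbose vLLM logging; the variable is read at process startup.
export VLLM_LOGGING_LEVEL=DEBUG
echo "$VLLM_LOGGING_LEVEL"

# Then start the server and watch the model-loading log output, e.g.:
# vllm serve <your-model>
```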