vLLM: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
System Info
CentOS x86_64, xinference: 1.21, CUDA 12.4, GPU: P40 × 4
Running Xinference with Docker?
- [x] docker
- [ ] pip install
- [ ] installation from source
Version info
xinference:1.21
The command used to start Xinference
xinference launch --model-engine vllm --model-name deepseek-r1-distill-qwen --model-path /data/models/DeepSeek-R1-Distill-Qwen-32B --n-gpu 4 --size-in-billions 32 --model-format pytorch --max_model_len 4096 --dtype half
Reproduction
Expected behavior
I'm not sure whether this is a system-level error or an incompatibility with vLLM.
Did you use Docker?
Yes, I am using Docker.
Added --shm-size=128g to the docker run command.
It worked for me, thanks.
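For anyone hitting the same error, here is a minimal sketch of the fix reported in this thread. A likely explanation is that NCCL uses shared memory under /dev/shm for communication between GPUs on the same host, and Docker gives a container only 64 MB of /dev/shm by default. The image tag, port, and volume mapping below are assumptions for illustration, not taken from the original report; only --shm-size=128g comes from the thread. NCCL_DEBUG=INFO is set so that the NCCL logs show the underlying cause if the error persists.

```shell
# Assumed values: adjust the image tag, port, and model directory to your setup.
IMAGE="xprobe/xinference:latest"   # assumption: use the tag matching your version
SHM_SIZE="128g"                    # the value reported to work in this thread

# --shm-size enlarges /dev/shm inside the container (Docker's default is 64 MB).
# NCCL_DEBUG=INFO makes NCCL print detailed diagnostics.
CMD="docker run -d --gpus all --shm-size=${SHM_SIZE} -e NCCL_DEBUG=INFO -v /data/models:/data/models -p 9997:9997 ${IMAGE} xinference-local -H 0.0.0.0"

# Print the command for review first; run it with: eval "$CMD"
echo "$CMD"
```

A smaller value than 128g may be enough; the thread does not say what the minimum is.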