vLLM: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)

Open amzfc opened this issue 10 months ago • 4 comments

System Info

CentOS x86_64, xinference:1.21, CUDA 12.4, GPU: 4 × P40

Running Xinference with Docker?

  • [x] docker
  • [ ] pip install
  • [ ] installation from source

Version info

xinference:1.21

The command used to start Xinference

xinference launch --model-engine vllm --model-name deepseek-r1-distill-qwen --model_path /data/models/DeepSeek-R1-Distill-Qwen-32B --n-gpu 4 --size-in-billions 32 --model-format pytorch --max_model_len 4096 --dtype half

Reproduction

Image

Expected behavior

I'm not sure whether this is a system-level error or whether it's caused by an incompatibility with vLLM.

amzfc avatar Feb 10 '25 13:02 amzfc

Did you use docker?

qinxuye avatar Feb 11 '25 02:02 qinxuye

Yes, I use Docker.

amzfc avatar Feb 11 '25 02:02 amzfc

Add --shm-size=128g to your docker run command.

qinxuye avatar Feb 11 '25 06:02 qinxuye
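
For anyone landing here later, a rough sketch of what the suggested docker run could look like. The likely reason the flag helps is that NCCL uses shared memory under /dev/shm for communication between GPU processes on the same host, and Docker gives a container only 64 MB of /dev/shm by default. Only --shm-size=128g comes from this thread; the image tag, the port mapping, the volume mount, and the xinference-local entry command are assumptions to adapt to your own setup. NCCL_DEBUG=INFO is optional and is only there to get the details the error message asks for if the problem persists.

```shell
#!/bin/sh
# Sketch: print (and optionally run) a docker run command with a larger /dev/shm.
# Only the --shm-size value comes from the thread; everything else is an assumption.
SHM_SIZE="${SHM_SIZE:-128g}"                 # value suggested in the thread
IMAGE="${IMAGE:-xprobe/xinference:latest}"   # assumed image tag
MODEL_DIR="${MODEL_DIR:-/data/models}"       # host directory holding the model
PORT="${PORT:-9997}"                         # assumed host port

# Built as one string so it can be reviewed before running.
# Word splitting on $CMD is fine here only because no value contains spaces.
CMD="docker run --gpus all --shm-size=${SHM_SIZE} -e NCCL_DEBUG=INFO -p ${PORT}:9997 -v ${MODEL_DIR}:${MODEL_DIR} ${IMAGE} xinference-local -H 0.0.0.0"

echo "$CMD"

# Set RUN=1 to start the container instead of only printing the command.
if [ "${RUN:-0}" = "1" ]; then
  $CMD
fi
```

Once the container is up, running df -h /dev/shm inside it should show the new size. 128g is simply the value that worked in this thread; a smaller value may be enough, depending on the host's RAM.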

It worked for me, thanks.

IdleIdiot avatar Mar 13 '25 02:03 IdleIdiot