
Failed to initialize NVML

Open pamdla opened this issue 10 months ago • 2 comments

System Info / 系統信息

host OS: ubuntu:22.04 Nvidia Driver Version: 550.127.08 CUDA Version: 12.4

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • [x] docker / docker
  • [ ] pip install / 通过 pip install 安装
  • [ ] installation from source / 从源码安装

Version info / 版本信息

docker image of xinf: latest == 1.4.1

The command used to start Xinference / 用以启动 xinference 的命令

python3 -m sglang.launch_server --model-path /data/modelscope/hub/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --tp 4 --trust-remote-code --port 9001 --context-length 10240 --enable-metrics --enable-torch-compile --torch-compile-max-bs 4 --mem-fraction-static 0.9 --host 0.0.0.0

Reproduction / 复现过程

root@3e0d949e3ced:/opt/inference# python3 -m sglang.launch_server --model-path /data/modelscope/hub/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --tp 4 --trust-remote-code --port 9001 --context-length 10240 --enable-metrics --enable-torch-compile --torch-compile-max-bs 4 --mem-fraction-static 0.9 --host 0.0.0.0
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:716: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
INFO 04-18 15:57:12 __init__.py:211] No platform detected, vLLM is running on UnspecifiedPlatform
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/sglang/launch_server.py", line 11, in <module>
    server_args = prepare_server_args(sys.argv[1:])
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/server_args.py", line 1163, in prepare_server_args
    server_args = ServerArgs.from_cli_args(raw_args)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/server_args.py", line 1114, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
  File "<string>", line 117, in __init__
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/server_args.py", line 200, in __post_init__
    self.device = get_device()
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 1229, in get_device
    raise RuntimeError("No accelerator (CUDA, XPU, HPU) is available.")
RuntimeError: No accelerator (CUDA, XPU, HPU) is available.
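Not stated in the report, but the most common cause of "Can't initialize NVML" inside a container is starting it without GPU passthrough. A minimal sketch of a working invocation, assuming the NVIDIA Container Toolkit is installed on the host (the volume path and port are taken from the command above; everything else is an assumption):

```shell
# Verify the host driver works first; this should list all GPUs.
nvidia-smi

# Start the container with GPU passthrough. Without --gpus
# (or NVIDIA_VISIBLE_DEVICES), the driver libraries are not mounted
# into the container, NVML cannot initialize, and torch.cuda reports
# that no accelerator is available.
docker run --rm --gpus all \
  -v /data/modelscope:/data/modelscope \
  -p 9001:9001 \
  xprobe/xinference:v1.4.1 \
  nvidia-smi   # run this inside the container as a sanity check
```

If `nvidia-smi` works on the host but fails inside the container, the toolkit/runtime setup is the problem rather than the image.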

Expected behavior / 期待表现

The server should start without errors and serve the model.

pamdla avatar Apr 18 '25 22:04 pamdla

I installed the xinf image via docker pull xprobe/xinference:v1.4.1.

Does this image not include CUDA?
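One way to check whether the NVML library is actually reachable inside the container is to try loading it directly with ctypes. This is a hypothetical diagnostic helper (not part of Xinference or SGLang), using only the standard NVML entry points `nvmlInit_v2`/`nvmlShutdown`:

```python
import ctypes


def check_nvml() -> str:
    """Try to initialize NVML directly and return a diagnostic string."""
    try:
        # The driver's management library; present only when the NVIDIA
        # driver libraries are mounted into the container.
        nvml = ctypes.CDLL("libnvidia-ml.so.1")
    except OSError:
        return "libnvidia-ml.so.1 not found (driver libraries not mounted into the container)"
    ret = nvml.nvmlInit_v2()  # returns 0 (NVML_SUCCESS) on success
    if ret != 0:
        return f"nvmlInit_v2 failed with code {ret}"
    nvml.nvmlShutdown()
    return "NVML OK"


print(check_nvml())
```

If this prints the "not found" message inside the container but "NVML OK" on the host, the image is fine and the container was simply started without GPU access.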

pamdla avatar Apr 18 '25 22:04 pamdla

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Apr 26 '25 19:04 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] avatar May 01 '25 19:05 github-actions[bot]