[Bug]: ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
Your current environment
The output of `python collect_env.py`
vLLM container: 0.7.2
Launch script:
bash run_cluster.sh \
    docker-hub.dahuatech.com/vllm/vllm-openai:v0.7.2 \
    10.12.167.20 \
    --head \
    /root/wangjianqiang/deepseek/DeepSeek-R1/DeepSeek-R1/ \
    -e VLLM_HOST_IP=$(hostname -I | awk '{print $1}') \
    -e "GLOO_SOCKET_IFNAME=ens121f0" \
    -e "NCCL_SOCKET_IFNAME=ens121f0" \
    -v /root/wangjianqiang/deepseek/DeepSeek-R1/:/root/deepseek_r1/
bash run_cluster.sh \
    docker-hub.dahuatech.com/vllm/vllm-openai:v0.7.2 \
    10.12.167.20 \
    --worker \
    /root/wangjianqiang/deepseek/DeepSeek-R1/DeepSeek-R1/ \
    -e VLLM_HOST_IP=$(hostname -I | awk '{print $1}') \
    -e "GLOO_SOCKET_IFNAME=ens121f0" \
    -e "NCCL_SOCKET_IFNAME=ens121f0" \
    -v /root/deepseek_r1/:/root/deepseek_r1/
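Before launching the server, it may help to verify that Ray actually sees GPUs on the head node; this error is typically raised when the head (driver) node reports zero GPUs to Ray. A possible sanity check, assuming the container started by run_cluster.sh is named `node` (adjust to your setup):

```shell
# Enter the head-node container (container name is an assumption from
# the stock run_cluster.sh; substitute yours if it differs).
docker exec -it node /bin/bash

# All 16 GPUs across both nodes should be listed here. If the head
# node contributes 0 GPUs, the driver has nothing for Ray to allocate
# and vLLM raises exactly this ValueError.
ray status

# Confirm the head node itself exposes its GPUs inside the container.
nvidia-smi
```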
Launch command:
root@admin:~/deepseek_r1/DeepSeek-R1# VLLM_HOST_IP=$(hostname -I | awk '{print $1}')
root@admin:~/deepseek_r1/DeepSeek-R1# export VLLM_HOST_IP
root@admin:~/deepseek_r1/DeepSeek-R1# NCCL_DEBUG=TRACE vllm serve /root/deepseek_r1/DeepSeek-R1 --tensor-parallel-size 16 --trust-remote-code
The following error occurred:
ERROR 02-09 02:31:11 engine.py:389] Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
ERROR 02-09 02:31:11 engine.py:389] Traceback (most recent call last):
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
ERROR 02-09 02:31:11 engine.py:389] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 02-09 02:31:11 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
ERROR 02-09 02:31:11 engine.py:389] return cls(ipc_path=ipc_path,
ERROR 02-09 02:31:11 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in init
ERROR 02-09 02:31:11 engine.py:389] self.engine = LLMEngine(*args, **kwargs)
ERROR 02-09 02:31:11 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in init
ERROR 02-09 02:31:11 engine.py:389] self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 02-09 02:31:11 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in init
ERROR 02-09 02:31:11 engine.py:389] super().init(*args, **kwargs)
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in init
ERROR 02-09 02:31:11 engine.py:389] self._init_executor()
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
ERROR 02-09 02:31:11 engine.py:389] self._init_workers_ray(placement_group)
ERROR 02-09 02:31:11 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 227, in _init_workers_ray
ERROR 02-09 02:31:11 engine.py:389] raise ValueError(
ERROR 02-09 02:31:11 engine.py:389] ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
raise e
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
return cls(ipc_path=ipc_path,
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in init
self.engine = LLMEngine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in init
self.model_executor = executor_class(vllm_config=vllm_config, )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in init
self._init_executor()
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
self._init_workers_ray(placement_group)
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 227, in _init_workers_ray
raise ValueError(
ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
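The check that raises this ValueError can be sketched as follows. This is a hypothetical simplification of the logic in `ray_distributed_executor.py` (not vLLM's actual code): the Ray placement group assigns one GPU bundle per tensor-parallel rank to a node, and the node where the driver runs must own at least one of those bundles. The IPs below are illustrative.

```python
def driver_has_gpu_bundle(driver_ip: str, bundle_to_node_ip: dict) -> bool:
    """Return True if any GPU bundle in the placement group lives on
    the driver node; vLLM raises the ValueError above when this is False."""
    return any(ip == driver_ip for ip in bundle_to_node_ip.values())


# Failure case matching this issue: 16 GPU bundles were all scheduled
# onto two worker nodes (hypothetical IPs), none on the head node
# 10.12.167.20 where `vllm serve` (the driver) runs.
bundles = {i: ("10.12.167.21" if i < 8 else "10.12.167.22") for i in range(16)}
print(driver_has_gpu_bundle("10.12.167.20", bundles))  # False -> ValueError
```

In other words, if the head node has GPUs but Ray was started there without them (or `vllm serve` is run on a GPU-less node), the placement group ends up with no GPU bundle on the driver and startup fails before any NCCL communication happens.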
*** SIGTERM received at time=1739097071 on cpu 95 ***
PC: @ 0x7fa5c96777f8 (unknown) clock_nanosleep
@ 0x7fa5c95d4520 (unknown) (unknown)
@ ... and at least 3 more frames
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: *** SIGTERM received at time=1739097071 on cpu 95 ***
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: PC: @ 0x7fa5c96777f8 (unknown) clock_nanosleep
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: @ 0x7fa5c95d4520 (unknown) (unknown)
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: @ ... and at least 3 more frames
Exception ignored in atexit callback: <function shutdown at 0x7fa44c55bd80>
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1910, in shutdown
time.sleep(0.5)
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1499, in sigterm_handler
sys.exit(signum)
SystemExit: 15
Traceback (most recent call last):
File "/usr/local/bin/vllm", line 8, in
🐛 Describe the bug
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.