
[Bug]: ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.

Open WangJianQ-0118 opened this issue 2 weeks ago • 0 comments

Your current environment


vLLM container: 0.7.2. Launch script (`run_cluster.sh`), head node:

```
bash run_cluster.sh \
    docker-hub.dahuatech.com/vllm/vllm-openai:v0.7.2 \
    10.12.167.20 \
    --head \
    /root/wangjianqiang/deepseek/DeepSeek-R1/DeepSeek-R1/ \
    -e VLLM_HOST_IP=$(hostname -I | awk '{print $1}') \
    -e "GLOO_SOCKET_IFNAME=ens121f0" \
    -e "NCCL_SOCKET_IFNAME=ens121f0" \
    -v /root/wangjianqiang/deepseek/DeepSeek-R1/:/root/deepseek_r1/
```

Worker node:

```
bash run_cluster.sh \
    docker-hub.dahuatech.com/vllm/vllm-openai:v0.7.2 \
    10.12.167.20 \
    --worker \
    /root/wangjianqiang/deepseek/DeepSeek-R1/DeepSeek-R1/ \
    -e VLLM_HOST_IP=$(hostname -I | awk '{print $1}') \
    -e "GLOO_SOCKET_IFNAME=ens121f0" \
    -e "NCCL_SOCKET_IFNAME=ens121f0" \
    -v /root/deepseek_r1/:/root/deepseek_r1/
```

Launch command (on the head node):

```
root@admin:~/deepseek_r1/DeepSeek-R1# VLLM_HOST_IP=$(hostname -I | awk '{print $1}')
root@admin:~/deepseek_r1/DeepSeek-R1# export VLLM_HOST_IP
root@admin:~/deepseek_r1/DeepSeek-R1# NCCL_DEBUG=TRACE vllm serve /root/deepseek_r1/DeepSeek-R1 --tensor-parallel-size 16 --trust-remote-code
```
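With `--tensor-parallel-size 16` spanning two nodes, this failure mode usually means Ray registers no GPUs on the node where `vllm serve` (the Ray driver) runs — e.g. the head container was started without GPU access. A quick check is `ray status` inside the head container, or `ray.nodes()` from Python. The sketch below is a hedged illustration only: the helper mirrors the record layout that the real `ray.nodes()` API returns, but the sample cluster data is invented to reproduce the failing state (the worker IP `10.12.167.21` is assumed).

```python
# Hedged diagnostic sketch. In a live cluster you would call
# ray.init(address="auto") and pass ray.nodes() to this helper; here we use
# fabricated sample data so the logic can run stand-alone.

def gpu_free_nodes(nodes):
    """Return the IPs of alive Ray nodes that expose no GPU resource."""
    return [
        n["NodeManagerAddress"]
        for n in nodes
        if n["Alive"] and n["Resources"].get("GPU", 0) == 0
    ]

# Sample cluster state matching the reported error: the head node (where the
# driver runs) registers no GPUs, the worker registers 8.
sample_nodes = [
    {"NodeManagerAddress": "10.12.167.20", "Alive": True,
     "Resources": {"CPU": 96}},
    {"NodeManagerAddress": "10.12.167.21", "Alive": True,
     "Resources": {"CPU": 96, "GPU": 8.0}},
]

print(gpu_free_nodes(sample_nodes))  # -> ['10.12.167.20']
```

If the head node shows up in that list, the first thing to verify is that `nvidia-smi` works inside the head container; the stock `run_cluster.sh` shipped with vLLM is meant to start the container with GPU access, so a node reporting 0 GPUs suggests the container or Ray did not pick the devices up.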

The following error occurred:

```
ERROR 02-09 02:31:11 engine.py:389] Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
ERROR 02-09 02:31:11 engine.py:389] Traceback (most recent call last):
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
ERROR 02-09 02:31:11 engine.py:389]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
ERROR 02-09 02:31:11 engine.py:389]     return cls(ipc_path=ipc_path,
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in __init__
ERROR 02-09 02:31:11 engine.py:389]     self.engine = LLMEngine(*args, **kwargs)
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
ERROR 02-09 02:31:11 engine.py:389]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in __init__
ERROR 02-09 02:31:11 engine.py:389]     super().__init__(*args, **kwargs)
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in __init__
ERROR 02-09 02:31:11 engine.py:389]     self._init_executor()
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
ERROR 02-09 02:31:11 engine.py:389]     self._init_workers_ray(placement_group)
ERROR 02-09 02:31:11 engine.py:389]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 227, in _init_workers_ray
ERROR 02-09 02:31:11 engine.py:389]     raise ValueError(
ERROR 02-09 02:31:11 engine.py:389] ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
    return cls(ipc_path=ipc_path,
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 227, in _init_workers_ray
    raise ValueError(
ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
*** SIGTERM received at time=1739097071 on cpu 95 ***
PC: @ 0x7fa5c96777f8 (unknown) clock_nanosleep
    @ 0x7fa5c95d4520 (unknown) (unknown)
    @ ... and at least 3 more frames
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: *** SIGTERM received at time=1739097071 on cpu 95 ***
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460: PC: @ 0x7fa5c96777f8 (unknown) clock_nanosleep
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460:     @ 0x7fa5c95d4520 (unknown) (unknown)
[2025-02-09 02:31:11,663 E 15439 15439] logging.cc:460:     @ ... and at least 3 more frames
Exception ignored in atexit callback: <function shutdown at 0x7fa44c55bd80>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1910, in shutdown
    time.sleep(0.5)
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1499, in sigterm_handler
    sys.exit(signum)
SystemExit: 15
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 204, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 44, in serve
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 230, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
```
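For context, the `ValueError` originates in `_init_workers_ray` (`ray_distributed_executor.py`, line 227 in this build): after Ray schedules the placement group, the executor requires that at least one of the GPU bundles landed on the node running the driver process. The sketch below is a simplified paraphrase of that check, not vLLM's actual code; the node names and the `check_driver_has_gpu` helper are illustrative.

```python
# Hypothetical paraphrase of the check behind the error: each GPU bundle of
# the Ray placement group lands on some node; if none lands on the driver's
# node, vLLM cannot run its driver worker there and aborts.

def check_driver_has_gpu(bundle_node_ids, driver_node_id):
    """bundle_node_ids: the node each GPU bundle was placed on."""
    if driver_node_id not in bundle_node_ids:
        raise ValueError(
            "Ray does not allocate any GPUs on the driver node. Consider "
            "adjusting the Ray placement group or running the driver on a "
            "GPU node.")

# Failure case from this issue: all 16 bundles go to worker nodes because Ray
# sees no GPUs on the head node (e.g. the head container lacks GPU access).
try:
    check_driver_has_gpu(["worker-a"] * 8 + ["worker-b"] * 8, "head")
except ValueError as exc:
    print(exc)

# Once the head node registers GPUs, some bundles are placed there and the
# check passes silently:
check_driver_has_gpu(["head"] * 8 + ["worker-a"] * 8, "head")
```

In other words, the fix is not on the `vllm serve` side: the head node itself must expose GPUs to Ray (check `ray status` and `nvidia-smi` inside the head container) so that the placement group can put bundles on it.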

🐛 Describe the bug

1

Before submitting a new issue...

  • [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

WangJianQ-0118 avatar Feb 09 '25 10:02 WangJianQ-0118