
Error when using sglang async mode

Open Yux1angJi opened this issue 1 month ago • 2 comments

System Info

----------Python Info----------
Version : 3.12.12
Compiler : GCC 11.2.0
Build : ('main', 'Oct 21 2025 20:16:04')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 25.2
Directory : /mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/pip
vllm : 0.11.0
sglang : 0.5.5
ray : 2.51.1
torch : 2.8.0
----------verl Info-----------
Version : 0.7.0.dev
Directory : /mnt/workspace/jiyuxiang.jyx/code/verl/verl
Commit Hash : 4bf4bd32d049be648867ade8c72ee3f5c27ebfcf
----------Platform Info----------
Platform : Linux-5.10.134-010.ali5000.al8.x86_64-x86_64-with-glibc2.32
system : Linux
node : notebook-835ec0eed2c1-worker-0
release : 5.10.134-010.ali5000.al8.x86_64
version : #1 SMP Fri Jun 28 19:51:27 CST 2024
----------Environment----------
CUDA Runtime : 12.8
CUDA Compiler : Cuda compilation tools, release 12.8, V12.8.61
----------System Info----------
CPU Memory : 1875.00 GB
GPU Count : 8
GPU 1 Type : NVIDIA H20
GPU 1 Memory : 95.58 GB
GPU 2 Type : NVIDIA H20
GPU 2 Memory : 95.58 GB
GPU 3 Type : NVIDIA H20
GPU 3 Memory : 95.58 GB
GPU 4 Type : NVIDIA H20
GPU 4 Memory : 95.58 GB
GPU 5 Type : NVIDIA H20
GPU 5 Memory : 95.58 GB
GPU 6 Type : NVIDIA H20
GPU 6 Memory : 95.58 GB
GPU 7 Type : NVIDIA H20
GPU 7 Memory : 95.58 GB
GPU 8 Type : NVIDIA H20
GPU 8 Memory : 95.58 GB

Information

  • [x] The official example scripts
  • [x] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

verl==0.7.0.dev0

Change the base model to qwen3-vl-30ba3b-instruct in examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_tool_agent_multiturn.sh, then run the script:

bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_tool_agent_multiturn.sh

Expected behavior

Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/trainer/main_ppo.py", line 439, in
    main()
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/_internal/utils.py", line 458, in
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
        ^^^^^^^^^^^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/trainer/main_ppo.py", line 42, in main
    run_ppo(config)
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/trainer/main_ppo.py", line 96, in run_ppo
    ray.get(runner.run.remote(config))
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/ray/_private/worker.py", line 2961, in get
    values, debugger_breakpoint = worker.get_objects(
                                  ^^^^^^^^^^^^^^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/site-packages/ray/_private/worker.py", line 1026, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=219953, ip=33.107.100.90, actor_id=2ae1f7ad6ccaa1c265f8be6103000000, repr=<main_ppo.TaskRunner object at 0x7f49da35a3c0>)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/trainer/main_ppo.py", line 338, in run
    trainer.init_workers()
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/trainer/ppo/ray_trainer.py", line 774, in init_workers
    self.async_rollout_manager = AgentLoopManager(
                                 ^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/experimental/agent_loop/agent_loop.py", line 683, in init
    self._initialize_llm_servers()
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/experimental/agent_loop/agent_loop.py", line 715, in _initialize_llm_servers
    self._run_all([server.init_hybrid(self.worker_group) for server in self.rollout_replicas])
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/experimental/agent_loop/agent_loop.py", line 807, in _run_all
    asyncio.run(run_all())
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/experimental/agent_loop/agent_loop.py", line 805, in run_all
    await asyncio.gather(*tasks)
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/workers/rollout/replica.py", line 119, in init_hybrid
    await self.launch_servers()
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/workers/rollout/sglang_rollout/async_sglang_server.py", line 305, in launch_servers
    await asyncio.gather(
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
    return await awaitable
           ^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(ValueError): ray::SGLangHttpServer.launch_server() (pid=230678, ip=33.107.100.90, actor_id=b7d5d91e2cd9c7a4a00ce07603000000, repr=<verl.workers.rollout.sglang_rollout.async_sglang_server.SGLangHttpServer object at 0x7fe9db29e2d0>)
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/mnt/workspace/jiyuxiang.jyx/miniconda3/envs/geoagent_verl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2025-11-11_15-56-34_625424_181291/runtime_resources/working_dir_files/_ray_pkg_8dc29b4cec273f20/verl/workers/rollout/sglang_rollout/async_sglang_server.py", line 167, in launch_server
    self.tokenizer_manager, self.template_manager, self.scheduler_info = _launch_subprocesses(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 3)
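
For readers unfamiliar with this error class, here is a minimal sketch of how the final `ValueError` arises. The last frame unpacks the result of `_launch_subprocesses` into exactly three names, so the call evidently returned more than three values. The functions below are hypothetical stand-ins, not sglang or verl code, and the fourth return value is an assumption made purely for illustration.

```python
def launch_three():
    # Hypothetical stand-in for a launcher that returns three values.
    return "tokenizer_manager", "template_manager", "scheduler_info"


def launch_four():
    # Hypothetical stand-in for a launcher that returns one extra value
    # (what the extra value would be in sglang is not known from this thread).
    return "tokenizer_manager", "template_manager", "scheduler_info", "extra"


def unpack_strict(launcher):
    # Mirrors the failing line: exactly three targets on the left-hand side.
    tokenizer_manager, template_manager, scheduler_info = launcher()
    return tokenizer_manager, template_manager, scheduler_info


def unpack_tolerant(launcher):
    # Starred target absorbs any trailing values. This only helps if the
    # first three positions keep their meaning, which is an assumption.
    tokenizer_manager, template_manager, scheduler_info, *extra = launcher()
    return tokenizer_manager, template_manager, scheduler_info, extra


try:
    unpack_strict(launch_four)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
print(unpack_tolerant(launch_four))
```

The tolerant form is shown only to explain the mechanics. Whether it would be a safe patch depends on what the installed sglang version actually returns, which this thread does not establish.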

Yux1angJi, Nov 11 '25 08:11

same

HJYao00, Nov 18 '25 12:11

Downgrading sglang from 0.5.5 to 0.5.4 fixed the error for me.
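
A small pre-flight check along these lines could flag the mismatch before a run starts. This is a sketch, not part of verl: the helper names are hypothetical, and the `KNOWN_GOOD` threshold is taken only from this thread (0.5.4 reported working, 0.5.5 reported failing), not from any official compatibility statement.

```python
from importlib import metadata

# Version reported to work in this thread; treat it as an assumption.
KNOWN_GOOD = (0, 5, 4)


def parse_release(version: str) -> tuple:
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at non-numeric segments such as "post1" or "dev0"
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than_known_good(version: str) -> bool:
    """Return True if the version is newer than the one reported to work."""
    return parse_release(version) > KNOWN_GOOD


def check_installed(package: str = "sglang"):
    """Return None if the package is absent, else whether it is newer than KNOWN_GOOD."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return is_newer_than_known_good(installed)


if __name__ == "__main__":
    result = check_installed()
    if result is None:
        print("sglang is not installed in this environment")
    elif result:
        print("sglang is newer than 0.5.4; the unpack error in this issue may occur")
    else:
        print("sglang is at or below 0.5.4")
```

The parser is deliberately simple and ignores pre-release ordering; a project that already depends on the `packaging` library could compare `packaging.version.Version` objects instead.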

Yux1angJi, Nov 19 '25 11:11