
RuntimeError: 0 active drivers ([]). There should only be one.

[Open] mertunsall opened this issue 8 months ago · 7 comments

This happened after I installed deepspeed. Related to https://github.com/deepspeedai/DeepSpeed/issues/7028

  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/ray/_private/worker.py", line 929, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=818177, ip=10.220.1.179, actor_id=b403b4db31bb6ea1ca6ca26401000000, repr=<main_ppo.TaskRunner object at 0x7f605bf50040>)
  File "/home/mertunsal/verl/verl/trainer/main_ppo.py", line 99, in run
    from verl.workers.fsdp_workers import ActorRolloutRefWorker, CriticWorker
  File "/home/mertunsal/verl/verl/workers/fsdp_workers.py", line 41, in <module>
    from verl.workers.sharding_manager.fsdp_ulysses import FSDPUlyssesShardingManager
  File "/home/mertunsal/verl/verl/workers/sharding_manager/__init__.py", line 26, in <module>
    if is_megatron_core_available() and is_vllm_available():
  File "/home/mertunsal/verl/verl/utils/import_utils.py", line 26, in is_megatron_core_available
    from megatron.core import parallel_state as mpu
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/megatron/core/__init__.py", line 2, in <module>
    import megatron.core.tensor_parallel
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/megatron/core/tensor_parallel/__init__.py", line 2, in <module>
    from .cross_entropy import vocab_parallel_cross_entropy
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/megatron/core/tensor_parallel/cross_entropy.py", line 7, in <module>
    from megatron.core.parallel_state import (
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/megatron/core/parallel_state.py", line 14, in <module>
    from .utils import GlobalMemoryBuffer, is_torch_min_version
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/megatron/core/utils.py", line 1405, in <module>
    from transformer_engine.pytorch.float8_tensor import Float8Tensor
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/transformer_engine/__init__.py", line 13, in <module>
    from . import pytorch
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/transformer_engine/pytorch/__init__.py", line 81, in <module>
    from transformer_engine.pytorch.permutation import (
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/transformer_engine/pytorch/permutation.py", line 11, in <module>
    import transformer_engine.pytorch.triton.permutation as triton_permutation
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/transformer_engine/pytorch/triton/permutation.py", line 123, in <module>
    def _permute_kernel(
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 368, in decorator
    return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 130, in __init__
    self.do_bench = driver.active.get_benchmarker()
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/home/mertunsal/miniconda3/envs/verl/lib/python3.10/site-packages/triton/runtime/driver.py", line 8, in _create_driver
    raise RuntimeError(f"{len(actives)} active drivers ({actives}). There should only be one.")
RuntimeError: 0 active drivers ([]). There should only be one.
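For context, the failure comes from Triton's backend discovery: at import time `triton/runtime/driver.py` scans the installed backends and requires exactly one of them to report itself as active. A minimal sketch of that selection logic (the `backends` mapping and backend objects here are hypothetical stand-ins, not Triton's real classes):

```python
# Sketch of the driver-selection check that produces the error above.
# `backends` maps backend names to objects exposing is_active().
def create_driver(backends):
    actives = [backend for backend in backends.values() if backend.is_active()]
    if len(actives) != 1:
        raise RuntimeError(
            f"{len(actives)} active drivers ({actives}). There should only be one."
        )
    return actives[0]
```

With a broken environment (e.g. a deepspeed install pulling in a mismatched triton wheel), no backend reports itself active, so the list is empty and you get the `0 active drivers` error.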

mertunsall avatar Apr 03 '25 22:04 mertunsall

Do you have to use deepspeed? If not, you can simply uninstall it; verl does not require deepspeed.
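As a quick diagnostic before changing anything, you can check which of the possibly-conflicting packages are actually importable in your environment. A stdlib-only sketch (the candidate package list is an assumption based on this thread):

```python
import importlib.util

def installed_conflict_candidates(names=("deepspeed", "triton", "transformer_engine")):
    # Map each candidate package name to whether it is importable
    # from the current environment.
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

If `deepspeed` shows up here and you do not need it, `pip uninstall deepspeed` is the first thing to try.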

maksimstw avatar Apr 28 '25 21:04 maksimstw

Incompatible. I ran into this too; you have to uninstall deepspeed for it to work.

dignfei avatar May 06 '25 08:05 dignfei

  File "/usr/local/lib/python3.11/site-packages/verl/trainer/main_ppo.py", line 87, in run
    from verl.workers.megatron_workers import ActorRolloutRefWorker, CriticWorker
  File "/usr/local/lib/python3.11/site-packages/verl/workers/megatron_workers.py", line 26, in <module>
    from megatron.core import parallel_state as mpu
  File "/home/Megatron-LM-core_v0.11.0/megatron/core/__init__.py", line 2, in <module>
    import megatron.core.tensor_parallel
  File "/home/Megatron-LM-core_v0.11.0/megatron/core/tensor_parallel/__init__.py", line 2, in <module>
    from .cross_entropy import vocab_parallel_cross_entropy
  File "/home/Megatron-LM-core_v0.11.0/megatron/core/tensor_parallel/cross_entropy.py", line 7, in <module>
    from megatron.core.parallel_state import (
  File "/home/Megatron-LM-core_v0.11.0/megatron/core/parallel_state.py", line 14, in <module>
    from .utils import GlobalMemoryBuffer, is_torch_min_version
  File "/home/Megatron-LM-core_v0.11.0/megatron/core/utils.py", line 1405, in <module>
    from transformer_engine.pytorch.float8_tensor import Float8Tensor
  File "/usr/local/lib/python3.11/site-packages/transformer_engine/__init__.py", line 13, in <module>
    from . import pytorch
  File "/usr/local/lib/python3.11/site-packages/transformer_engine/pytorch/__init__.py", line 81, in <module>
    from transformer_engine.pytorch.permutation import (
  File "/usr/local/lib/python3.11/site-packages/transformer_engine/pytorch/permutation.py", line 11, in <module>
    import transformer_engine.pytorch.triton.permutation as triton_permutation
  File "/usr/local/lib/python3.11/site-packages/transformer_engine/pytorch/triton/permutation.py", line 112, in <module>
    @triton.autotune(
  File "/usr/local/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 368, in decorator
    return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
  File "/usr/local/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 130, in __init__
    self.do_bench = driver.active.get_benchmarker()
  File "/usr/local/lib/python3.11/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/usr/local/lib/python3.11/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/usr/local/lib/python3.11/site-packages/triton/runtime/driver.py", line 8, in _create_driver
    raise RuntimeError(f"{len(actives)} active drivers ({actives}). There should only be one.")
RuntimeError: 0 active drivers ([]). There should only be one.

I also hit this problem when using Megatron-LM as the backend. Is there a fix for it yet?

ZMLloveMLN avatar Jun 04 '25 11:06 ZMLloveMLN

Incompatible. I ran into this too; you have to uninstall deepspeed for it to work.

I hit this problem as well, and it went away after uninstalling deepspeed. Even the vllm and triton compatibility problems got fixed the same way, which is honestly wild.

141forever avatar Jun 11 '25 05:06 141forever

Incompatible. I ran into this too; you have to uninstall deepspeed for it to work.

I hit this problem as well, and it went away after uninstalling deepspeed. Even the vllm and triton compatibility problems got fixed the same way, which is honestly wild.

File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 368, in decorator
    return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 130, in __init__
    self.do_bench = driver.active.get_benchmarker()
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/driver.py", line 8, in _create_driver
    raise RuntimeError(f"{len(actives)} active drivers ({actives}). There should only be one.")
RuntimeError: 0 active drivers ([]). There should only be one.

I uninstalled deepspeed but still get the same error. What should I do?

Equationliu avatar Aug 08 '25 07:08 Equationliu

Incompatible. I ran into this too; you have to uninstall deepspeed for it to work.

I hit this problem as well, and it went away after uninstalling deepspeed. Even the vllm and triton compatibility problems got fixed the same way, which is honestly wild.

File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 368, in decorator
    return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 130, in __init__
    self.do_bench = driver.active.get_benchmarker()
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/home/ma-user/anaconda3/envs/pt/lib/python3.10/site-packages/triton/runtime/driver.py", line 8, in _create_driver
    raise RuntimeError(f"{len(actives)} active drivers ({actives}). There should only be one.")
RuntimeError: 0 active drivers ([]). There should only be one.

I uninstalled deepspeed but still get the same error. What should I do?

I ran into this too. How did you solve it?

`pip install triton==3.1.0` fixes it.
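If you go this route, you can verify the pin took effect before launching training. A stdlib-only sketch (the expected version string is whatever you pinned; the helper name is mine):

```python
from importlib.metadata import PackageNotFoundError, version

def dist_version_matches(dist="triton", expected="3.1.0"):
    # True only if `dist` is installed and its wheel version equals `expected`.
    try:
        return version(dist) == expected
    except PackageNotFoundError:
        return False
```

Running this inside the same environment Ray launches workers in matters; a pin applied to a different conda env will not help.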

gxy-gxy avatar Sep 05 '25 11:09 gxy-gxy

What if the environment has torch 2.6.0+cu124, which depends on triton==3.2.0, so triton==3.1.0 can't be installed?
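One way to see exactly which triton version your installed torch wheel declares (so you know what a pin would conflict with) is to read the wheel metadata. A stdlib-only sketch (the helper name is mine):

```python
from importlib.metadata import PackageNotFoundError, requires

def declared_triton_requirement(dist="torch"):
    # Return the `triton...` requirement string from the installed wheel's
    # metadata, or None if the distribution or the requirement is absent.
    try:
        reqs = requires(dist) or []
    except PackageNotFoundError:
        return None
    return next(
        (r for r in reqs if r.split(";")[0].strip().startswith("triton")), None
    )
```

If torch hard-pins triton==3.2.0, downgrading triton alone will conflict with the resolver; the options people usually weigh are picking a torch release whose triton pin is compatible with the rest of the stack, or rebuilding the environment from a known-good verl image, rather than force-installing with `--no-deps`.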

TristanShao avatar Dec 02 '25 09:12 TristanShao