(WorkerDict pid=59715) Qwen2ForCausalLM contains 494.03M parameters
(WorkerDict pid=59715) Before building vllm rollout, memory allocated (GB): 0.9203834533691406, memory reserved (GB): 2.62890625
(WorkerDict pid=59715) INFO 03-04 17:15:40 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=8192.
(WorkerDict pid=59715) WARNING 03-04 17:15:40 config.py:380] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
(WorkerDict pid=59999) Total steps: 435, num_warmup_steps: 0 [repeated 7x across cluster]
(WorkerDict pid=59715) Critic use_remove_padding=False [repeated 3x across cluster]
(WorkerDict pid=59999) wrap_policy: functools.partial(<function or_policy at 0x1529034daca0>, policies=[functools.partial(<function transformer_auto_wrap_policy at 0x1529034dab60>, transformer_layer_cls={<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>})]) [repeated 7x across cluster]
(WorkerDict pid=59999) Actor use_remove_padding=False [repeated 7x across cluster]
(WorkerDict pid=59715) local rank 0
(WorkerDict pid=59978) NCCL version 2.20.5+cuda12.4
(WorkerDict pid=59715) before init cache memory allocated: 1.997223424GB, reserved: 2.059403264GB
(WorkerDict pid=59715) after init cache memory allocated: 33.55516672GB, reserved: 33.61734656GB
(WorkerDict pid=59999) /home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=59999) warnings.warn(
(WorkerDict pid=59999) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16) [repeated 3x across cluster]
(WorkerDict pid=59999) kwargs: {'n': 1, 'logprobs': 1, 'max_tokens': 256, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False}
(WorkerDict pid=59999) INFO 03-04 17:15:40 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=8192. [repeated 3x across cluster]
(WorkerDict pid=59999) WARNING 03-04 17:15:40 config.py:380] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used [repeated 3x across cluster]
(WorkerDict pid=59999) local rank 0 [repeated 3x across cluster]
(WorkerDict pid=59715) After building vllm rollout, memory allocated (GB): 30.322853088378906, memory reserved (GB): 31.30859375
(WorkerDict pid=59715) After building sharding manager, memory allocated (GB): 30.322853088378906, memory reserved (GB): 31.30859375
(WorkerDict pid=59999) NCCL version 2.20.5+cuda12.4 [repeated 2x across cluster]
(main_task pid=59139) Using LocalLogger is deprecated. The constructor API will change
(main_task pid=59139) Checkpoint tracker file does not exist: %s /home/u2024001021/verl-main/checkpoints/verl_examples/gsm8k/latest_checkpointed_iteration.txt
(main_task pid=59139) Training from scratch
(WorkerDict pid=59715) /tmp/tmplaiy5fz/main.c:6:23: fatal error: stdatomic.h: No such file or directory
(WorkerDict pid=59715) #include <stdatomic.h>
(WorkerDict pid=59715) ^
(WorkerDict pid=59715) compilation terminated.
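The four lines above are the root cause of everything that follows: when verl's compute_log_prob step first launches flash-attn's Triton cross-entropy kernel, Triton JIT-compiles a small cuda_utils C module with the host compiler, and that compiler cannot find the C11 header stdatomic.h (typically an old system gcc such as 4.8.x, which predates that header). Below is a minimal sketch to test the toolchain in isolation, outside the training job; the helper name and the /usr/bin/gcc default (taken from the CalledProcessError further down) are illustrative assumptions, not verl or Triton APIs.

```python
# Minimal sketch (not part of verl or Triton): check whether a given C compiler
# can build a file that includes <stdatomic.h>, the step that fails inside
# Triton's driver build in the log above. /usr/bin/gcc mirrors the compiler in
# the CalledProcessError below; adjust to test another toolchain.
import subprocess
import tempfile
from pathlib import Path

C_SRC = """
#include <stdatomic.h>
int main(void) { atomic_int x = 0; return atomic_load(&x); }
"""

def can_compile_stdatomic(cc: str = "/usr/bin/gcc") -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "main.c"
        src.write_text(C_SRC)
        proc = subprocess.run(
            [cc, str(src), "-O3", "-o", str(Path(tmp) / "a.out")],
            capture_output=True, text=True,
        )
        if proc.returncode != 0:
            # On a broken toolchain this prints the same
            # "stdatomic.h: No such file or directory" as above.
            print(proc.stderr)
        return proc.returncode == 0

if __name__ == "__main__":
    print("stdatomic.h OK:", can_compile_stdatomic())
```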
(WorkerDict pid=59985) /home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 3x across cluster]
(WorkerDict pid=59985) warnings.warn( [repeated 3x across cluster]
Error executing job with overrides: ['data.train_files=/home/u2024001021/datasets/gsm8k/train.parquet', 'data.val_files=/home/u2024001021/datasets/gsm8k/test.parquet', 'data.train_batch_size=256', 'data.max_prompt_length=512', 'data.max_response_length=256', 'actor_rollout_ref.model.path=/fs/archive/share/u2024001021/huggingface_models/Qwen2.5-0.5B-Instruct', 'actor_rollout_ref.actor.optim.lr=1e-6', 'actor_rollout_ref.actor.ppo_mini_batch_size=64', 'actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4', 'actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8', 'actor_rollout_ref.rollout.tensor_model_parallel_size=1', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.4', 'actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4', 'critic.optim.lr=1e-5', 'critic.model.path=/fs/archive/share/u2024001021/huggingface_models/Qwen2.5-0.5B-Instruct', 'critic.ppo_micro_batch_size_per_gpu=4', 'algorithm.kl_ctrl.kl_coef=0.001', 'trainer.logger=[console]', '+trainer.val_before_train=False', 'trainer.default_hdfs_dir=null', 'trainer.n_gpus_per_node=4', 'trainer.nnodes=1', 'trainer.save_freq=10', 'trainer.test_freq=10', 'trainer.total_epochs=15']
(main_task pid=59139) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.actor_rollout_compute_log_prob() (pid=59999, ip=10.0.0.1, actor_id=e84f6786088faaaca01435c301000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x1528c97f7e10>)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/verl-main/verl/single_controller/ray/base.py", line 399, in func
(main_task pid=59139) return getattr(self.worker_dict[key], name)(*args, **kwargs)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/verl-main/verl/single_controller/base/decorator.py", line 404, in inner
(main_task pid=59139) return func(*args, **kwargs)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/verl-main/verl/workers/fsdp_workers.py", line 516, in compute_log_prob
(main_task pid=59139) output = self.actor.compute_log_prob(data=data)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/verl-main/verl/workers/actor/dp_actor.py", line 214, in compute_log_prob
(main_task pid=59139) _, log_probs = self._forward_micro_batch(micro_batch, temperature=temperature)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/verl-main/verl/workers/actor/dp_actor.py", line 153, in _forward_micro_batch
(main_task pid=59139) log_probs = logprobs_from_logits(logits, micro_batch['responses'])
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/verl-main/verl/utils/torch_functional.py", line 57, in logprobs_from_logits
(main_task pid=59139) output = logprobs_from_logits_flash_attn(logits, labels)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/verl-main/verl/utils/torch_functional.py", line 65, in logprobs_from_logits_flash_attn
(main_task pid=59139) output = cross_entropy_loss(logits, labels)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/flash_attn/ops/triton/cross_entropy.py", line 319, in cross_entropy_loss
(main_task pid=59139) return CrossEntropyLoss.apply(
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/torch/autograd/function.py", line 574, in apply
(main_task pid=59139) return super().apply(*args, **kwargs) # type: ignore[misc]
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/flash_attn/ops/triton/cross_entropy.py", line 196, in forward
(main_task pid=59139) cross_entropy_fwd_kernel[(n_rows,)](
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/jit.py", line 345, in
(main_task pid=59139) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 338, in run
(main_task pid=59139) return self.fn.run(*args, **kwargs)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/jit.py", line 607, in run
(main_task pid=59139) device = driver.active.get_current_device()
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/driver.py", line 23, in getattr
(main_task pid=59139) self._initialize_obj()
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
(main_task pid=59139) self._obj = self._init_fn()
(main_task pid=59139) ^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/driver.py", line 9, in _create_driver
(main_task pid=59139) return actives[0]()
(main_task pid=59139) ^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 371, in init
(main_task pid=59139) self.utils = CudaUtils() # TODO: make static
(main_task pid=59139) ^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 80, in init
(main_task pid=59139) mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
(main_task pid=59139) so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/build.py", line 48, in _build
(main_task pid=59139) ret = subprocess.check_call(cc_cmd)
(main_task pid=59139) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=59139) File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/subprocess.py", line 413, in check_call
(main_task pid=59139) raise CalledProcessError(retcode, cmd)
(main_task pid=59139) subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp52w3y_1k/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmp52w3y_1k/cuda_utils.cpython-311-x86_64-linux-gnu.so', '-lcuda', '-L/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/lib', '-L/usr/lib64', '-I/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/include', '-I/tmp/tmp52w3y_1k', '-I/home/u2024001021/anaconda3/envs/EasyRL/include/python3.11']' returned non-zero exit status 1.
(main_task pid=59139) [... two more identical ray::WorkerDict.actor_rollout_compute_log_prob() tracebacks follow, for pid=59985 and pid=59715, each ending in the same subprocess.CalledProcessError from /usr/bin/gcc (exit status 1); only the temporary build directories differ: /tmp/tmp3lz0noso and /tmp/tmplaiy5fz ...]
Traceback (most recent call last):
File "/home/u2024001021/verl-main/verl/trainer/main_ppo.py", line 25, in main
run_ppo(config)
File "/home/u2024001021/verl-main/verl/trainer/main_ppo.py", line 33, in run_ppo
ray.get(main_task.remote(config, compute_score))
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/ray/_private/worker.py", line 2753, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/ray/_private/worker.py", line 904, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(CalledProcessError): ray::main_task() (pid=59139, ip=10.0.0.1)
File "/home/u2024001021/verl-main/verl/trainer/main_ppo.py", line 128, in main_task
trainer.fit()
File "/home/u2024001021/verl-main/verl/trainer/ppo/ray_trainer.py", line 949, in fit
old_log_prob = self.actor_rollout_wg.compute_log_prob(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/single_controller/ray/base.py", line 42, in func
output = ray.get(output)
^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(CalledProcessError): ray::WorkerDict.actor_rollout_compute_log_prob() (pid=59978, ip=10.0.0.1, actor_id=5199388e8f71ae4d3a3a754401000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x1495c1f54390>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/single_controller/ray/base.py", line 399, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/single_controller/base/decorator.py", line 404, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/workers/fsdp_workers.py", line 516, in compute_log_prob
output = self.actor.compute_log_prob(data=data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/workers/actor/dp_actor.py", line 214, in compute_log_prob
_, log_probs = self._forward_micro_batch(micro_batch, temperature=temperature)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/workers/actor/dp_actor.py", line 153, in _forward_micro_batch
log_probs = logprobs_from_logits(logits, micro_batch['responses'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/utils/torch_functional.py", line 57, in logprobs_from_logits
output = logprobs_from_logits_flash_attn(logits, labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/verl-main/verl/utils/torch_functional.py", line 65, in logprobs_from_logits_flash_attn
output = cross_entropy_loss(logits, labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/flash_attn/ops/triton/cross_entropy.py", line 319, in cross_entropy_loss
return CrossEntropyLoss.apply(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/torch/autograd/function.py", line 574, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/flash_attn/ops/triton/cross_entropy.py", line 196, in forward
cross_entropy_fwd_kernel[(n_rows,)](
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/jit.py", line 345, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 338, in run
return self.fn.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/jit.py", line 607, in run
device = driver.active.get_current_device()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/driver.py", line 23, in getattr
self._initialize_obj()
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/driver.py", line 9, in _create_driver
return actives[0]()
^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 371, in init
self.utils = CudaUtils() # TODO: make static
^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 80, in init
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/runtime/build.py", line 48, in _build
ret = subprocess.check_call(cc_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpxkorz697/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpxkorz697/cuda_utils.cpython-311-x86_64-linux-gnu.so', '-lcuda', '-L/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/lib', '-L/usr/lib64', '-I/home/u2024001021/anaconda3/envs/EasyRL/lib/python3.11/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpxkorz697', '-I/home/u2024001021/anaconda3/envs/EasyRL/include/python3.11']' returned non-zero exit status 1.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
(WorkerDict pid=59985) kwargs: {'n': 1, 'logprobs': 1, 'max_tokens': 256, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} [repeated 3x across cluster]
(WorkerDict pid=59999) /tmp/tmp52w3y_1k/main.c:6:23: fatal error: stdatomic.h: No such file or directory [repeated 3x across cluster]
(WorkerDict pid=59999) #include <stdatomic.h> [repeated 3x across cluster]
(WorkerDict pid=59999) ^ [repeated 3x across cluster]
(WorkerDict pid=59999) compilation terminated. [repeated 3x across cluster]
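Since the failure is the missing stdatomic.h header, the usual remedy is on the toolchain side rather than in verl: the Ray worker processes need to see a C compiler whose headers include stdatomic.h (a newer system gcc, a module-loaded gcc, or a conda-provided one). The sketch below is an assumption-laden verification script, not part of verl: it triggers the same cuda_utils build path that failed inside compute_log_prob by launching a trivial Triton kernel. The optional CC override relies on Triton's runtime build step reading the CC environment variable, which recent Triton releases do; verify against your installed version, and treat the example compiler path as hypothetical.

```python
# Standalone sketch to verify the toolchain fix outside the PPO run: the first
# launch of any Triton kernel goes through the same driver.active / cuda_utils
# compilation that failed in the tracebacks above.
import os
# Hypothetical path; set before the first kernel compiles if you need to point
# Triton at a gcc that ships C11 headers (recent Triton reads CC in
# triton/runtime/build.py -- confirm for your version).
# os.environ["CC"] = "/path/to/gcc-with-c11-headers"

import torch
import triton
import triton.language as tl

@triton.jit
def copy_kernel(src_ptr, dst_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program copies one BLOCK-sized slice of the input.
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    tl.store(dst_ptr + offsets, tl.load(src_ptr + offsets, mask=mask), mask=mask)

src = torch.arange(1024, device="cuda", dtype=torch.float32)
dst = torch.empty_like(src)
copy_kernel[(4,)](src, dst, src.numel(), BLOCK=256)  # first use builds cuda_utils
torch.cuda.synchronize()
assert torch.equal(src, dst)
print("Triton built its CUDA driver module and the kernel ran")
```

Run it in the same environment (and with the same environment variables) that the Ray workers inherit, not only in an interactive shell on the login node; if it prints the success line there, the cross_entropy_fwd_kernel compilation inside actor_rollout_compute_log_prob should get past this error as well.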