verl
Error raised when running the SPIN recipe: `assert self._is_actor or (not self._is_actor and self._is_rollout)` fails with `AssertionError: Checkpoint loading is only supported for Actor or standalone Rollout Workers, but got False and False`
Run script:

```bash
epoch=10
set -e
set -x
VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export HYDRA_FULL_ERROR=1
HYDRA_FULL_ERROR=1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3 -m recipe.spin.main_spin \
    data.train_files=/pathdir/data/bench_v3/verl/train.parquet \
    data.val_files=/pathdir/data/bench_v3/verl/test.parquet \
    data.train_batch_size=1024 \
    data.max_prompt_length=1024 \
    data.max_response_length=60 \
    actor_rollout_ref.model.path=/pathdir/model/rewrite_sft_v2_third_11318 \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=64 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    'trainer.logger=[console]' \
    trainer.val_before_train=True \
    trainer.default_hdfs_dir=null \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=1 \
    +trainer.log_freq=1 \
    trainer.ref_update_freq=1 \
    +trainer.remove_previous_ckpt_in_save=True \
    trainer.project_name=verl_spin_entity_rewrite \
    trainer.experiment_name=spin_v0.5 \
    custom_reward_function.path=reward_spin.py \
    custom_reward_function.name=compute_score \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=/pathdir/model/rewrite_sft_v2_third_11318 \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size_per_gpu=8 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    trainer.total_epochs=10
```
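For context on why this code path is exercised at all: with `trainer.ref_update_freq=1`, the SPIN recipe refreshes the reference policy from the current actor every step. Judging from the log and traceback below, `fit_dpo` first snapshots the actor to `/tmp/actor_state_mid` and then asks the reference-policy worker group to load that snapshot. The following is only a paraphrased sketch, not the actual `spin_trainer.py` code; the `ref_policy_wg.load_checkpoint(actor_state_path, None, True)` call and the path come verbatim from the traceback and log, while the surrounding structure, names, and signatures are assumptions.

```python
# Paraphrased sketch of the reference-policy refresh in recipe/spin/spin_trainer.py.
# Only the load_checkpoint call and the /tmp/actor_state_mid path are verbatim from
# the traceback/log; everything else here is an assumption for illustration.

def maybe_refresh_reference_policy(trainer, global_step: int, ref_update_freq: int) -> None:
    if ref_update_freq <= 0 or global_step % ref_update_freq != 0:
        return
    actor_state_path = "/tmp/actor_state_mid"
    # 1) Save the current actor weights/optimizer/extra state
    #    (the "[Rank k] Saved model/optim/extra_state" lines in the log below).
    trainer.actor_rollout_wg.save_checkpoint(actor_state_path)
    # 2) Load that snapshot into the standalone reference worker group.
    #    This is the call that raises the AssertionError, because the ref
    #    worker is neither an Actor nor a standalone Rollout worker.
    trainer.ref_policy_wg.load_checkpoint(actor_state_path, None, True)
```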
Console log and traceback:

```
/opt/conda/lib/python3.10/site-packages/hydra/_internal/config_loader_impl.py:216: UserWarning: provider=hydra.searchpath in main, path=verl/trainer/config is not available.
  warnings.warn(
2025-07-04 15:13:44,296 WARNING utils.py:606 -- Ray currently does not support initializing Ray with fractional cpus. Your num_cpus will be truncated from 99.84 to 99.
2025-07-04 15:13:44,516 INFO worker.py:1879 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=355822) DeprecationWarning: ray.state.available_resources_per_node is a private attribute and access will be removed in a future Ray version.
(TaskRunner pid=355822) WARNING:2025-07-04 15:14:08,231:Waiting for register center actor u6Tbbv_register_center to be ready. Elapsed time: 0 seconds out of 300 seconds.
(WorkerDict pid=362012) You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
(WorkerDict pid=362008) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
(WorkerDict pid=361041) You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [repeated 7x across cluster]
(WorkerDict pid=362005) 2025-07-04 15:15:20,107 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
(WorkerDict pid=361041) Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)` [repeated 7x across cluster]
(WorkerDict pid=362010) /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=362010) warnings.warn(
(WorkerDict pid=362009) 2025-07-04 15:15:20,630 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend [repeated 7x across cluster]
Online DPO Training Progress: 0%| | 1/820 [00:00<?, ?it/s]
(WorkerDict pid=362012) /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 7x across cluster]
(WorkerDict pid=362012) warnings.warn( [repeated 7x across cluster]
(WorkerDict pid=362006) INFO:2025-07-04 15:17:57,360:[Rank 2] Saved model to /tmp/actor_state_mid/model_world_size_8_rank_2.pt
(WorkerDict pid=362006) INFO:2025-07-04 15:17:57,361:[Rank 2] Saved optim to /tmp/actor_state_mid/optim_world_size_8_rank_2.pt
(WorkerDict pid=362006) INFO:2025-07-04 15:17:57,362:[Rank 2] Saved extra_state to /tmp/actor_state_mid/extra_state_world_size_8_rank_2.pt
(TaskRunner pid=355822) Traceback (most recent call last):
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/recipe/spin/spin_trainer.py", line 1100, in fit_dpo
(TaskRunner pid=355822)     self.ref_policy_wg.load_checkpoint(actor_state_path, None, True)  # Adapt load logic
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/verl/single_controller/ray/base.py", line 51, in call
(TaskRunner pid=355822)     output = ray.get(output)
(TaskRunner pid=355822)   File "/opt/conda/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
(TaskRunner pid=355822)     return fn(*args, **kwargs)
(TaskRunner pid=355822)   File "/opt/conda/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(TaskRunner pid=355822)     return func(*args, **kwargs)
(TaskRunner pid=355822)   File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
(TaskRunner pid=355822)     values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(TaskRunner pid=355822)   File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
(TaskRunner pid=355822)     raise value.as_instanceof_cause()
(TaskRunner pid=355822) ray.exceptions.RayTaskError(AssertionError): ray::WorkerDict.ref_load_checkpoint() (pid=362005, ip=33.134.6.114, actor_id=2fdadbfc20f00c086174e03601000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f2287fba3b0>)
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/verl/single_controller/ray/base.py", line 710, in func
(TaskRunner pid=355822)     return getattr(self.worker_dict[key], name)(*args, **kwargs)
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/verl/single_controller/base/decorator.py", line 549, in inner
(TaskRunner pid=355822)     return func(*args, **kwargs)
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/verl/workers/fsdp_workers.py", line 892, in load_checkpoint
(TaskRunner pid=355822)     assert self._is_actor or (not self._is_actor and self._is_rollout), (
(TaskRunner pid=355822) AssertionError: Checkpoint loading is only supported for Actor or standalone Rollout Workers, but got False and False
(TaskRunner pid=355822) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::WorkerDict.ref_load_checkpoint() (pid=362012, ip=33.134.6.114, actor_id=ccd1d1e5f781fb3b594f5e9501000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7fb39f571480>)
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/verl/single_controller/ray/base.py", line 710, in func
(TaskRunner pid=355822)     return getattr(self.worker_dict[key], name)(*args, **kwargs)
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/verl/single_controller/base/decorator.py", line 549, in inner
(TaskRunner pid=355822)     return func(*args, **kwargs)
(TaskRunner pid=355822)   File "/ossfs/workspace/zz_run/codes/verl-main/verl/workers/fsdp_workers.py", line 892, in load_checkpoint
(TaskRunner pid=355822)     assert self._is_actor or (not self._is_actor and self._is_rollout), (
(TaskRunner pid=355822) AssertionError: Checkpoint loading is only supported for Actor or standalone Rollout Workers, but got False and False
```
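The assertion comes from `load_checkpoint` in `verl/workers/fsdp_workers.py` (line 892 in this checkout). The `_is_actor` / `_is_rollout` flags appear to be derived from the role string the worker group is built with, so a worker group created for the standalone `ref` role has both set to `False` and can never pass the check, regardless of which path is passed in. Below is a toy reproduction of that logic with simplified names, not the actual verl class; only the assert condition and message are copied from the traceback, while the flag-from-role mapping is an assumption.

```python
# Toy reproduction of the failing check. The role-to-flag mapping is an
# assumption based on the worker role names used in verl; the assert condition
# and its message are copied from fsdp_workers.load_checkpoint in the traceback.

class ToyFSDPWorker:
    def __init__(self, role: str):
        self._is_actor = role in ("actor", "actor_rollout", "actor_rollout_ref")
        self._is_rollout = role in ("rollout", "actor_rollout", "actor_rollout_ref")

    def load_checkpoint(self, path: str) -> None:
        assert self._is_actor or (not self._is_actor and self._is_rollout), (
            "Checkpoint loading is only supported for Actor or standalone Rollout "
            f"Workers, but got {self._is_actor} and {self._is_rollout}"
        )
        print(f"would load checkpoint from {path}")


ToyFSDPWorker("actor").load_checkpoint("/tmp/actor_state_mid")  # passes the check
ToyFSDPWorker("ref").load_checkpoint("/tmp/actor_state_mid")    # AssertionError, as in the log
```

If that reading is right, either the SPIN recipe needs a ref-specific way to reload weights instead of calling `load_checkpoint` on `ref_policy_wg`, or the worker-side check (and the loading code behind it) needs to accept standalone ref workers; neither direction is verified here against current verl main.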
I got exactly the same issue
I got the same issue