ChenyuWang1022

Results: 1 comment by ChenyuWang1022

This might be because Verl currently uses async rollout mode by default, and in that mode the agent loop computes rewards through /verl/experimental/reward/reward_loop instead of the default RewardManager. You could try setting `actor_rollout_ref.rollout.mode=sync` so that the default RewardManager is used...
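
For anyone unsure what that override touches: it is a dotted key path into the nested training config. The sketch below only illustrates that mapping. `apply_override` is a hypothetical helper written for this example, not Verl's or Hydra's actual implementation, and the starting value `async` simply mirrors the default described above.

```python
def apply_override(config: dict, override: str) -> dict:
    """Set a nested key from an 'a.b.c=value' string (hypothetical helper)."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down the nesting, creating intermediate sections if missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Illustrative starting config; only the key path comes from the comment above.
config = {"actor_rollout_ref": {"rollout": {"mode": "async"}}}
apply_override(config, "actor_rollout_ref.rollout.mode=sync")
print(config["actor_rollout_ref"]["rollout"]["mode"])  # sync
```

In practice the override would be appended to whatever launch command you already use, assuming your setup accepts Hydra-style `key=value` arguments.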