guillemgt
Results: 2 issues of guillemgt
### What does this PR do? This PR fixes a bug in the `main_ppo` entrypoint regarding reward model initialization when using the [reward loop](https://verl.readthedocs.io/en/latest/advance/reward_loop.html). In the current main, the reward...
### What does this PR do? This PR fixes a bug where the values of the Adam optimizer's beta parameters, `actor_rollout_ref.(actor|critic).optim.betas`, are ignored when using the Megatron backend....