guillemgt issues

Results: 2 issues of guillemgt

[trainer] fix: PPO reward model resource pool and worker creation with reward loop

### What does this PR do? This PR fixes a bug in the `main_ppo` entrypoint regarding reward model initialization when using the [reward loop](https://verl.readthedocs.io/en/latest/advance/reward_loop.html). In the current main, the reward...

[megatron] fix: Pass optimizer config betas to Megatron optimizer config

### What does this PR do? This PR fixes a bug where the values of the Adam optimizer's beta parameters, `actor_rollout_ref.(actor|critic).optim.betas`, are ignored when using the Megatron backend....