supermancmk
@younesbelkada @lvwerra @lewtun @kashif @vwxyzjn @edbeeching @qgallouedec @Michellehbn Hi, I am using the PPOv2 trainer for PPO and running it with the command given in **examples/scripts/ppo/ppo.py**, but with offload_optimizer_device set to CPU...
I pulled the latest version of verl's code, and when I run the official GSM8K example with tool use, multi-turn async rollout, and SGLang without any modifications, the model crashes during training...
Could Megatron be supported for SFT training of Qwen models? Thanks