HEJIAN SANG
The current issue with training the GPT-OSS model: the grad_norm in GRPO grows too fast, which prevents the model from achieving reasonably good performance (a sketch of how grad_norm is typically computed follows the list below).
* Train on gsm8k [PR](https://github.com/volcengine/verl/pull/3836), reasoning effort: medium reasoning...
* Train on retool with tool agent: [PR](https://github.com/volcengine/verl/pull/3837); grad_norm can grow as large as 1500
* Agent loop training using the math-expression example: https://github.com/volcengine/verl/blob/main/recipe/langgraph_agent/example/run_gpt_oss_20b_bf16.sh
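For reference, the grad_norm most trainers report is the global L2 norm over all parameter gradients. Below is a minimal sketch of that computation; the helper name is hypothetical and this is not verl's actual implementation.

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """Global L2 norm over all parameter gradients.

    This mirrors what trainers typically log as `grad_norm`;
    a hypothetical helper for illustration, not verl's code.
    """
    norms = [
        p.grad.detach().norm(2)
        for p in model.parameters()
        if p.grad is not None
    ]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()
```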
We ruled out MoE instability by setting batch_size = mini_batch_size to enforce on-policy training (see the sketch below).
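To spell out why that setting enforces on-policy updates, here is a small illustrative helper (names are hypothetical, not verl config keys): with one mini-batch per rollout batch, each sample gets exactly one gradient step under the same policy that generated it, so the importance ratio pi_theta / pi_old stays at 1.

```python
def num_updates_per_rollout(train_batch_size: int, mini_batch_size: int) -> int:
    """Gradient steps taken on one rollout batch (hypothetical helper).

    With train_batch_size == mini_batch_size there is exactly one
    update per rollout, so every sample is scored by the policy that
    generated it (strictly on-policy) and any instability cannot come
    from stale off-policy samples interacting with MoE routing drift.
    """
    assert train_batch_size % mini_batch_size == 0
    return train_batch_size // mini_batch_size

assert num_updates_per_rollout(64, 64) == 1   # on-policy
assert num_updates_per_rollout(64, 16) == 4   # off-policy after the first step
```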
My only hypothesis is that there is an issue in the current gpt-oss model implementation in transformers that causes the gradient instability. Your investigation would be really appreciated.
Tried importance sampling for gpt-oss training. The good news is that the grad norm is no longer exploding, but the reward and val metrics are not looking good. My hypothesis is that:...
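For context, below is a minimal sketch of the standard clipped importance-sampling objective (the PPO-style form GRPO builds on). The function name and `clip_eps` default are illustrative, not verl's exact code; clipping bounds the ratio, which also bounds per-token gradient magnitude and is consistent with the grad norm no longer exploding.

```python
import torch

def clipped_is_loss(logprobs: torch.Tensor,
                    old_logprobs: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped importance-sampling objective (sketch).

    ratio = pi_theta(a|s) / pi_old(a|s), computed from log-probs.
    Clamping the ratio to [1 - clip_eps, 1 + clip_eps] caps how far
    a single update can push the policy, bounding the gradient.
    """
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```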
I can work on supporting this.