Support megatron 0.6 in veRL
Description
I am opening this PR in the hope of adding Megatron 0.6 support to veRL (although I noticed that the veRL paper already seems to use Megatron 0.6 as the test version). From my naive perspective, I see two possible approaches (a rough sketch of both follows the list):
- Communicating at the parameter level.
- Creating a MemoryBuffer in veRL that is fully aligned with the ParamAndGradBuffer in Megatron 0.6, and then performing broadcast and other communication operations based on this buffer.
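For concreteness, here is a minimal sketch contrasting the two approaches, assuming a `torch.distributed` process group for the pipeline ranks and a hypothetical flat tensor aligned with Megatron 0.6's ParamAndGradBuffer layout; the names are illustrative, not the actual verl or Megatron API.

```python
import torch.distributed as dist


def sync_param_level(params, src_rank, pp_group):
    """Approach 1: broadcast each parameter tensor individually."""
    for p in params:
        # One collective per parameter: simple, but many small broadcasts.
        dist.broadcast(p.data, src=src_rank, group=pp_group)


def sync_buffer_level(flat_buffer, src_rank, pp_group):
    """Approach 2: broadcast one contiguous buffer whose layout mirrors
    Megatron 0.6's ParamAndGradBuffer; parameters then view into it."""
    # A single large collective; the receiving side needs a verl MemoryBuffer
    # whose offsets exactly match the Megatron buffer layout.
    dist.broadcast(flat_buffer, src=src_rank, group=pp_group)
```

The buffer-level variant trades extra layout bookkeeping for far fewer (and larger) collectives.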
In the current draft, when self._pp_rank == pp_rank, the code directly uses the buffer defined in Megatron 0.6 (without even checking whether use_distributed_optimizer is set) and communicates at the parameter level during parameter synchronization, which of course incurs some performance overhead.
At the very least, this approach seems feasible.
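To make the draft's behavior concrete, below is a minimal sketch of this parameter-level synchronization, assuming a hypothetical param_meta list of (name, shape, dtype) known on every rank and a pipeline-parallel process group; it is not the actual verl code.

```python
import torch
import torch.distributed as dist


def broadcast_params_from_pp_rank(module, param_meta, pp_rank, self_pp_rank,
                                  src_global_rank, pp_group):
    """Broadcast the parameters owned by `pp_rank` to every rank in `pp_group`."""
    owned = dict(module.named_parameters()) if self_pp_rank == pp_rank else {}
    synced = {}
    for name, shape, dtype in param_meta:
        if self_pp_rank == pp_rank:
            # Owning rank: use the parameter backed by Megatron 0.6's own
            # buffer directly, without copying into a verl-side MemoryBuffer.
            tensor = owned[name].data
        else:
            # Other ranks: allocate a placeholder tensor to receive into.
            tensor = torch.empty(shape, dtype=dtype, device="cuda")
        dist.broadcast(tensor, src=src_global_rank, group=pp_group)
        synced[name] = tensor
    return synced
```

Because each parameter is broadcast as a separate collective, this loop is where the overhead mentioned above comes from; aligning a verl MemoryBuffer with ParamAndGradBuffer would replace it with a single broadcast per buffer.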
Testing
PPO training deepseek-llm-7b-chat on GSM8K
- convergence
- performance
PPO training deepseek-coder-6.7b-instruct on GSM8K + MATH
- convergence
- performance
The curves demonstrate that this PR does not negatively impact training convergence. Performance shows different trends: an improvement in one case and a degradation in the other. The reasons for these differences require further analysis.
This looks really nice. We'll take some time to check how to align the two buffers to accelerate the resharding process.
Another question is whether there are any remaining issues in MCore 0.6. If not, we may no longer need to patch upstream Megatron.
When will this feature be merged into the main branch?
Hi @Chendong98, mcore has been upgraded to v0.11 in this PR: https://github.com/volcengine/verl/pull/392. Your contribution is acknowledged. Feel free to contact us if anything is wrong. Thanks!