verl
[WIP] [single_controller] feat: PyTorch Monarch integration
What does this PR do?
Rough initial transfer of the internal Monarch integration.
Cleanup items:
- [ ] Clean up the many TODOs, some of which are more complex.
- [ ] Remove internal infra usage such as `create_mast_proc_mesh`.
- [ ] The PPO trainer code was copy/pasted from the Ray trainer; refactor it to share a parent class and remove the duplicate logic.
- [ ] Move PPO util methods such as `apply_kl_penalty` to a shared util module.
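One possible shape for the parent-class refactor and the shared util module is sketched below. This is only an illustration under stated assumptions: `BasePPOTrainer`, `RayTrainerSketch`, `MonarchTrainerSketch`, `generate_rollouts`, and `train_step` are hypothetical names, and the `apply_kl_penalty` shown here is a simplified stand-in operating on plain lists, not the signature or logic used in verl.

```python
from abc import ABC, abstractmethod


def apply_kl_penalty(rewards, kl_values, beta):
    """Simplified stand-in for a shared PPO util (hypothetical signature).

    Subtracts a KL term scaled by `beta` from each reward.
    """
    return [r - beta * kl for r, kl in zip(rewards, kl_values)]


class BasePPOTrainer(ABC):
    """Hypothetical parent class holding the backend-agnostic PPO logic."""

    def __init__(self, beta=0.1):
        self.beta = beta

    @abstractmethod
    def generate_rollouts(self):
        """Backend-specific step: return (rewards, kl_values)."""

    def train_step(self):
        # Shared logic lives here once instead of being copied per backend.
        rewards, kl_values = self.generate_rollouts()
        return apply_kl_penalty(rewards, kl_values, self.beta)


class RayTrainerSketch(BasePPOTrainer):
    def generate_rollouts(self):
        # A real trainer would dispatch to Ray workers here (placeholder data).
        return [1.0, 0.5], [0.2, 0.4]


class MonarchTrainerSketch(BasePPOTrainer):
    def generate_rollouts(self):
        # A real trainer would dispatch to Monarch workers here (placeholder data).
        return [1.0, 0.5], [0.2, 0.4]
```

With this layout, each backend overrides only the dispatch step, and the penalty computation is imported from one place by both trainers.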
Test
Experimental results are available from post-training Qwen-2.5-7B on H200 GPUs using Megatron-LM.
The experimental data is pending wider release.
API and Usage Example
Demonstrate how the API changes if any, and provide usage example(s) if possible.
```bash
python3 -m verl.trainer.main_ppo_monarch \
    --config-path=config \
    --config-name='ppo_megatron_trainer.yaml' \
    ...
```
Design & Code Changes
TODO
Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the Contribute Guide.
- [ ] Apply pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update the documentation.
- [ ] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)