[RFC] Integrate VeOmni as the training engine in VERL
Feature request
This RFC proposes integrating VeOmni as a training engine backend for VERL.
The goal is to leverage VeOmni’s high-performance distributed training framework to enhance VERL’s scalability and efficiency in large-scale RLHF and post-training workflows.
Motivation
- Support FSDP2 + EP (expert parallelism), so that VERL can run large MoE models on FSDP2 without relying on Megatron.
- Introduce GroupGemm ops for MoE and integrate Liger-Kernel for higher performance.
- Easily scale training to any omni-model.
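For readers unfamiliar with the GroupGemm item above, here is a minimal plain-Python sketch of what a grouped GEMM computes in an MoE layer: tokens are grouped by their routed expert, and each group is multiplied by that expert's weight matrix. This is only a reference for the semantics. The function names (`matmul`, `grouped_gemm`) and the toy data are hypothetical, and it is not VeOmni's or VERL's actual kernel, which would fuse these per-expert multiplications on the GPU.

```python
# Illustrative reference only: shows the result a grouped GEMM produces,
# not how VeOmni or VERL implement it. All names here are hypothetical.

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def grouped_gemm(tokens, expert_ids, expert_weights):
    """Apply expert_weights[e] to every token routed to expert e.

    tokens:         list of length-k vectors, one per token
    expert_ids:     routed expert index for each token
    expert_weights: one (k x n) matrix per expert
    """
    # Sort token indices by expert so each expert's tokens are contiguous.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    out = [None] * len(tokens)
    start = 0
    while start < len(order):
        e = expert_ids[order[start]]
        end = start
        while end < len(order) and expert_ids[order[end]] == e:
            end += 1
        # One matrix multiplication per expert group.
        group = [tokens[i] for i in order[start:end]]
        result = matmul(group, expert_weights[e])
        # Scatter results back to the original token positions.
        for i, row in zip(order[start:end], result):
            out[i] = row
        start = end
    return out


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
expert_ids = [1, 0, 1]
expert_weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: scale by 2
]
outputs = grouped_gemm(tokens, expert_ids, expert_weights)
```

A fused GroupGemm op aims to produce the same per-token outputs while avoiding the Python-level loop over experts.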
Your contribution
I am going to submit a PR: #4072.
cc @vermouth1992 @wuxibin89