verl [algo] feat: support router replay

What does this PR do?

This PR adds draft Router Replay support to verl. Inspired by recent research on MoE reinforcement learning (2510.11370, 2507.18071), the implementation supports Router Replay (R2) and Rollout Router Replay (R3). R2 records each token's expert selection during log probability computation and replays it during policy update. R3 records routing decisions during model inference (rollout) and replays them during RL post-training.

The initial version supports Router Replay with the Megatron backend and covers its distributed training strategies: DP, TP, EP, ETP, PP, and recomputation.

The current implementation uses a patch-based approach. Once the upstream PR NVIDIA/Megatron-LM#2101 is merged, or equivalent interfaces become available upstream, the patch can be removed and replaced with the official API.

Usage Tutorial

Basic Configuration

Router Replay can be enabled in either of the following ways.

Method 1: Trainer Configuration

Add the following configuration to your trainer config:

router_replay:
  enabled: true
  mode: "R2"  # Options: "R2", "R3"

Method 2: Launch Script Configuration

Add the following parameter to your launch script:

# In your launch script
actor_rollout_ref.actor.router_replay.mode="R2"
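
The dotted key in Method 2 addresses a nested setting of the same kind that Method 1 writes as YAML. The sketch below only illustrates that mapping. It is not verl's actual config parser, and `apply_override` is a hypothetical helper introduced here for illustration.

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' override to a nested dict, in place.

    Hypothetical helper for illustration; not part of verl.
    """
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate levels on demand.
        node = node.setdefault(name, {})
    node[leaf] = raw_value.strip('"')
    return config


config = {}
apply_override(config, 'actor_rollout_ref.actor.router_replay.mode="R2"')
mode = config["actor_rollout_ref"]["actor"]["router_replay"]["mode"]
print(mode)
```

Under this reading, the launch-script parameter sets `mode` inside a `router_replay` block nested under `actor_rollout_ref.actor`.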

R2 Mode Usage

Enable R2 mode in configuration
Record phase: During log probability computation, routing selections are automatically recorded
Replay phase: During policy update, recorded expert selections are replayed
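
The record and replay phases above can be pictured with a toy top-k router. This is a minimal sketch, not the code in this PR: `ToyRouter` and `ToyMode` are hypothetical names, and the logits are made-up values.

```python
from enum import Enum
from typing import List, Optional


class ToyMode(Enum):
    NORMAL = "normal"
    RECORD = "record"
    REPLAY = "replay"


class ToyRouter:
    """Toy top-k router with record and replay modes (hypothetical)."""

    def __init__(self, top_k: int) -> None:
        self.top_k = top_k
        self.mode = ToyMode.NORMAL
        self.recorded: Optional[List[List[int]]] = None

    def route(self, logits: List[List[float]]) -> List[List[int]]:
        """Return the chosen expert indices for every token."""
        if self.mode is ToyMode.REPLAY:
            if self.recorded is None:
                raise RuntimeError("nothing recorded to replay")
            # Ignore the current logits and reuse the earlier selection.
            return self.recorded
        chosen = [
            sorted(range(len(row)), key=lambda e: row[e], reverse=True)[: self.top_k]
            for row in logits
        ]
        if self.mode is ToyMode.RECORD:
            self.recorded = chosen
        return chosen


router = ToyRouter(top_k=2)
old_logits = [[0.1, 0.9, 0.5, 0.2], [0.7, 0.3, 0.6, 0.1]]  # log-prob pass
new_logits = [[0.8, 0.9, 0.1, 0.2], [0.1, 0.3, 0.6, 0.9]]  # after weights moved

router.mode = ToyMode.RECORD
recorded = router.route(old_logits)
router.mode = ToyMode.REPLAY
replayed = router.route(new_logits)
router.mode = ToyMode.NORMAL
fresh = router.route(new_logits)
```

Here `replayed` equals `recorded` even though the logits changed, while `fresh` shows what the router would have picked without replay.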

R3 Mode Usage

Enable R3 mode in configuration
Record phase: During model inference, routing decisions are captured
Replay phase: During RL post-training, the recorded routing decisions are replayed
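
As a rough sketch of why replaying rollout routing can matter, suppose the inference engine and the training engine compute slightly different router logits for the same token. All values and helper names below are hypothetical, and computing gate weights from the training logits over the replayed experts is an assumption of this sketch, not something this PR description states.

```python
import math
from typing import List


def top_k_experts(logits: List[float], k: int) -> List[int]:
    """Indices of the k largest logits, largest first."""
    return sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]


def gate_weights(logits: List[float], experts: List[int]) -> List[float]:
    """Softmax restricted to the given experts, using the caller's logits."""
    exps = [math.exp(logits[e]) for e in experts]
    total = sum(exps)
    return [x / total for x in exps]


# One token, four experts. The engines disagree slightly on experts 1 and 2.
inference_logits = [0.90, 0.52, 0.50, 0.05]
training_logits = [0.90, 0.50, 0.52, 0.05]

rollout_choice = top_k_experts(inference_logits, k=2)   # recorded during rollout
training_choice = top_k_experts(training_logits, k=2)   # what training would pick

# With replay, training reuses the rollout's experts instead of its own pick.
replayed_choice = rollout_choice
weights = gate_weights(training_logits, replayed_choice)
```

Without replay the token would be sent to a different second expert in training than in rollout. With replay the expert set matches the rollout.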

In Progress

R2

[ ] FSDP backend

R3

[x] vLLM Rollout
[ ] SGLang Rollout

Nov 12 '25 08:11 litianjian

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

litianjian seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

Nov 12 '25 08:11 CLAassistant

Hello, I have a question. In compute_log_prob in megatron_workers, in R2 mode, at this point: if self.enable_routing_replay and self.config.actor.router_replay.mode == "R2": RouterReplay.set_global_routing_mode(RoutingMode.RECORD). Does this only set record mode without calling set_replay_data? Later, in the megatron_actor computation, the record_topk_idx of every object in the router_instances_list obtained in merge_router_topk_indices is None.

Nov 21 '25 03:11 Cesilina

please fix the conflicts so we can merge this

Nov 24 '25 05:11 ISEEKYAN

Hello, I have a question. In compute_log_prob in megatron_workers, in R2 mode, at this point: if self.enable_routing_replay and self.config.actor.router_replay.mode == "R2": RouterReplay.set_global_routing_mode(RoutingMode.RECORD). Does this only set record mode without calling set_replay_data? Later, in the megatron_actor computation, the record_topk_idx of every object in the router_instances_list obtained in merge_router_topk_indices is None.

In record mode, mcore records the router selection results in raw form. These selections are then used in the next update-policy stage.

Nov 26 '25 08:11 litianjian
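
A toy illustration of the record-mode behaviour described in the reply above: setting a global record mode stores nothing by itself, and each router instance only holds a selection after a forward pass has run. `ToyRouterReplay` and its members are hypothetical stand-ins, not verl's `RouterReplay` class, and when exactly verl populates its recorded indices is not stated in this thread.

```python
from typing import List, Optional


class ToyRouterReplay:
    """Hypothetical stand-in with a class-wide mode and per-layer records."""

    instances: List["ToyRouterReplay"] = []
    mode: str = "normal"

    def __init__(self) -> None:
        self.recorded_topk_idx: Optional[List[int]] = None
        ToyRouterReplay.instances.append(self)

    @classmethod
    def set_global_mode(cls, mode: str) -> None:
        # Switches the mode for every instance; stores no routing data.
        cls.mode = mode

    def forward(self, topk_idx: List[int]) -> List[int]:
        # In record mode, keep a raw copy of this layer's selection.
        if ToyRouterReplay.mode == "record":
            self.recorded_topk_idx = list(topk_idx)
        return topk_idx


layers = [ToyRouterReplay(), ToyRouterReplay()]
ToyRouterReplay.set_global_mode("record")
before = [r.recorded_topk_idx for r in ToyRouterReplay.instances]

layers[0].forward([3, 1])
layers[1].forward([0, 2])
after = [r.recorded_topk_idx for r in ToyRouterReplay.instances]
```

In this sketch `before` is all `None`, and `after` holds one raw selection per router instance, ready to be collected for a later replay stage.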

records the router selection results in raw form

OK, got it! Thanks

Nov 26 '25 08:11 Cesilina

I would like to inquire about the latest progress of this project. Does R3 support the training of Megatron+vLLM? Does Megatron need to use the PR version you submitted: https://github.com/NVIDIA/Megatron-LM/pull/2101/files?

Dec 03 '25 11:12 scut-zx

I would like to inquire about the latest progress of this project. Does R3 support the training of Megatron+vLLM? Does Megatron need to use the PR version you submitted: https://github.com/NVIDIA/Megatron-LM/pull/2101/files?

It is ready to merge now. You no longer need https://github.com/NVIDIA/Megatron-LM/pull/2101 to use router replay in verl. Once Megatron's PR is merged, we can remove some patches from verl.

Dec 03 '25 12:12 ISEEKYAN
