verl [recipe] feat: add mopps recipe

What does this PR do?

This PR adds an implementation of MoPPS (Model Predictive Prompt Selection), a Bayesian framework that predicts prompt difficulty online to accelerate RL finetuning of large reasoning models.

Checklist Before Starting

[x ] Search for similar PRs. Paste at least one query link here: ...
[√ ] Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Please refer to https://github.com/thu-rllab/MoPPS

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

bash recipe/mopps/scripts/countdown/cd_verl_3b_topk_noinit.sh

Design & Code Changes

Model Predictive Prompt Selection (MoPPS) is a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions. Technically, MoPPS models each prompt's success rate as a latent variable, performs streaming Bayesian inference over it, and applies posterior sampling in a constructed multi-armed bandit, which enables sample-efficient and adaptive prompt selection. The main implementation is in mopps.py. MoPPS plugs into an existing RL training pipeline at two points:

- Before rollout: call sample_batch() to select prompts
- After reward: call train() to update the Bayesian posteriors
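
For reviewers unfamiliar with the idea, the two hooks above can be sketched roughly as follows. This is a minimal illustration, not the code in mopps.py: the class name BetaPromptSelector, the constructor arguments, the method signatures, the Beta-Bernoulli posterior, and the choice of ranking candidates by distance to a target success rate of 0.5 are all assumptions made for this example. Only the method names sample_batch() and train() come from the description above.

```python
import random


class BetaPromptSelector:
    """Hypothetical stand-in for the selector in mopps.py (not the real API)."""

    def __init__(self, num_prompts, target=0.5, prior_alpha=1.0, prior_beta=1.0, seed=0):
        # One Beta(alpha, beta) posterior per prompt over its latent success rate.
        self.alpha = [prior_alpha] * num_prompts
        self.beta = [prior_beta] * num_prompts
        self.target = target  # assumed "most useful" success rate
        self.rng = random.Random(seed)

    def sample_batch(self, candidate_ids, batch_size):
        # Posterior (Thompson) sampling: draw one success rate per candidate,
        # then keep the prompts whose draw is closest to the target.
        draws = {i: self.rng.betavariate(self.alpha[i], self.beta[i]) for i in candidate_ids}
        ranked = sorted(candidate_ids, key=lambda i: abs(draws[i] - self.target))
        return ranked[:batch_size]

    def train(self, prompt_ids, successes, num_rollouts):
        # Streaming conjugate update: successes raise alpha, failures raise beta.
        for i, s in zip(prompt_ids, successes):
            self.alpha[i] += s
            self.beta[i] += num_rollouts - s


# Illustrative loop: select before rollout, update after rewards are known.
selector = BetaPromptSelector(num_prompts=100)
batch = selector.sample_batch(candidate_ids=list(range(100)), batch_size=8)
fake_successes = [4] * len(batch)  # placeholder for per-prompt correct-rollout counts
selector.train(batch, fake_successes, num_rollouts=8)
```

Selection in this sketch needs no LLM calls: it only samples from the stored posteriors, and the posteriors are refreshed from rewards that training already produces.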

Checklist Before Submitting

[!IMPORTANT] Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

[ √] Read the Contribute Guide.
[ √] Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
[ x] Add / Update the documentation.
[ x] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
[ x] Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

Nov 12 '25 07:11 cloud-qu

All committers have signed the CLA.

Nov 12 '25 07:11 CLAassistant

@cloud-qu Hi, thanks for your contribution. We're moving recipes to a separate project, verl-project/verl-recipe. Could you submit a PR to that project instead? https://github.com/volcengine/verl/pull/4283

Nov 25 '25 06:11 wuxibin89