[recipe] feat: add mopps recipe

Open cloud-qu opened this issue 1 month ago • 1 comments

What does this PR do?

An implementation of MoPPS (Model Predictive Prompt Selection), a Bayesian framework that predicts prompt difficulty online to accelerate RL finetuning of large reasoning models.

Checklist Before Starting

  • [x] Search for similar PRs. Paste at least one query link here: ...
  • [x] Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Please refer to https://github.com/thu-rllab/MoPPS

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

bash recipe/mopps/scripts/countdown/cd_verl_3b_topk_noinit.sh

Design & Code Changes

Model Predictive Prompt Selection (MoPPS) is a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions. Technically, MoPPS models each prompt's success rate as a latent variable, performs streaming Bayesian inference, and applies posterior sampling in a constructed multi-armed bandit, enabling sample-efficient and adaptive prompt selection. The main implementation is in mopps.py. MoPPS can be integrated into an RL training pipeline in two steps:

  • Before rollout: Call sample_batch() to select prompts
  • After reward: Call train() to update Bayesian posteriors
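
The two hooks above can be illustrated with a minimal sketch. It is not the code in mopps.py: only the method names sample_batch() and train() come from this PR, while the class name, the arguments, the Beta(1, 1) prior, the target success rate of 0.5, and the decay factor are all assumptions made for illustration. Each prompt's success rate gets a Beta posterior, one value is drawn per prompt (posterior sampling), and the prompts whose draws are closest to the target are selected.

```python
import numpy as np


class BetaPromptSelector:
    """Hypothetical Beta-Bernoulli prompt selector (illustration only)."""

    def __init__(self, num_prompts, target=0.5, decay=1.0, seed=0):
        # Assumed Beta(1, 1) prior: a uniform belief over each success rate.
        self.alpha = np.ones(num_prompts)
        self.beta = np.ones(num_prompts)
        self.target = target  # assumed "most useful" success rate
        self.decay = decay    # assumed forgetting factor; 1.0 means no forgetting
        self.rng = np.random.default_rng(seed)

    def sample_batch(self, candidate_ids, batch_size):
        """Before rollout: draw one success rate per candidate prompt and
        keep the batch_size prompts whose draws are closest to the target."""
        candidate_ids = np.asarray(candidate_ids)
        draws = self.rng.beta(self.alpha[candidate_ids], self.beta[candidate_ids])
        order = np.argsort(np.abs(draws - self.target))
        return candidate_ids[order[:batch_size]]

    def train(self, prompt_ids, successes, num_rollouts):
        """After reward: conjugate Beta update from observed rollout outcomes.
        Assumes prompt_ids are unique within one call."""
        prompt_ids = np.asarray(prompt_ids)
        successes = np.asarray(successes, dtype=float)
        failures = num_rollouts - successes
        self.alpha[prompt_ids] = self.decay * self.alpha[prompt_ids] + successes
        self.beta[prompt_ids] = self.decay * self.beta[prompt_ids] + failures


# Usage with placeholder rewards: pretend each selected prompt
# was solved in 2 of its 4 rollouts.
selector = BetaPromptSelector(num_prompts=100)
batch = selector.sample_batch(np.arange(100), batch_size=8)
selector.train(batch, successes=np.full(8, 2), num_rollouts=4)
```

Because selection reads only the stored posterior parameters, choosing a batch needs no extra LLM rollouts. The rollouts that training already performs are the only source of evidence for the update.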

Checklist Before Submitting

[!IMPORTANT] Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

cloud-qu, Nov 12 '25 07:11

CLA assistant check
All committers have signed the CLA.

CLAassistant, Nov 12 '25 07:11

@cloud-qu Hi, thanks for your contribution. We're moving recipes to a separate project, verl-project/verl-recipe. Could you submit a PR to that project instead? See https://github.com/volcengine/verl/pull/4283

wuxibin89, Nov 25 '25 06:11