[recipe] feat: add mopps recipe
What does this PR do?
An implementation of MoPPS (Model Predictive Prompt Selection), a Bayesian framework that predicts prompt difficulty online to accelerate RL finetuning of large reasoning models.
Checklist Before Starting
- [x ] Search for similar PRs. Paste at least one query link here: ...
- [√ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
Test
Please refer to https://github.com/thu-rllab/MoPPS
API and Usage Example
Demonstrate how the API changes if any, and provide usage example(s) if possible.
bash recipe/mopps/scripts/countdown/cd_verl_3b_topk_noinit.sh
Design & Code Changes
Model Predictive Prompt Selection (MoPPS) is a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions. Technically, MoPPS models each prompt's success rate as a latent variable, performs streaming Bayesian inference, and applies posterior sampling in a constructed multi-armed bandit, enabling sample-efficient and adaptive prompt selection. The main implementation is in mopps.py. MoPPS can be integrated into your RL training pipeline with two calls:
- Before rollout: Call `sample_batch()` to select prompts
- After reward: Call `train()` to update Bayesian posteriors
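The two hooks above can be illustrated with a minimal sketch. This is not the code in mopps.py: it assumes a Beta-Bernoulli posterior per prompt, Thompson-style posterior sampling, and top-k selection of the prompts whose sampled success rate is closest to a target value. The class name `BetaPromptSelector`, the constructor arguments, and the method signatures are hypothetical; only the method names `sample_batch()` and `train()` come from this PR.

```python
import numpy as np


class BetaPromptSelector:
    """Illustrative Beta-Bernoulli prompt selector (an assumption, not mopps.py)."""

    def __init__(self, num_prompts, target=0.5, prior_alpha=1.0, prior_beta=1.0, seed=0):
        # One Beta(alpha, beta) posterior per prompt over its latent success rate.
        self.alpha = np.full(num_prompts, prior_alpha, dtype=np.float64)
        self.beta = np.full(num_prompts, prior_beta, dtype=np.float64)
        self.target = target  # hypothetical target success rate
        self.rng = np.random.default_rng(seed)

    def sample_batch(self, batch_size):
        # Posterior sampling: draw one success rate per prompt.
        sampled = self.rng.beta(self.alpha, self.beta)
        # Keep the batch_size prompts whose draw is closest to the target.
        return np.argsort(np.abs(sampled - self.target))[:batch_size]

    def train(self, prompt_ids, successes, trials):
        # Streaming conjugate update from observed rollout outcomes.
        prompt_ids = np.asarray(prompt_ids, dtype=np.int64)
        successes = np.asarray(successes, dtype=np.float64)
        trials = np.asarray(trials, dtype=np.float64)
        # np.add.at accumulates correctly even if a prompt id repeats.
        np.add.at(self.alpha, prompt_ids, successes)
        np.add.at(self.beta, prompt_ids, trials - successes)

    def posterior_mean(self):
        # Predicted success rate of each prompt under the current posterior.
        return self.alpha / (self.alpha + self.beta)
```

In a training loop under these assumptions, `sample_batch(k)` would run before rollout to pick prompt indices, and `train(ids, successes, trials)` would run once rewards are available, where `successes` counts the correct responses out of `trials` rollouts per prompt.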
Checklist Before Submitting
[!IMPORTANT] Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- [ √] Read the Contribute Guide.
- [ √] Apply pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ x] Add / Update the documentation.
- [ x] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
- [ x] Once your PR is ready for CI, send a message in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
@cloud-qu Hi, thanks for your contribution. We're moving recipes to a separate project, verl-project/verl-recipe. Could you submit a PR to that project? https://github.com/volcengine/verl/pull/4283