feat: pipeline-rl style regulation of the number of in-flight prompts
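What does this PR do?
Adds a `max_num_in_flight_batches_in_generation` option to async GRPO that caps the number of in-flight prompt batches during generation, instead of deriving that cap directly from `max_trajectory_age_steps`.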
Issues
List issues that this PR closes (syntax):
Usage
- You can potentially add a usage example below
```python
# Add a code snippet demonstrating how to use this
```
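A minimal usage sketch, assuming the option lives in the `grpo.async_grpo` section as described in the walkthrough. The nested dict stands in for the YAML config, and `resolve_in_flight_batches` is a hypothetical helper (not part of NeMo RL) that illustrates the documented default (fall back to `max_trajectory_age_steps`) and the valid range `1 <= value <= max_trajectory_age_steps`.

```python
from typing import Any, Dict, Optional


def resolve_in_flight_batches(async_cfg: Dict[str, Any]) -> int:
    """Return the effective number of in-flight prompt batches (hypothetical helper)."""
    max_age: int = async_cfg["max_trajectory_age_steps"]
    value: Optional[int] = async_cfg.get("max_num_in_flight_batches_in_generation")
    # Unset: default to max_trajectory_age_steps, mirroring the YAML default.
    if value is None:
        value = max_age
    # Documented valid range: 1 <= value <= max_trajectory_age_steps.
    if not 1 <= value <= max_age:
        raise ValueError(
            f"max_num_in_flight_batches_in_generation must be in [1, {max_age}], got {value}"
        )
    return value


# Hypothetical override of the grpo.async_grpo section; other keys omitted.
config = {
    "grpo": {
        "async_grpo": {
            "max_trajectory_age_steps": 4,
            # Keep fewer batches in flight than the age limit would allow.
            "max_num_in_flight_batches_in_generation": 2,
        }
    }
}

print(resolve_in_flight_batches(config["grpo"]["async_grpo"]))  # 2
```

Per the documented trade-off, a smaller value should reduce off-policyness at some cost in generation throughput, while leaving the field at its default (which references `max_trajectory_age_steps`) should match the previous behavior.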
Before your PR is "Ready for review"
Pre checks:
- [ ] Make sure you read and followed Contributor guidelines
- [ ] Did you write any new necessary tests?
- [ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.
Additional Information
- ...
Summary by CodeRabbit
- New Features
  - Added `max_num_in_flight_batches_in_generation` configuration parameter to control the number of in-flight prompt batches during generation, enabling fine-tuning of throughput versus off-policyness tradeoffs.
- Documentation
  - Added guidance on using the new parameter, including recommended settings for maximizing throughput and managing training efficiency.
📝 Walkthrough
This PR introduces a new configuration option max_num_in_flight_batches_in_generation to the async GRPO algorithm, enabling explicit control over the maximum number of in-flight prompt batches during generation. The change includes documentation, configuration schema updates, and implementation modifications to use this parameter instead of deriving it directly from max_trajectory_age_steps.
Changes
| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates (`docs/guides/async-grpo.md`) | Added a new section on controlling max in-flight batches, including configuration details, the valid range (1 ≤ value ≤ `max_trajectory_age_steps`), throughput guidance, and off-policyness trade-offs. |
| Configuration Schema & Examples (`examples/configs/grpo_math_1B.yaml`) | Added the `max_num_in_flight_batches_in_generation` field under `grpo.async_grpo`, with a default that references `max_trajectory_age_steps` and comments explaining the valid range, its effect on the in-flight prompt calculation, and parameter interactions. |
| Implementation (`nemo_rl/algorithms/async_utils.py`, `nemo_rl/algorithms/grpo.py`) | Updated the async GRPO configuration to compute in-flight prompt limits from the new `max_num_in_flight_batches_in_generation` parameter. The multiplier in the async utilities now comes from this explicit field instead of directly from `max_trajectory_age_steps`. |
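The multiplier change summarized in the table can be sketched as follows. This assumes the in-flight prompt limit is the number of prompts per step multiplied by the multiplier; the function names and the `num_prompts_per_step` argument are illustrative assumptions, not code taken from `async_utils.py`.

```python
def max_in_flight_prompts_before(num_prompts_per_step: int, max_trajectory_age_steps: int) -> int:
    # Before: the multiplier was max_trajectory_age_steps itself.
    return num_prompts_per_step * max_trajectory_age_steps


def max_in_flight_prompts_after(
    num_prompts_per_step: int, max_num_in_flight_batches_in_generation: int
) -> int:
    # After: the multiplier is the explicit config field, so generation can be
    # throttled without changing the allowed trajectory age.
    return num_prompts_per_step * max_num_in_flight_batches_in_generation


num_prompts_per_step = 32  # hypothetical per-step prompt batch size
max_trajectory_age_steps = 4

print(max_in_flight_prompts_before(num_prompts_per_step, max_trajectory_age_steps))  # 128
print(max_in_flight_prompts_after(num_prompts_per_step, 1))  # 32
```

Under this assumption, setting the new field to 1 keeps at most one step's worth of prompts generating at a time, while `max_trajectory_age_steps` presumably still bounds how stale a trajectory may be when it is consumed.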
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
- Implementation changes are straightforward parameter substitution without complex logic modifications
- Configuration schema addition is a direct field extension with no validation logic changes
- Changes are homogeneous and scoped to a single feature across consistent file patterns
Suggested labels
documentation, asyncRL
Suggested reviewers
- parthchadha
- terrykong
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Test Results For Major Changes | ⚠️ Warning | PR introduces major async GRPO feature affecting parallelism but lacks test results, performance metrics, or convergence verification. | Add comprehensive testing information including test results, performance benchmarks, convergence verification, and reference any new tests added to validate the feature. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding pipeline-RL style regulation for in-flight prompts, which is the core feature across all modified files. |