feat: pipeline-rl style # of inflight prompt regulation

Open youngeunkwon0405 opened this issue 1 month ago • 1 comment

Summary by CodeRabbit

New Features
- Added max_num_in_flight_batches_in_generation configuration parameter to control the number of in-flight prompt batches during generation, enabling fine-tuning of throughput versus off-policyness tradeoffs.
Documentation
- Added guidance on using the new parameter, including recommended settings for maximizing throughput and managing training efficiency.

Nov 10 '25 22:11 youngeunkwon0405

📝 Walkthrough

This PR introduces a new configuration option max_num_in_flight_batches_in_generation to the async GRPO algorithm, enabling explicit control over the maximum number of in-flight prompt batches during generation. The change includes documentation, configuration schema updates, and implementation modifications to use this parameter instead of deriving it directly from max_trajectory_age_steps.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates `docs/guides/async-grpo.md` | Added new section on controlling max in-flight batches, including configuration details, valid range constraints (1 ≤ value ≤ max_trajectory_age_steps), throughput guidance, and off-policyness trade-offs. |
| Configuration Schema & Examples `examples/configs/grpo_math_1B.yaml` | Added `max_num_in_flight_batches_in_generation` field under `grpo.async_grpo` with default value referencing `max_trajectory_age_steps`, including documentation comments explaining range, effect on in-flight prompts calculation, and parameter interaction. |
| Implementation `nemo_rl/algorithms/async_utils.py`, `nemo_rl/algorithms/grpo.py` | Updated async GRPO configuration to use new `max_num_in_flight_batches_in_generation` parameter for computing in-flight prompt limits. Changed multiplier derivation in async utilities from directly using `max_trajectory_age_steps` to using the new explicit configuration field. |
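
To make the range constraint and the multiplier change concrete, here is a minimal sketch. It is not the code from this PR: the `AsyncGrpoSettings` class, the `max_in_flight_prompts` helper, and the `num_prompts_per_step` argument are hypothetical names used for illustration, and the formula (prompts per step times the number of in-flight batches) is an assumption based on the "multiplier" wording in the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsyncGrpoSettings:
    """Hypothetical container; field names mirror the config keys named in this PR."""

    max_trajectory_age_steps: int
    max_num_in_flight_batches_in_generation: int


def max_in_flight_prompts(settings: AsyncGrpoSettings, num_prompts_per_step: int) -> int:
    """Return an upper bound on prompts in flight during generation.

    Assumption: the limit is num_prompts_per_step multiplied by the new
    parameter, where the multiplier used to be max_trajectory_age_steps.
    """
    n = settings.max_num_in_flight_batches_in_generation
    # Range documented in the PR: 1 <= value <= max_trajectory_age_steps.
    if not 1 <= n <= settings.max_trajectory_age_steps:
        raise ValueError(
            "max_num_in_flight_batches_in_generation must be in "
            f"[1, {settings.max_trajectory_age_steps}], got {n}"
        )
    return num_prompts_per_step * n


# Setting the new field equal to max_trajectory_age_steps corresponds to the
# previous behavior; a smaller value keeps fewer batches in flight.
previous_behavior = AsyncGrpoSettings(4, 4)
fewer_in_flight = AsyncGrpoSettings(4, 1)
print(max_in_flight_prompts(previous_behavior, 32))  # 128
print(max_in_flight_prompts(fewer_in_flight, 32))    # 32
```

Under these assumptions, lowering the value trades generation throughput for lower off-policyness, which is the trade-off the documentation update describes.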

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

- Implementation changes are straightforward parameter substitution without complex logic modifications
- Configuration schema addition is a direct field extension with no validation logic changes
- Changes are homogeneous and scoped to a single feature across consistent file patterns

Suggested labels

documentation, asyncRL

Suggested reviewers

parthchadha
terrykong

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Test Results For Major Changes | ⚠️ Warning | PR introduces major async GRPO feature affecting parallelism but lacks test results, performance metrics, or convergence verification. | Add comprehensive testing information including test results, performance benchmarks, convergence verification, and reference any new tests added to validate the feature. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding pipeline-RL style regulation for in-flight prompts, which is the core feature across all modified files. |


Nov 10 '25 22:11 coderabbitai[bot]