
feat: pipeline-rl style # of inflight prompt regulation

Open • youngeunkwon0405 opened this issue 1 month ago • 1 comment

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • [ ] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added max_num_in_flight_batches_in_generation configuration parameter to control the number of in-flight prompt batches during generation, enabling fine-tuning of throughput versus off-policyness tradeoffs.
  • Documentation

    • Added guidance on using the new parameter, including recommended settings for maximizing throughput and managing training efficiency.
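The core idea in the title, capping how many prompt batches may be in generation at the same time, can be illustrated with a small counter-based limiter. This is a minimal, hypothetical sketch: the `InFlightBatchLimiter` class and its methods are illustrative names, not NeMo RL APIs, and it assumes each prompt batch reserves a slot when dispatched to generation and releases it once its trajectories have been consumed.

```python
import threading


class InFlightBatchLimiter:
    """Hypothetical limiter: at most `max_in_flight` prompt batches in generation at once."""

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be >= 1")
        self._max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()  # guards _in_flight across generation/training threads

    def try_dispatch(self) -> bool:
        """Reserve a slot for a new prompt batch; return False if the cap is already reached."""
        with self._lock:
            if self._in_flight >= self._max_in_flight:
                return False
            self._in_flight += 1
            return True

    def complete(self) -> None:
        """Release a slot once a batch's trajectories have been consumed."""
        with self._lock:
            if self._in_flight == 0:
                raise RuntimeError("complete() called with no batch in flight")
            self._in_flight -= 1

    @property
    def in_flight(self) -> int:
        with self._lock:
            return self._in_flight


# Usage: with a cap of 2, a third dispatch is refused until one batch completes.
limiter = InFlightBatchLimiter(max_in_flight=2)
print(limiter.try_dispatch(), limiter.try_dispatch(), limiter.try_dispatch())  # True True False
limiter.complete()
print(limiter.try_dispatch())  # True
```

Under these assumptions, a smaller cap means fewer prompts are generated ahead of training, which is the throughput versus off-policyness trade-off the summary refers to; the sketch does not model how NeMo RL actually schedules generation.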

youngeunkwon0405 • Nov 10 '25 22:11

📝 Walkthrough

This PR introduces a new configuration option max_num_in_flight_batches_in_generation to the async GRPO algorithm, enabling explicit control over the maximum number of in-flight prompt batches during generation. The change includes documentation, configuration schema updates, and implementation modifications to use this parameter instead of deriving it directly from max_trajectory_age_steps.
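The substitution described in the walkthrough can be sketched as a before/after comparison. This assumes, as the summary suggests, that the in-flight prompt limit is the number of prompts per step multiplied by a batch multiplier; `num_prompts_per_step` and both helper functions are hypothetical names for illustration, not the actual code in nemo_rl/algorithms/async_utils.py.

```python
def max_in_flight_prompts_before(num_prompts_per_step: int, max_trajectory_age_steps: int) -> int:
    # Previous behaviour (as described): multiplier taken directly from max_trajectory_age_steps.
    return num_prompts_per_step * max_trajectory_age_steps


def max_in_flight_prompts_after(
    num_prompts_per_step: int, max_num_in_flight_batches_in_generation: int
) -> int:
    # New behaviour (as described): multiplier taken from the explicit config field.
    return num_prompts_per_step * max_num_in_flight_batches_in_generation


# Illustrative numbers: 64 prompts per step, max_trajectory_age_steps = 4.
print(max_in_flight_prompts_before(64, 4))  # 256
print(max_in_flight_prompts_after(64, 2))   # 128 (tighter cap)
print(max_in_flight_prompts_after(64, 4))   # 256 (same as the previous behaviour)
```

Setting the new field equal to max_trajectory_age_steps reproduces the old limit under this model, while smaller values cap how far generation can run ahead of training.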

Changes

  • Documentation updates (docs/guides/async-grpo.md): Added a new section on controlling the maximum number of in-flight batches, including configuration details, the valid range (1 ≤ value ≤ max_trajectory_age_steps), throughput guidance, and off-policyness trade-offs.
  • Configuration schema and examples (examples/configs/grpo_math_1B.yaml): Added the max_num_in_flight_batches_in_generation field under grpo.async_grpo, with a default value that references max_trajectory_age_steps, plus comments explaining its range, its effect on the in-flight prompt calculation, and how it interacts with related parameters.
  • Implementation (nemo_rl/algorithms/async_utils.py, nemo_rl/algorithms/grpo.py): Updated the async GRPO configuration to compute in-flight prompt limits from the new max_num_in_flight_batches_in_generation parameter. The multiplier in the async utilities now comes from this explicit configuration field instead of directly from max_trajectory_age_steps.
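The default and the documented valid range (1 ≤ value ≤ max_trajectory_age_steps) could be enforced when the config is loaded. The review notes that the PR adds no validation logic, so this is only one way such a check might look. It assumes the grpo.async_grpo section arrives as a plain dictionary and that an omitted or null field falls back to max_trajectory_age_steps; the `resolve_max_in_flight_batches` helper is hypothetical.

```python
from typing import Any, Mapping


def resolve_max_in_flight_batches(async_grpo_cfg: Mapping[str, Any]) -> int:
    """Hypothetical helper: return the effective in-flight batch cap after validating its range."""
    max_age = async_grpo_cfg["max_trajectory_age_steps"]
    # Fall back to max_trajectory_age_steps when the new field is absent or null.
    value = async_grpo_cfg.get("max_num_in_flight_batches_in_generation")
    if value is None:
        value = max_age
    # Enforce the documented range: 1 <= value <= max_trajectory_age_steps.
    if not 1 <= value <= max_age:
        raise ValueError(
            f"max_num_in_flight_batches_in_generation={value} must satisfy "
            f"1 <= value <= max_trajectory_age_steps ({max_age})"
        )
    return value


cfg = {"max_trajectory_age_steps": 4}
print(resolve_max_in_flight_batches(cfg))  # 4
cfg["max_num_in_flight_batches_in_generation"] = 2
print(resolve_max_in_flight_batches(cfg))  # 2
```

Failing fast on an out-of-range value surfaces a misconfiguration at startup rather than as silently altered throughput or staleness during training.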

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Implementation changes are straightforward parameter substitution without complex logic modifications
  • Configuration schema addition is a direct field extension with no validation logic changes
  • Changes are homogeneous and scoped to a single feature across consistent file patterns

Suggested labels

documentation, asyncRL

Suggested reviewers

  • parthchadha
  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
  • Test Results For Major Changes (⚠️ Warning): The PR introduces a major async GRPO feature affecting parallelism but lacks test results, performance metrics, or convergence verification. Resolution: add comprehensive testing information, including test results, performance benchmarks, and convergence verification, and reference any new tests added to validate the feature.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title accurately describes the main change: adding pipeline-RL-style regulation of in-flight prompts, which is the core feature across all modified files.

coderabbitai[bot] • Nov 10 '25 22:11