feat: Random dataset with specified input and output sequence length
## What does this PR do?

Adds a random dataset that generates prompts with a specified input sequence length (ISL) and rollouts with a specified output sequence length (OSL), either fixed or sampled from a normal distribution.
## Issues

closes #1302
## Usage

Use the following flags for fixed-ISL/OSL evaluation:

```sh
uv run examples/run_eval_random_dataset.py \
    +data.input_len_or_input_len_generator=1000 \
    generation.ignore_eos=true \
    generation.vllm_cfg.max_model_len=3000
```
Use the following flags for fixed-ISL/OSL GRPO:

```sh
uv run examples/run_grpo_random_dataset.py \
    +data.input_len_or_input_len_generator=1000 \
    policy.generation.ignore_eos=true \
    policy.generation.output_len_or_output_len_generator=2000
```
Use the following flags for random-ISL/OSL GRPO, with lengths sampled from a normal distribution (mean and standard deviation):

```sh
uv run examples/run_grpo_random_dataset.py \
    grpo.val_at_start=false \
    grpo.val_period=0 \
    policy.max_total_sequence_length=8000 \
    +data.input_len_or_input_len_generator.mean=1000 \
    +data.input_len_or_input_len_generator.std=100 \
    +policy.generation.output_len_or_output_len_generator.mean=2000 \
    +policy.generation.output_len_or_output_len_generator.std=1000 \
    policy.generation.ignore_eos=True
```
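The `mean`/`std` overrides above imply that lengths are drawn from a normal distribution. A minimal sketch of how such a spec could be normalized to a length generator (illustrative only; the real helper is `get_sequence_length_generator` in `nemo_rl/utils/sequence_length_generator.py`, and the function names and clamping here are assumptions, not the actual implementation):

```python
import random

def make_length_generator(mean: float, std: float, min_len: int = 1):
    """Return a callable sampling an integer length from N(mean, std),
    clamped to at least min_len."""
    def sample() -> int:
        return max(min_len, int(round(random.gauss(mean, std))))
    return sample

def resolve_length_spec(spec):
    """Normalize a fixed int or a {'mean': ..., 'std': ...} mapping
    to a zero-argument length generator."""
    if isinstance(spec, int):  # e.g. input_len_or_input_len_generator=1000
        return lambda: spec
    return make_length_generator(spec["mean"], spec["std"])

fixed = resolve_length_spec(1000)                       # always 1000
sampled = resolve_length_spec({"mean": 1000, "std": 100})  # varies per call
```

This mirrors the `input_len_or_input_len_generator` naming, where the same config key accepts either a plain int (fixed length) or a mean/std pair (random length).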
## Before your PR is "Ready for review"

Pre-checks:

- [ ] Make sure you read and followed the Contributor guidelines
- [ ] Did you write any new necessary tests?
- [ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
- [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.
## Additional Information

- ...
## Summary by CodeRabbit

**New Features**
- Added an end-of-sequence handling configuration option across generation settings.
- New example scripts for evaluation and training workflows with random math datasets.
- Enhanced support for FP8 precision with Mixture-of-Experts models.
- Added flexible output length configuration for generation tasks.

**Chores**
- Updated configuration files with new generation parameters.
- Added diagnostic metrics output in training logs.
## 📝 Walkthrough
Adds support for synthetic output length generation and ignore_eos flag to GRPO and evaluation workflows. Introduces RandomDataset, DummyEnvironment, and new benchmarking scripts for random math datasets. Extends generation configuration, vLLM worker integration, and FP8 MoE support.
## Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration files (YAML)**<br>`examples/configs/distillation_math.yaml`, `examples/configs/evals/eval.yaml`, `examples/configs/grpo_math_1B.yaml`, `examples/configs/vlm_grpo_3B.yaml`, `examples/configs/vlm_grpo_3B_megatron.yaml` | Added `ignore_eos: false` field under `generation` sections in existing configs |
| **New GRPO FP8 config**<br>`examples/configs/grpo_math_qwen30ba3b_megatron_fp8.yaml` | New configuration file for GRPO with FP8 support, including vLLM tensor parallelism, FP8 quantization settings, and Megatron configuration |
| **Data layer**<br>`nemo_rl/data/__init__.py`, `nemo_rl/data/datasets/__init__.py`, `nemo_rl/data/datasets/random_dataset.py`, `nemo_rl/data/interfaces.py`, `nemo_rl/data/processors.py` | `RandomDataset` class, `random_input_len_processor`, `TaskDataSpec` extended with `input_len_or_input_len_generator` field, `DataConfig` updated with optional input length generator |
| **Generation configuration**<br>`nemo_rl/models/generation/interfaces.py` | `GenerationConfig` TypedDict updated with `ignore_eos: bool` and `output_len_or_output_len_generator` fields; tightened several field types from optional to required |
| **Generation models (FP8 MoE support)**<br>`nemo_rl/models/generation/fp8.py` | Added FP8 weight processing for MoE layers via `process_weights_after_loading` and `process_weights_after_loading_moe`; removed MoE restriction in `init_fp8`; extended `FusedMoE` module traversal and weight identification |
| **vLLM workers**<br>`nemo_rl/models/generation/vllm/vllm_worker.py`, `nemo_rl/models/generation/vllm/vllm_worker_async.py` | Added support for `ignore_eos` and `output_len_or_output_len_generator` config options; conditioned `stop_token_ids` on the `ignore_eos` flag; added output length constraints in async worker |
| **Environment**<br>`nemo_rl/environments/dummy_environment.py` | New Ray-remote `DummyEnvironment` class implementing `EnvironmentInterface` with no-op lifecycle and deterministic step/metric methods |
| **Evaluation**<br>`nemo_rl/evals/eval.py` | Added `tokenizer` field to `MasterConfig`; added per-step timing instrumentation; imported and cast `generation_config` to `VllmConfig` |
| **Algorithm**<br>`nemo_rl/algorithms/grpo.py` | Added diagnostic print for token multiplicity probability error metric in training loop |
| **Utilities**<br>`nemo_rl/utils/sequence_length_generator.py` | New utility function `get_sequence_length_generator` for sampling sequence lengths from a normal distribution |
| **Evaluation scripts**<br>`examples/run_eval_random_dataset.py` | New evaluation orchestration script with CLI parsing, Ray initialization, dataset/dataloader setup, and evaluation execution via `run_env_eval` |
| **GRPO scripts**<br>`examples/run_grpo_random_dataset.py` | New GRPO training orchestration script with config loading, experiment logging, Ray initialization, data/environment setup, and sync/async training delegation |
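For intuition, the `DummyEnvironment` entry above describes a no-op environment whose step output is deterministic, so rollout length is governed entirely by the configured OSL rather than by any reward signal. A plain-Python sketch of that shape (omitting Ray's `@ray.remote` decorator and the real `EnvironmentInterface` base class; the return-field names here are illustrative assumptions, not the actual interface):

```python
class DummyEnvironment:
    """No-op environment: zero reward, immediate termination."""

    def step(self, message_logs, metadata):
        n = len(message_logs)
        return {
            "observations": [[] for _ in range(n)],  # nothing to observe
            "rewards": [0.0] * n,                    # deterministic reward
            "terminateds": [True] * n,               # single-turn episodes
        }

    def global_post_process_and_metrics(self, batch):
        # Pass the batch through unchanged; report a trivial metric.
        return batch, {"num_samples": len(batch)}
```

Because every step terminates with a constant reward, such an environment is useful purely for benchmarking generation and training throughput at controlled sequence lengths.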
## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI / Config
    participant Main as main()
    participant Setup as setup_data()
    participant Data as RandomDataset
    participant Env as DummyEnvironment
    participant Eval as run_env_eval()
    CLI->>Main: Config + Overrides
    Main->>Main: Load & apply Hydra config
    Main->>Setup: Tokenizer, Data Config
    Setup->>Data: input_len_or_input_len_generator
    Data->>Data: prepare_openinstructmath2_dataset()
    Setup->>Env: Ray remote init
    Env-->>Setup: DummyEnvironment actor
    Setup-->>Main: (dataset, env, tokenizer)
    Main->>Main: Initialize vllm_generation
    Main->>Eval: generation, dataloader, environment
    Eval->>Eval: Run steps with timing
    Eval-->>Main: Evaluation complete
```
```mermaid
sequenceDiagram
    participant CLI as CLI / Config
    participant Main as main()
    participant Setup as setup_data()
    participant Data as RandomDataset
    participant Tasks as Task Processors
    participant Env as DummyEnvironment
    participant GRPO as GRPO Train
    CLI->>Main: Config + Overrides
    Main->>Main: Load & apply Hydra config
    Main->>Main: Register OmegaConf resolver "mul"
    Main->>Setup: Tokenizer, Data Config
    Setup->>Data: input_len_or_input_len_generator (Callable or int)
    Data->>Data: RandomDataset initialized
    Setup->>Tasks: AllTaskProcessedDataset creation
    Setup->>Env: Ray remote DummyEnvironment per task
    Env-->>Setup: DummyEnvironment actors
    Setup-->>Main: (dataset, val_dataset, env_map)
    Main->>Main: Decide sync vs async GRPO
    alt Async Mode
        Main->>GRPO: async_grpo_train()
    else Sync Mode
        Main->>GRPO: grpo_train()
    end
    GRPO->>GRPO: Training loop with loss + metrics
```
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes
Areas requiring extra attention:

- **FP8 MoE weight processing** (`nemo_rl/models/generation/fp8.py`) — dense logic for handling expert-specific weight transformations, parameter wrapping, and backend-specific adjustments; verify correctness of weight scale handling and parameter alignment for both Linear and FusedMoE paths.
- **New orchestration scripts** (`examples/run_eval_random_dataset.py`, `examples/run_grpo_random_dataset.py`) — significant new functionality with 100+ lines each; verify proper Ray initialization, config merging with Hydra overrides, and data/environment wiring.
- **vLLM worker integration** (`nemo_rl/models/generation/vllm/vllm_worker.py`, `vllm_worker_async.py`) — verify `ignore_eos` and `output_len_or_output_len_generator` config handling across sync/async paths and `SamplingParams` construction.
- **Generation config type tightening** (`nemo_rl/models/generation/interfaces.py`) — field type changes (optional to required) may have broader implications; verify compatibility with existing callers and config loading.
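One pattern reviewers can sanity-check in the worker changes: when `ignore_eos` is set, stop tokens must not be passed along, or decoding would still terminate early instead of running to the sampled output length. A hedged sketch of that conditioning (a plain dict stands in for vLLM's `SamplingParams`, which does accept `ignore_eos` and `stop_token_ids`; the helper name and any config keys beyond those introduced in this PR are assumptions):

```python
def build_sampling_kwargs(cfg: dict, stop_token_ids, output_len_generator=None):
    """Assemble keyword arguments for vllm.SamplingParams.

    If ignore_eos is set, drop stop_token_ids so decoding runs to
    max_tokens; if an output-length generator is provided, sample
    max_tokens per request instead of using a fixed value.
    """
    ignore_eos = cfg.get("ignore_eos", False)
    max_tokens = (
        output_len_generator() if output_len_generator is not None
        else cfg["max_new_tokens"]
    )
    return {
        "ignore_eos": ignore_eos,
        "stop_token_ids": None if ignore_eos else stop_token_ids,
        "max_tokens": max_tokens,
    }
```

With `ignore_eos=True` and an OSL generator, every rollout reaches exactly its sampled length, which is what makes the fixed/random ISL-OSL benchmarking in the usage examples deterministic in shape.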
Possibly related PRs
- NVIDIA-NeMo/RL#977: Modifies dataset subsystem (RandomDataset, AllTaskProcessedDataset, dataset loaders) and is directly related to the data layer enhancements in this PR.
- NVIDIA-NeMo/RL#1382: Modifies generation configuration and vLLM worker handling (stop/pad-related fields), overlapping with the config changes and worker updates here.
- NVIDIA-NeMo/RL#1459: Modifies GenerationConfig types and ignore EOS handling in generation code, directly related to the ignore_eos and generation config updates.
Suggested labels
CI:L1, r0.4.0
Suggested reviewers
- yuki-97
- ashors1
- parthchadha
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Test Results For Major Changes | ⚠️ Warning | PR introduces major features (RandomDataset, eval/GRPO scripts, FP8 MoE enhancements) but contains no test results, performance metrics, or testing documentation. | Complete testing pre-check items: run example scripts with various ISL/OSL configs, document results, and provide performance metrics to verify correctness and no regression. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | The PR comprehensively implements the requirements from issue #1302: adds support for synthetic rollouts with fixed/random output sequence lengths, integrates ignore_eos functionality, and provides both evaluation and GRPO examples. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to supporting random datasets with configurable input/output sequence lengths. The FP8 MoE support enhancements appear to be supporting infrastructure for the GRPO configurations included in the examples. |
| Title check | ✅ Passed | The PR title directly describes the main feature: adding a random dataset with configurable input and output sequence lengths, which aligns with the core changes across configuration files, dataset classes, and generation utilities. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment