
feat: Random dataset with specified input and output sequence length

guyueh1 opened this pull request 1 month ago • 1 comment

What does this PR do?

Adds a random dataset that generates samples with a specified input and output sequence length, either fixed or sampled from a distribution.

Issues

closes #1302

Usage

Use the following flags for evaluation with a fixed input/output sequence length (ISL/OSL):

uv run examples/run_eval_random_dataset.py \
+data.input_len_or_input_len_generator=1000 \
generation.ignore_eos=true \
generation.vllm_cfg.max_model_len=3000
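
The eval command above relies on ignore_eos to pin the output length: with the flag set, decoding does not stop when the end-of-sequence token is produced. A toy sketch of that semantics (not vLLM's actual implementation; the token ids and sampler here are made up):

```python
# Toy illustration of ignore_eos: with the flag set, decoding runs to
# max_tokens even if EOS is sampled, yielding a fixed output length.
EOS = 2  # hypothetical EOS token id

def decode(sample_next, max_tokens, ignore_eos=False):
    """Decode loop that stops early at EOS unless ignore_eos is set."""
    out = []
    for _ in range(max_tokens):
        tok = sample_next()
        out.append(tok)
        if tok == EOS and not ignore_eos:
            break
    return out

def make_sampler():
    """Sampler that emits EOS on its 3rd call, a non-EOS token otherwise."""
    state = {"i": 0}
    def sample():
        state["i"] += 1
        return EOS if state["i"] == 3 else 7
    return sample

print(len(decode(make_sampler(), 10)))                   # 3: stopped at EOS
print(len(decode(make_sampler(), 10, ignore_eos=True)))  # 10: ran full length
```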

Use the following flags for fixed ISL/OSL GRPO:

uv run examples/run_grpo_random_dataset.py \
+data.input_len_or_input_len_generator=1000 \
policy.generation.ignore_eos=true \
policy.generation.output_len_or_output_len_generator=2000 

Use the following flags for random ISL/OSL GRPO, with lengths sampled from a normal distribution (mean and standard deviation):

uv run examples/run_grpo_random_dataset.py \
grpo.val_at_start=false \
grpo.val_period=0 \
policy.max_total_sequence_length=8000 \
+data.input_len_or_input_len_generator.mean=1000 \
+data.input_len_or_input_len_generator.std=100 \
+policy.generation.output_len_or_output_len_generator.mean=2000 \
+policy.generation.output_len_or_output_len_generator.std=1000 \
policy.generation.ignore_eos=true
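
For the mean/std variant, lengths are drawn per sample from a normal distribution. A minimal sketch of what such a generator could look like, with assumed rounding and clamping behavior (the actual get_sequence_length_generator utility may differ):

```python
import random

def make_length_generator(mean: float, std: float, min_len: int = 1):
    """Return a zero-arg callable that samples sequence lengths from a
    normal distribution, rounded to an int and clamped to min_len.

    Hypothetical sketch of the mean/std generator configured above; the
    real utility in nemo_rl may behave differently.
    """
    def gen() -> int:
        return max(min_len, int(round(random.gauss(mean, std))))
    return gen

gen = make_length_generator(mean=1000, std=100)
lengths = [gen() for _ in range(5)]  # five positive ints around 1000
```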

Before your PR is "Ready for review"

Pre-checks:

  • [ ] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added end-of-sequence handling configuration option across generation settings.
    • New example scripts for evaluation and training workflows with random math datasets.
    • Enhanced support for FP8 precision with Mixture-of-Experts models.
    • Added flexible output length configuration for generation tasks.
  • Chores

    • Updated configuration files with new generation parameters.
    • Added diagnostic metrics output in training logs.

— guyueh1, Oct 31 '25 20:10

📝 Walkthrough

Adds support for synthetic output length generation and ignore_eos flag to GRPO and evaluation workflows. Introduces RandomDataset, DummyEnvironment, and new benchmarking scripts for random math datasets. Extends generation configuration, vLLM worker integration, and FP8 MoE support.

Changes

Cohort / File(s): Summary

  • Configuration files (YAML)
    examples/configs/distillation_math.yaml, examples/configs/evals/eval.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml
    Added an ignore_eos: false field under the generation sections of existing configs.
  • New GRPO FP8 config
    examples/configs/grpo_math_qwen30ba3b_megatron_fp8.yaml
    New configuration file for GRPO with FP8 support, including vLLM tensor parallelism, FP8 quantization settings, and Megatron configuration.
  • Data layer
    nemo_rl/data/__init__.py, nemo_rl/data/datasets/__init__.py, nemo_rl/data/datasets/random_dataset.py, nemo_rl/data/interfaces.py, nemo_rl/data/processors.py
    New RandomDataset class and random_input_len_processor; TaskDataSpec extended with an input_len_or_input_len_generator field; DataConfig updated with an optional input length generator.
  • Generation configuration
    nemo_rl/models/generation/interfaces.py
    GenerationConfig TypedDict updated with ignore_eos: bool and output_len_or_output_len_generator fields; several field types tightened from optional to required.
  • Generation models (FP8 MoE support)
    nemo_rl/models/generation/fp8.py
    Added FP8 weight processing for MoE layers via process_weights_after_loading and process_weights_after_loading_moe; removed the MoE restriction in init_fp8; extended FusedMoE module traversal and weight identification.
  • vLLM workers
    nemo_rl/models/generation/vllm/vllm_worker.py, nemo_rl/models/generation/vllm/vllm_worker_async.py
    Added support for the ignore_eos and output_len_or_output_len_generator config options; conditioned stop_token_ids on the ignore_eos flag; added output length constraints in the async worker.
  • Environment
    nemo_rl/environments/dummy_environment.py
    New Ray-remote DummyEnvironment class implementing EnvironmentInterface with a no-op lifecycle and deterministic step/metric methods.
  • Evaluation
    nemo_rl/evals/eval.py
    Added a tokenizer field to MasterConfig; added per-step timing instrumentation; imported and cast generation_config to VllmConfig.
  • Algorithm
    nemo_rl/algorithms/grpo.py
    Added a diagnostic print of the token multiplicity probability error metric in the training loop.
  • Utilities
    nemo_rl/utils/sequence_length_generator.py
    New get_sequence_length_generator utility for sampling sequence lengths from a normal distribution.
  • Evaluation scripts
    examples/run_eval_random_dataset.py
    New evaluation orchestration script with CLI parsing, Ray initialization, dataset/dataloader setup, and evaluation execution via run_env_eval.
  • GRPO scripts
    examples/run_grpo_random_dataset.py
    New GRPO training orchestration script with config loading, experiment logging, Ray initialization, data/environment setup, and sync/async training delegation.
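
As a rough illustration of the input_len_or_input_len_generator contract described above (a fixed int or a zero-arg callable returning one), a hypothetical example builder might look like this; the real RandomDataset almost certainly differs:

```python
import random

def make_random_example(input_len_or_gen, vocab_size: int = 32000):
    """Build one synthetic example whose prompt has the requested length.

    input_len_or_gen may be a fixed int or a zero-arg callable returning
    an int, mirroring the input_len_or_input_len_generator config field.
    This is only a sketch of the idea, not nemo_rl's RandomDataset.
    """
    n = input_len_or_gen() if callable(input_len_or_gen) else input_len_or_gen
    # Random token ids; content is irrelevant for throughput benchmarking.
    return {"input_ids": [random.randrange(vocab_size) for _ in range(n)]}

fixed = make_random_example(1000)                                 # fixed ISL
varied = make_random_example(lambda: random.randint(900, 1100))   # generated ISL
```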

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI / Config
    participant Main as main()
    participant Setup as setup_data()
    participant Data as RandomDataset
    participant Env as DummyEnvironment
    participant Eval as run_env_eval()
    
    CLI->>Main: Config + Overrides
    Main->>Main: Load & apply Hydra config
    Main->>Setup: Tokenizer, Data Config
    Setup->>Data: input_len_or_input_len_generator
    Data->>Data: prepare_openinstructmath2_dataset()
    Setup->>Env: Ray remote init
    Env-->>Setup: DummyEnvironment actor
    Setup-->>Main: (dataset, env, tokenizer)
    Main->>Main: Initialize vllm_generation
    Main->>Eval: generation, dataloader, environment
    Eval->>Eval: Run steps with timing
    Eval-->>Main: Evaluation complete

sequenceDiagram
    participant CLI as CLI / Config
    participant Main as main()
    participant Setup as setup_data()
    participant Data as RandomDataset
    participant Tasks as Task Processors
    participant Env as DummyEnvironment
    participant GRPO as GRPO Train
    
    CLI->>Main: Config + Overrides
    Main->>Main: Load & apply Hydra config
    Main->>Main: Register OmegaConf resolver "mul"
    Main->>Setup: Tokenizer, Data Config
    Setup->>Data: input_len_or_input_len_generator (Callable or int)
    Data->>Data: RandomDataset initialized
    Setup->>Tasks: AllTaskProcessedDataset creation
    Setup->>Env: Ray remote DummyEnvironment per task
    Env-->>Setup: DummyEnvironment actors
    Setup-->>Main: (dataset, val_dataset, env_map)
    Main->>Main: Decide sync vs async GRPO
    alt Async Mode
        Main->>GRPO: async_grpo_train()
    else Sync Mode
        Main->>GRPO: grpo_train()
    end
    GRPO->>GRPO: Training loop with loss + metrics
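
Both flows route rollouts through DummyEnvironment, which exists only so the loop can run end to end on synthetic data without a real reward model. A hedged sketch of such a no-op environment (method names are illustrative; the actual class is a Ray remote actor implementing EnvironmentInterface):

```python
class DummyEnvironment:
    """No-op environment sketch: every step returns a zero reward and marks
    the episode done, so GRPO/eval loops can exercise the full pipeline on
    synthetic data. Names here are illustrative, not the real interface."""

    def step(self, message_logs):
        # One constant reward per rollout; every episode terminates at once.
        rewards = [0.0] * len(message_logs)
        dones = [True] * len(message_logs)
        return rewards, dones

    def global_post_process_and_metrics(self, batch):
        # Deterministic metrics; nothing meaningful to aggregate.
        return batch, {"dummy_reward": 0.0}

env = DummyEnvironment()
rewards, dones = env.step([["hi"], ["there"]])  # [0.0, 0.0], [True, True]
```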

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas requiring extra attention:

  • FP8 MoE weight processing (nemo_rl/models/generation/fp8.py) — Dense logic for handling expert-specific weight transformations, parameter wrapping, and backend-specific adjustments; verify correctness of weight scale handling and parameter alignment for both Linear and FusedMoE paths
  • New orchestration scripts (examples/run_eval_random_dataset.py, examples/run_grpo_random_dataset.py) — Significant new functionality with 100+ lines each; verify proper Ray initialization, config merging with Hydra overrides, and data/environment wiring
  • vLLM worker integration (nemo_rl/models/generation/vllm/vllm_worker.py, vllm_worker_async.py) — Verify ignore_eos and output_len_or_output_len_generator config handling across sync/async paths and SamplingParams construction
  • Generation config type tightening (nemo_rl/models/generation/interfaces.py) — Field type changes (optional to required) may have broader implications; verify compatibility with existing callers and config loading
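
On the vLLM worker point: conditioning stop_token_ids on ignore_eos matters because ignoring EOS alone is not enough if stop tokens can still terminate generation early. A sketch of that construction, using a plain dict with assumed field names rather than vLLM's actual SamplingParams:

```python
def build_sampling_kwargs(ignore_eos: bool, eos_token_id: int, max_tokens: int):
    """Sketch of conditioning stop tokens on ignore_eos, as the review notes
    the vLLM workers now do. Field names mimic SamplingParams, but this is a
    plain dict for illustration, not the actual worker code."""
    kwargs = {"max_tokens": max_tokens, "ignore_eos": ignore_eos}
    # When EOS is ignored, stop tokens must also be dropped, otherwise
    # generation would still stop early on them.
    kwargs["stop_token_ids"] = [] if ignore_eos else [eos_token_id]
    return kwargs

fixed_len = build_sampling_kwargs(True, 2, 2000)   # no stop tokens, runs to 2000
normal = build_sampling_kwargs(False, 2, 2000)     # stops on token id 2
```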

Possibly related PRs

  • NVIDIA-NeMo/RL#977: Modifies dataset subsystem (RandomDataset, AllTaskProcessedDataset, dataset loaders) and is directly related to the data layer enhancements in this PR.
  • NVIDIA-NeMo/RL#1382: Modifies generation configuration and vLLM worker handling (stop/pad-related fields), overlapping with the config changes and worker updates here.
  • NVIDIA-NeMo/RL#1459: Modifies GenerationConfig types and ignore EOS handling in generation code, directly related to the ignore_eos and generation config updates.

Suggested labels

CI:L1, r0.4.0

Suggested reviewers

  • yuki-97
  • ashors1
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 57.14%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes ⚠️ Warning: The PR introduces major features (RandomDataset, eval/GRPO scripts, FP8 MoE enhancements) but contains no test results, performance metrics, or testing documentation. Complete the testing pre-check items: run the example scripts with various ISL/OSL configs, document the results, and provide performance metrics to verify correctness and no regression.
✅ Passed checks (4 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Linked Issues Check ✅ Passed: The PR implements the requirements of issue #1302: synthetic rollouts with fixed or random output sequence lengths, ignore_eos support, and both evaluation and GRPO examples.
  • Out of Scope Changes Check ✅ Passed: All changes are scoped to supporting random datasets with configurable input/output sequence lengths; the FP8 MoE enhancements are supporting infrastructure for the included GRPO configurations.
  • Title Check ✅ Passed: The title directly describes the main feature and aligns with the changes across configuration files, dataset classes, and generation utilities.


— coderabbitai[bot], Nov 09 '25 22:11