
feat: Random dataset with specified input and output sequence length

guyueh1 opened this pull request 1 month ago • 1 comment

What does this PR do?

Adds a random dataset that generates samples with a specified input and output sequence length, either fixed or sampled from a distribution.

Issues

closes #1302

Usage

Use the following flags for evaluation with a fixed input/output sequence length (ISL/OSL):

uv run examples/run_eval_random_dataset.py \
+data.input_len_or_input_len_generator=1000 \
generation.ignore_eos=true \
generation.vllm_cfg.max_model_len=3000
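
The eval command above relies on ignore_eos to pin the output length: with the flag set, decoding does not stop when the end-of-sequence token is produced. A toy sketch of that semantics (not vLLM's actual implementation; the token ids and sampler here are made up):

```python
# Toy illustration of ignore_eos: with the flag set, decoding runs to
# max_tokens even if EOS is sampled, yielding a fixed output length.
EOS = 2  # hypothetical EOS token id

def decode(sample_next, max_tokens, ignore_eos=False):
    """Decode loop that stops early at EOS unless ignore_eos is set."""
    out = []
    for _ in range(max_tokens):
        tok = sample_next()
        out.append(tok)
        if tok == EOS and not ignore_eos:
            break
    return out

def make_sampler():
    """Sampler that emits EOS on its 3rd call, a non-EOS token otherwise."""
    state = {"i": 0}
    def sample():
        state["i"] += 1
        return EOS if state["i"] == 3 else 7
    return sample

print(len(decode(make_sampler(), 10)))                   # 3: stopped at EOS
print(len(decode(make_sampler(), 10, ignore_eos=True)))  # 10: ran full length
```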

Use the following flags for fixed ISL/OSL GRPO:

uv run examples/run_grpo_random_dataset.py \
+data.input_len_or_input_len_generator=1000 \
policy.generation.ignore_eos=true \
policy.generation.output_len_or_output_len_generator=2000 

Use the following flags for random ISL/OSL GRPO, with lengths sampled from a normal distribution (mean and standard deviation):

uv run examples/run_grpo_random_dataset.py \
grpo.val_at_start=false \
grpo.val_period=0 \
policy.max_total_sequence_length=8000 \
+data.input_len_or_input_len_generator.mean=1000 \
+data.input_len_or_input_len_generator.std=100 \
+policy.generation.output_len_or_output_len_generator.mean=2000 \
+policy.generation.output_len_or_output_len_generator.std=1000 \
policy.generation.ignore_eos=true
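
For the mean/std variant, lengths are drawn per sample from a normal distribution. A minimal sketch of what such a generator could look like, with assumed rounding and clamping behavior (the actual get_sequence_length_generator utility may differ):

```python
import random

def make_length_generator(mean: float, std: float, min_len: int = 1):
    """Return a zero-arg callable that samples sequence lengths from a
    normal distribution, rounded to an int and clamped to min_len.

    Hypothetical sketch of the mean/std generator configured above; the
    real utility in nemo_rl may behave differently.
    """
    def gen() -> int:
        return max(min_len, int(round(random.gauss(mean, std))))
    return gen

gen = make_length_generator(mean=1000, std=100)
lengths = [gen() for _ in range(5)]  # five positive ints around 1000
```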

Before your PR is "Ready for review"

Pre-checks:

  • [ ] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added end-of-sequence handling configuration option across generation settings.
    • New example scripts for evaluation and training workflows with random math datasets.
    • Enhanced support for FP8 precision with Mixture-of-Experts models.
    • Added flexible output length configuration for generation tasks.
  • Chores

    • Updated configuration files with new generation parameters.
    • Added diagnostic metrics output in training logs.

— guyueh1, Oct 31 '25 20:10

📝 Walkthrough

Adds support for synthetic output length generation and ignore_eos flag to GRPO and evaluation workflows. Introduces RandomDataset, DummyEnvironment, and new benchmarking scripts for random math datasets. Extends generation configuration, vLLM worker integration, and FP8 MoE support.

Changes

Cohort / File(s): Summary

  • Configuration files (YAML)
    examples/configs/distillation_math.yaml, examples/configs/evals/eval.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml
    Added an ignore_eos: false field under the generation sections of existing configs.
  • New GRPO FP8 config
    examples/configs/grpo_math_qwen30ba3b_megatron_fp8.yaml
    New configuration file for GRPO with FP8 support, including vLLM tensor parallelism, FP8 quantization settings, and Megatron configuration.
  • Data layer
    nemo_rl/data/__init__.py, nemo_rl/data/datasets/__init__.py, nemo_rl/data/datasets/random_dataset.py, nemo_rl/data/interfaces.py, nemo_rl/data/processors.py
    New RandomDataset class and random_input_len_processor; TaskDataSpec extended with an input_len_or_input_len_generator field; DataConfig updated with an optional input length generator.
  • Generation configuration
    nemo_rl/models/generation/interfaces.py
    GenerationConfig TypedDict updated with ignore_eos: bool and output_len_or_output_len_generator fields; several field types tightened from optional to required.
  • Generation models (FP8 MoE support)
    nemo_rl/models/generation/fp8.py
    Added FP8 weight processing for MoE layers via process_weights_after_loading and process_weights_after_loading_moe; removed the MoE restriction in init_fp8; extended FusedMoE module traversal and weight identification.
  • vLLM workers
    nemo_rl/models/generation/vllm/vllm_worker.py, nemo_rl/models/generation/vllm/vllm_worker_async.py
    Added support for the ignore_eos and output_len_or_output_len_generator config options; conditioned stop_token_ids on the ignore_eos flag; added output length constraints in the async worker.
  • Environment
    nemo_rl/environments/dummy_environment.py
    New Ray-remote DummyEnvironment class implementing EnvironmentInterface with a no-op lifecycle and deterministic step/metric methods.
  • Evaluation
    nemo_rl/evals/eval.py
    Added a tokenizer field to MasterConfig; added per-step timing instrumentation; imported and cast generation_config to VllmConfig.
  • Algorithm
    nemo_rl/algorithms/grpo.py
    Added a diagnostic print of the token multiplicity probability error metric in the training loop.
  • Utilities
    nemo_rl/utils/sequence_length_generator.py
    New get_sequence_length_generator utility for sampling sequence lengths from a normal distribution.
  • Evaluation scripts
    examples/run_eval_random_dataset.py
    New evaluation orchestration script with CLI parsing, Ray initialization, dataset/dataloader setup, and evaluation execution via run_env_eval.
  • GRPO scripts
    examples/run_grpo_random_dataset.py
    New GRPO training orchestration script with config loading, experiment logging, Ray initialization, data/environment setup, and sync/async training delegation.
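
As a rough illustration of the input_len_or_input_len_generator contract described above (a fixed int or a zero-arg callable returning one), a hypothetical example builder might look like this; the real RandomDataset almost certainly differs:

```python
import random

def make_random_example(input_len_or_gen, vocab_size: int = 32000):
    """Build one synthetic example whose prompt has the requested length.

    input_len_or_gen may be a fixed int or a zero-arg callable returning
    an int, mirroring the input_len_or_input_len_generator config field.
    This is only a sketch of the idea, not nemo_rl's RandomDataset.
    """
    n = input_len_or_gen() if callable(input_len_or_gen) else input_len_or_gen
    # Random token ids; content is irrelevant for throughput benchmarking.
    return {"input_ids": [random.randrange(vocab_size) for _ in range(n)]}

fixed = make_random_example(1000)                                 # fixed ISL
varied = make_random_example(lambda: random.randint(900, 1100))   # generated ISL
```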

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI / Config
    participant Main as main()
    participant Setup as setup_data()
    participant Data as RandomDataset
    participant Env as DummyEnvironment
    participant Eval as run_env_eval()
    
    CLI->>Main: Config + Overrides
    Main->>Main: Load & apply Hydra config
    Main->>Setup: Tokenizer, Data Config
    Setup->>Data: input_len_or_input_len_generator
    Data->>Data: prepare_openinstructmath2_dataset()
    Setup->>Env: Ray remote init
    Env-->>Setup: DummyEnvironment actor
    Setup-->>Main: (dataset, env, tokenizer)
    Main->>Main: Initialize vllm_generation
    Main->>Eval: generation, dataloader, environment
    Eval->>Eval: Run steps with timing
    Eval-->>Main: Evaluation complete

sequenceDiagram
    participant CLI as CLI / Config
    participant Main as main()
    participant Setup as setup_data()
    participant Data as RandomDataset
    participant Tasks as Task Processors
    participant Env as DummyEnvironment
    participant GRPO as GRPO Train
    
    CLI->>Main: Config + Overrides
    Main->>Main: Load & apply Hydra config
    Main->>Main: Register OmegaConf resolver "mul"
    Main->>Setup: Tokenizer, Data Config
    Setup->>Data: input_len_or_input_len_generator (Callable or int)
    Data->>Data: RandomDataset initialized
    Setup->>Tasks: AllTaskProcessedDataset creation
    Setup->>Env: Ray remote DummyEnvironment per task
    Env-->>Setup: DummyEnvironment actors
    Setup-->>Main: (dataset, val_dataset, env_map)
    Main->>Main: Decide sync vs async GRPO
    alt Async Mode
        Main->>GRPO: async_grpo_train()
    else Sync Mode
        Main->>GRPO: grpo_train()
    end
    GRPO->>GRPO: Training loop with loss + metrics
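
Both flows route rollouts through DummyEnvironment, which exists only so the loop can run end to end on synthetic data without a real reward model. A hedged sketch of such a no-op environment (method names are illustrative; the actual class is a Ray remote actor implementing EnvironmentInterface):

```python
class DummyEnvironment:
    """No-op environment sketch: every step returns a zero reward and marks
    the episode done, so GRPO/eval loops can exercise the full pipeline on
    synthetic data. Names here are illustrative, not the real interface."""

    def step(self, message_logs):
        # One constant reward per rollout; every episode terminates at once.
        rewards = [0.0] * len(message_logs)
        dones = [True] * len(message_logs)
        return rewards, dones

    def global_post_process_and_metrics(self, batch):
        # Deterministic metrics; nothing meaningful to aggregate.
        return batch, {"dummy_reward": 0.0}

env = DummyEnvironment()
rewards, dones = env.step([["hi"], ["there"]])  # [0.0, 0.0], [True, True]
```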

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas requiring extra attention:

  • FP8 MoE weight processing (nemo_rl/models/generation/fp8.py) — Dense logic for handling expert-specific weight transformations, parameter wrapping, and backend-specific adjustments; verify correctness of weight scale handling and parameter alignment for both Linear and FusedMoE paths
  • New orchestration scripts (examples/run_eval_random_dataset.py, examples/run_grpo_random_dataset.py) — Significant new functionality with 100+ lines each; verify proper Ray initialization, config merging with Hydra overrides, and data/environment wiring
  • vLLM worker integration (nemo_rl/models/generation/vllm/vllm_worker.py, vllm_worker_async.py) — Verify ignore_eos and output_len_or_output_len_generator config handling across sync/async paths and SamplingParams construction
  • Generation config type tightening (nemo_rl/models/generation/interfaces.py) — Field type changes (optional to required) may have broader implications; verify compatibility with existing callers and config loading
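
On the vLLM worker point: conditioning stop_token_ids on ignore_eos matters because ignoring EOS alone is not enough if stop tokens can still terminate generation early. A sketch of that construction, using a plain dict with assumed field names rather than vLLM's actual SamplingParams:

```python
def build_sampling_kwargs(ignore_eos: bool, eos_token_id: int, max_tokens: int):
    """Sketch of conditioning stop tokens on ignore_eos, as the review notes
    the vLLM workers now do. Field names mimic SamplingParams, but this is a
    plain dict for illustration, not the actual worker code."""
    kwargs = {"max_tokens": max_tokens, "ignore_eos": ignore_eos}
    # When EOS is ignored, stop tokens must also be dropped, otherwise
    # generation would still stop early on them.
    kwargs["stop_token_ids"] = [] if ignore_eos else [eos_token_id]
    return kwargs

fixed_len = build_sampling_kwargs(True, 2, 2000)   # no stop tokens, runs to 2000
normal = build_sampling_kwargs(False, 2, 2000)     # stops on token id 2
```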

Possibly related PRs

  • NVIDIA-NeMo/RL#977: Modifies dataset subsystem (RandomDataset, AllTaskProcessedDataset, dataset loaders) and is directly related to the data layer enhancements in this PR.
  • NVIDIA-NeMo/RL#1382: Modifies generation configuration and vLLM worker handling (stop/pad-related fields), overlapping with the config changes and worker updates here.
  • NVIDIA-NeMo/RL#1459: Modifies GenerationConfig types and ignore EOS handling in generation code, directly related to the ignore_eos and generation config updates.

Suggested labels

CI:L1, r0.4.0

Suggested reviewers

  • yuki-97
  • ashors1
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 57.14%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes ⚠️ Warning: The PR introduces major features (RandomDataset, eval/GRPO scripts, FP8 MoE enhancements) but contains no test results, performance metrics, or testing documentation. Complete the testing pre-check items: run the example scripts with various ISL/OSL configs, document the results, and provide performance metrics to verify correctness and no regression.
✅ Passed checks (4 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Linked Issues Check ✅ Passed: The PR implements the requirements of issue #1302: synthetic rollouts with fixed or random output sequence lengths, ignore_eos support, and both evaluation and GRPO examples.
  • Out of Scope Changes Check ✅ Passed: All changes are scoped to supporting random datasets with configurable input/output sequence lengths; the FP8 MoE enhancements are supporting infrastructure for the included GRPO configurations.
  • Title Check ✅ Passed: The title directly describes the main feature and aligns with the changes across configuration files, dataset classes, and generation utilities.


— coderabbitai[bot], Nov 09 '25 22:11