**Is your feature request related to a problem? Please describe.** Currently async_save is [disabled](https://github.com/NVIDIA-NeMo/RL/blame/8762f575c0d11aeb8a073a64e49cf433eb77c94a/nemo_rl/models/policy/megatron_policy_worker.py#L654) in the mcore checkpointing path, so serialization takes a long time while training is paused; we should test async_save and...
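For reference, a minimal sketch of the idea behind async_save: serialize on a background thread so the training loop is not stalled for the full save. This is not Megatron-Core's dist-checkpointing API; the function name and arguments below are hypothetical and purely illustrative.

```python
import threading
import torch

def save_checkpoint_async(state_dict: dict, path: str) -> threading.Thread:
    """Kick off serialization in the background and return the thread handle."""
    # Copy tensors to CPU up front so training can keep mutating GPU weights.
    cpu_state = {k: v.detach().cpu().clone() if torch.is_tensor(v) else v
                 for k, v in state_dict.items()}

    def _worker() -> None:
        torch.save(cpu_state, path)  # the slow, blocking serialization

    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    return t  # caller should join() before the next save or at shutdown
```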
When the number of nodes is >= 32 and `policy.train` is called, some ranks take a long time to perform the initial synchronization (the AllReduce NCCL kernel), while other ranks...
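A sketch of how per-rank timing around that first collective could be captured to see which ranks arrive late; this assumes `torch.distributed` is already initialized and the tensor lives on GPU, and the function name is illustrative.

```python
import time
import torch
import torch.distributed as dist

def timed_first_allreduce(tensor: torch.Tensor) -> None:
    rank = dist.get_rank()
    t0 = time.perf_counter()
    dist.all_reduce(tensor)      # the initial synchronization point
    torch.cuda.synchronize()     # wait for the NCCL kernel to finish
    elapsed = time.perf_counter() - t0
    # Ranks that arrive late see a short wait; ranks that arrive early see a
    # long wait, which is the asymmetry the issue describes.
    print(f"[rank {rank}] first all_reduce took {elapsed:.2f}s")
```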
Tracking items to improve performance for long max-seqlen workloads, including training and generation performance at long context.
**Is your feature request related to a problem? Please describe.** Right now nemo-rl logs the mean and max generated tokens per sample every step, but those two metrics cannot fully...
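One hedged illustration of richer per-step statistics (percentiles alongside mean/max), assuming `gen_lengths` holds the generated token counts for one step; the metric keys are hypothetical, not existing nemo-rl logger names.

```python
import numpy as np

def length_stats(gen_lengths: list[int]) -> dict[str, float]:
    arr = np.asarray(gen_lengths)
    return {
        "gen_tokens/mean": float(arr.mean()),
        "gen_tokens/max": float(arr.max()),
        "gen_tokens/p50": float(np.percentile(arr, 50)),
        "gen_tokens/p90": float(np.percentile(arr, 90)),
        "gen_tokens/p99": float(np.percentile(arr, 99)),
    }
```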
**Is your feature request related to a problem? Please describe.** Right now in nemo-rl GRPO, the generation workers return the token_ids to the head node on CPU, and the head node...
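A sketch of the data flow described above, with hypothetical function names: the worker does a device-to-host copy before returning, and the head node copies the tokens back onto a device afterwards.

```python
import torch

def worker_return_tokens(token_ids: torch.Tensor) -> torch.Tensor:
    # Device-to-host copy on the generation worker; this is the transfer
    # the issue calls out.
    return token_ids.cpu()

def head_node_collect(per_worker_tokens: list[torch.Tensor],
                      device: str = "cuda") -> torch.Tensor:
    # Host-to-device copy again on the head node before further processing.
    return torch.cat(per_worker_tokens, dim=0).to(device, non_blocking=True)
```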
Issue to track low-precision GRPO recipe testing and perf optimization
**Is your feature request related to a problem? Please describe.** Release code for FP8 rollout + FP8 training (blockwise scaling)
* Convergence test and downstream evaluation compared w/ BF16 for...
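As a conceptual illustration only, a toy sketch of blockwise scaling over 1D blocks of 128 values: each block gets its own scale so its dynamic range fits the FP8 format. Real FP8 recipes (e.g. in Transformer Engine) differ, and this assumes a PyTorch build with float8 dtypes.

```python
import torch

FP8_E4M3_MAX = 448.0  # max representable magnitude for e4m3

def blockwise_quantize(x: torch.Tensor, block: int = 128):
    """Return values cast to float8_e4m3fn plus one scale per block."""
    flat = x.flatten()
    pad = (-flat.numel()) % block
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, block)
    # Per-block scale chosen so the block's max magnitude maps to FP8 max.
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / FP8_E4M3_MAX
    q = (blocks / scales).to(torch.float8_e4m3fn)
    return q, scales
```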
# What does this PR do?
Random dataset following the specified input and output sequence lengths.
# Issues
closes #1302
# Usage
Use the following flags for fixed ISL/OSL eval...
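A self-contained sketch of such a random fixed-ISL/OSL dataset; the class and argument names here are hypothetical illustrations, not the PR's actual flags.

```python
import torch
from torch.utils.data import Dataset

class RandomSeqLenDataset(Dataset):
    """Random token ids with a fixed input length and a target output length."""

    def __init__(self, num_samples: int, input_seq_len: int, output_seq_len: int,
                 vocab_size: int = 32000, seed: int = 0):
        self.num_samples = num_samples
        self.isl, self.osl = input_seq_len, output_seq_len
        self.vocab_size = vocab_size
        self.generator = torch.Generator().manual_seed(seed)

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, idx: int) -> dict:
        return {
            "input_ids": torch.randint(0, self.vocab_size, (self.isl,),
                                       generator=self.generator),
            "output_len": self.osl,  # generation is forced to exactly OSL tokens
        }
```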
**Is your feature request related to a problem? Please describe.** Change the performance test script to use DAPO algo and Math17k dataset.
Tracking v0.5 items for MoE performance; the example model is DeepSeek-V3.