Add more detailed sequence length distribution logging
Is your feature request related to a problem? Please describe. Currently nemo-rl logs only the mean and max number of generated tokens per sample at every step, and these two metrics cannot fully capture the shape of the distribution. To support further performance studies, we want better observability of the sequence length distribution.
Describe the solution you'd like My idea is to plot the full sequence length distribution every N steps.
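
A minimal sketch of what this could look like, to make the request concrete. It is not based on the existing nemo-rl logging code: the function names (`should_log_distribution`, `summarize_lengths`), the `every_n_steps` parameter, and the choice of percentiles and bin count are all assumptions for illustration. It computes a histogram plus a few percentiles from the per-sample generated token counts, which could then be handed to whatever plotting or logging backend is in use.

```python
import math
from typing import Dict, List, Sequence


def should_log_distribution(step: int, every_n_steps: int) -> bool:
    """Return True on steps where the full distribution should be logged.

    `every_n_steps` is a hypothetical config value (the "N" in this request).
    """
    return every_n_steps > 0 and step % every_n_steps == 0


def percentile(sorted_values: Sequence[int], q: float) -> float:
    """Linear-interpolation percentile of pre-sorted values, with q in [0, 100]."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def summarize_lengths(lengths: Sequence[int], num_bins: int = 10) -> Dict[str, object]:
    """Summarize generated-token counts as percentiles and an equal-width histogram."""
    if not lengths:
        raise ValueError("summarize_lengths() needs at least one sample")
    ordered = sorted(lengths)
    lo, hi = ordered[0], ordered[-1]
    # Guard against zero width when every sample has the same length.
    width = max((hi - lo) / num_bins, 1e-9)
    counts: List[int] = [0] * num_bins
    for value in ordered:
        # The maximum value falls into the last bin rather than a new one.
        idx = min(int((value - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return {
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
        "hist_counts": counts,
        "hist_edges": edges,
    }
```

In a training loop this would be called only when `should_log_distribution(step, every_n_steps)` is true, so the per-step mean/max logging stays as cheap as it is today and the heavier histogram is produced only every N steps.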