NeMo
Set segment for gb systems when nodes <= 18
Make segment selection explicit for applicable systems.
The current logic relies on Slurm defaults to do the correct thing. However, on at least one internal cluster the admins have configured Slurm so that an unset segment becomes 'segment=2', which hurts performance. It is better to be explicit.
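For illustration only, here is a minimal sketch of what explicit segment selection could look like. The function names, the GPU identifiers, and the choice of setting the segment equal to the node count are assumptions made for this example, not the code in this PR; only the 18-node threshold and Slurm's `--segment` option come from the description above.

```python
from typing import List, Optional

# Hypothetical identifiers: the real NeMo helper and GPU names may differ.
GB_SYSTEMS = {"gb200", "gb300"}
MAX_NODES_FOR_EXPLICIT_SEGMENT = 18  # threshold taken from the PR title


def explicit_segment(gpu_type: str, num_nodes: int) -> Optional[int]:
    """Return a segment size to pass to Slurm, or None to leave it unset."""
    if gpu_type.lower() in GB_SYSTEMS and num_nodes <= MAX_NODES_FOR_EXPLICIT_SEGMENT:
        # Assumption: request a single segment that covers every node of the job,
        # so a cluster-side default such as segment=2 is never applied.
        return num_nodes
    return None


def sbatch_args(gpu_type: str, num_nodes: int) -> List[str]:
    """Build the node-related sbatch arguments, adding --segment when applicable."""
    args = [f"--nodes={num_nodes}"]
    segment = explicit_segment(gpu_type, num_nodes)
    if segment is not None:
        args.append(f"--segment={segment}")
    return args
```

With this sketch, a job on a GB system with 18 or fewer nodes always carries an explicit `--segment` value, while larger jobs and other systems keep relying on the cluster default.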
Remake of #15062
@malay-nagda @guyueh1 this is a respin of https://github.com/NVIDIA-NeMo/NeMo/pull/15062. I cleaned up my fork before the previous PR was merged.