Alex Filby
Thinking on this a little more, I think I should also update this to have a corresponding flag in argparse. Right now the only way the user can get...
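A rough sketch of what an opt-in argparse flag could look like. Everything named here is an assumption for illustration: `--set-nccl-env-vars`, `build_parser`, `build_env_vars`, and the placeholder variable `EXAMPLE_NCCL_VAR` are hypothetical and are not taken from the actual script. Defaulting to off is also an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical opt-in flag for the NCCL env vars."""
    parser = argparse.ArgumentParser(description="Hypothetical launch script")
    # Hypothetical flag name; the real option is not specified in this thread.
    # BooleanOptionalAction (Python 3.9+) also generates --no-set-nccl-env-vars.
    parser.add_argument(
        "--set-nccl-env-vars",
        action=argparse.BooleanOptionalAction,
        default=False,
        help="Export the NCCL environment variables to the job (off by default).",
    )
    return parser


def build_env_vars(args: argparse.Namespace, nccl_env: dict) -> dict:
    """Return a copy of the NCCL env vars only when the user opted in."""
    return dict(nccl_env) if args.set_nccl_env_vars else {}


if __name__ == "__main__":
    args = build_parser().parse_args()
    # EXAMPLE_NCCL_VAR is a placeholder, not a real NCCL setting.
    print(build_env_vars(args, {"EXAMPLE_NCCL_VAR": "1"}))
```

With a flag like this the decision stays with the user at launch time, so the executor script does not need to know which NCCL version ships inside the container.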
> Can we add conditional checks instead of removing? We can do a torch NCCL version check.

@sanandaraj5597 We can. I removed it outright since the latest NeMo container releases...
@sanandaraj5597 Actually how would that work? The executor script is run during job launch on the local environment and **outside** the container env. We won't know what the NCCL version...
We recently had an internal team run into issues with Nemotron4 and checkpointing when those flags were set while using a NeMo container with NCCL 2.27.x+ (can link Slack thread...
@malay-nagda @guyueh1 This is a respin of https://github.com/NVIDIA-NeMo/NeMo/pull/15062. I cleaned up my fork before the previous PR was merged.