awsome-distributed-training
Issue #, if available:
Description of changes:
Added:
- Steps to install Nsight
- Examples for NCCL Tests, Nemotron-15B, and PyTorch FSDP on a Slurm cluster
- Steps to set up Nsight on EKS, plus a PyTorch FSDP training example with Llama2 on EKS
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.