awsome-distributed-training

Nsight

awsankur opened this pull request 9 months ago

Issue #, if available:

Description of changes:

This PR adds:

  1. Steps to install Nsight.
  2. Examples for NCCL Tests, Nemotron-15B, and PyTorch FSDP on a Slurm cluster.
  3. Steps to set up Nsight on EKS, plus a PyTorch FSDP training example with Llama2 on EKS.
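As a rough illustration of the Slurm profiling workflow described above, the sketch below prefixes a per-rank training command with `nsys profile` so that each Slurm task writes its own report. It relies on nsys expanding `%q{SLURM_PROCID}` in the output name to that environment variable's value. The helper name `nsys_wrap`, the `./nsys_reports` directory, and the chosen trace set are hypothetical and are not taken from this PR's scripts.

```python
import shlex
from typing import Sequence


def nsys_wrap(
    cmd: Sequence[str],
    output_dir: str = "./nsys_reports",  # hypothetical output location
    trace: Sequence[str] = ("cuda", "nvtx", "osrt"),  # assumed trace set
) -> list[str]:
    """Hypothetical helper: prefix a training command with `nsys profile`.

    nsys replaces %q{SLURM_PROCID} with the value of that environment
    variable, so every Slurm task (rank) writes a separate report file.
    """
    # The doubled braces in the f-string emit a literal %q{SLURM_PROCID}.
    output = f"{output_dir}/report_rank%q{{SLURM_PROCID}}"
    return ["nsys", "profile", f"--trace={','.join(trace)}", "-o", output, *cmd]


if __name__ == "__main__":
    # Print a shell-safe command line that could be launched with srun.
    print(shlex.join(nsys_wrap(["python", "train.py"])))
```

Building the command as a list keeps each argument intact; `shlex.join` quotes the `%q{...}` token so the shell passes it to nsys unexpanded.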

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

awsankur, May 21 '24 00:05