Sreeram Venkat
Is there a way to train on multiple GPUs across multiple processes (i.e., through `torch.nn.parallel.DistributedDataParallel`)?
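For context, a minimal sketch of what multi-process DDP training looks like in PyTorch (not part of the original question; the placeholder model, data, and launch via `torchrun --nproc_per_node=4 train.py` are illustrative assumptions):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; each process holds a replica on its own GPU.
    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(32, 10, device=local_rank)  # dummy batch
    y = torch.randn(32, 1, device=local_rank)

    # One training step; DDP all-reduces gradients across processes.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```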
**Describe the issue** I am trying to build AMGX on the TACC supercomputer Lonestar6 with MVAPICH2-GDR as the MPI implementation. Building...
I was testing the `mp_potrf_potrs` example (with fixed SPD matrix generation code) in several configurations on Perlmutter. When I request 1 node (4 GPUs), running `srun -u -n 4 ...`
Is NCCL/RCCL support currently possible? If not, is there a way to manually patch it in with oneCCL?