Sreeram Venkat
Is there a way to train on multiple GPUs across multiple processes (i.e., through `torch.nn.parallel.DistributedDataParallel`)?
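For context, a minimal sketch of what multi-process DDP training looks like in PyTorch (not part of the original question; the placeholder model, data, and launch via `torchrun --nproc_per_node=4 train.py` are illustrative assumptions):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; each process holds a replica on its own GPU.
    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(32, 10, device=local_rank)  # dummy batch
    y = torch.randn(32, 1, device=local_rank)

    # One training step; DDP all-reduces gradients across processes.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```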
**Describe the issue** I am trying to build AMGX on the TACC supercomputer Lonestar6 with MVAPICH2-GDR as the MPI implementation. Building...
I was testing the `mp_potrf_potrs` example (with fixed SPD matrix generation code) in several configurations on Perlmutter. When I request 1 node (4 GPUs), running `srun -u -n 4 ...`
Is NCCL/RCCL support currently possible? If not, is there a way to manually patch it in with oneCCL?