Using HYPRE_Struct on multiple GPUs with GPU-aware MPI
Dear HYPRE developers,
Related to my old issue #1131, I am trying to use a Struct solver on multiple GPUs while NOT using unified memory. I could not find any documented setup for this configuration, so I modified one of the examples (ex3.c) to provide a minimal working code: https://github.com/ondrejchrenko/HYPRE_ex3
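The device-related changes boil down to something like the following sketch (a simplified outline rather than the full code; the rank-to-GPU binding assumes ranks are packed one node at a time, and error checking is omitted):

#include <mpi.h>
#include <cuda_runtime.h>
#include "HYPRE.h"
#include "HYPRE_struct_ls.h"

int main(int argc, char *argv[])
{
   int myid, num_devices;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);

   /* bind each MPI rank to one GPU before initializing hypre;
      myid % num_devices is a simplification that assumes ranks
      are filled node by node */
   cudaGetDeviceCount(&num_devices);
   cudaSetDevice(myid % num_devices);

   HYPRE_Init();

   /* run hypre entirely in device memory, i.e. no unified memory */
   HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);
   HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);

   /* ... grid, stencil, matrix, rhs, and solver setup as in ex3.c ... */

   HYPRE_Finalize();
   MPI_Finalize();
   return 0;
}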
When launched on a GPU cluster, the example works just fine when the library is configured with
./configure --with-cuda --with-gpu-arch=80
For instance, invoking
mpirun -np 1 ./ex3 -n 100 -solver 0 -v 1 1
gives the same result as
mpirun -np 4 ./ex3 -n 50 -solver 0 -v 1 1
When I instead configure the library with GPU-aware (CUDA-aware) MPI enabled:
./configure --with-cuda --with-gpu-arch=80 --enable-gpu-aware-mpi
then launching with 1 MPI rank still works fine, but launching with multiple ranks fails with a segmentation fault.
Do you have any advice? I have not been able to find out whether my modification of ex3.c is erroneous, or whether the problem is cluster-specific (perhaps related to the software modules loaded at compile time).
Thanks for your help, Ondrej
@ondrejchrenko Is your MPI actually GPU-aware?
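If you are using Open MPI, one quick way to check is

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

(other MPI implementations report this differently; MVAPICH2, for instance, additionally requires MV2_USE_CUDA=1 at run time). If the MPI library itself is not CUDA-aware, passing device pointers to MPI calls will segfault exactly as you describe.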