
CUDA-aware MPI enabled in Hypre?


Hi,

My C++ code is built with PETSc 3.23 and Hypre 2.33, on top of Open MPI with CUDA support enabled. It runs fine on multi-GPU, with MPI communications between devices in the C++ part and in the PETSc part, but not in the Hypre part. When profiling with Nsight Systems, we saw D2H and H2D copies near the MPI calls:

[Nsight Systems timeline screenshot]

Blue: PCApply/spmv_fixup_kernel_v2
Red: D2H copy (8000 bytes)
Grey: MPI_Irecv
Grey: MPI_Isend
Grey: MPI_Waitall
Green: H2D copy (8000 bytes)
Blue: PCApply/csmv_v2_partition_kernel

Hypre is configured with --enable-gpu-aware-mpi during the PETSc build, and HYPRE_USING_GPU_AWARE_MPI is defined in the HYPRE_config.h file.

CUDA support is indeed enabled in the MPI being used:

ompi_info --parsable --all 2>/dev/null | grep mpi_built_with_cuda_support:value:true
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
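
For completeness, the same thing can also be checked from inside the application using Open MPI's extension header; here is a minimal sketch based on the standard MPIX extension (not taken from my code):

#include <mpi.h>
#include <stdio.h>
#if defined(OPEN_MPI)
#include <mpi-ext.h>   /* Open MPI: MPIX_CUDA_AWARE_SUPPORT, MPIX_Query_cuda_support() */
#endif

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    /* Compile-time support is present; ask the runtime as well */
    printf("CUDA-aware MPI at runtime: %s\n", MPIX_Query_cuda_support() ? "yes" : "no");
#else
    printf("This MPI build does not advertise CUDA-aware support\n");
#endif
    MPI_Finalize();
    return 0;
}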

What could be the reason that D2D copies are done in my code and in PETSc, but not in Hypre?

The Hypre preconditioner (BoomerAMG) is used through PETSc with a KSP solver:

-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_strong_threshold 0.7 -mat_type mpiaijcusparse -vec_type mpicuda
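
For reference, the equivalent programmatic setup looks roughly like the sketch below (standard PETSc C API; this is not my actual code, and creation of the matrix and vectors with the CUDA types is omitted):

#include <petscksp.h>

/* Solve A x = b with CG + hypre BoomerAMG, mirroring the command-line options above */
PetscErrorCode solve_with_boomeramg(Mat A, Vec b, Vec x)
{
    KSP ksp;
    PC  pc;

    PetscFunctionBeginUser;
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetType(ksp, KSPCG));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCHYPRE));
    PetscCall(PCHYPRESetType(pc, "boomeramg"));
    /* BoomerAMG-specific parameters go through the options database */
    PetscCall(PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_strong_threshold", "0.7"));
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPDestroy(&ksp));
    PetscFunctionReturn(PETSC_SUCCESS);
}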

Thanks for any tips,

Pierre

pledac · Sep 01 '25 13:09

Hi Pierre, are you able to share a reproducer (MWE)? I assume you had -use_gpu_aware_mpi 1, but just checking.

victorapm · Sep 01 '25 13:09

Hi Victor,

Thanks for your help. Sure, I can share a PETSc reproducer; the Nsight Systems profile above comes from it.

You would need to build PETSc 3.23.2 with Hypre 2.33 and run ex46 from share/petsc/examples/src/ksp/ksp/tutorials:

mpirun -np 2 ./ex46 -da_grid_x 1000 -da_grid_y 1000 -ksp_monitor -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_strong_threshold 0.7 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -use_gpu_aware_mpi 1
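
If useful, an Nsight Systems invocation along these lines should reproduce the trace (the exact nsys flags may vary with the installed version; --trace=mpi assumes Open MPI, which is the nsys default):

nsys profile --trace=cuda,nvtx,mpi -o ex46_boomeramg mpirun -np 2 ./ex46 -da_grid_x 1000 -da_grid_y 1000 -ksp_monitor -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_strong_threshold 0.7 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -use_gpu_aware_mpi 1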

You can compare with PETSc's algebraic multigrid preconditioner (GAMG), where no D2H/H2D copies happen during KSPSolve (except a small 8-byte one for the residual):

mpirun -np 2 ./ex46 -da_grid_x 1000 -da_grid_y 1000 -ksp_monitor -ksp_type cg -pc_type gamg -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -use_gpu_aware_mpi 1

AFAIK, -use_gpu_aware_mpi 1 is the default in PETSc if the MPI is GPU-aware, but I added it explicitly anyway.

Hope it helps,

Pierre

pledac · Sep 01 '25 17:09