
OpenMPI vs MPICH performance issue

Open SnaKyEyeS opened this issue 1 year ago • 1 comment

Hello,

While investigating MPI performance on Lucia with @thomasgillis, we found that MPICH was a fair amount slower than OpenMPI. The test case is a simple device-to-device ping-pong between two NVIDIA GPUs on distinct nodes, with increasing message size. The results of fi_bw (from fabtests) are also shown for comparison.
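For context, here is a minimal sketch of how the message-size sweep and the bandwidth figure are typically derived in a ping-pong test of this kind. It is an illustration only, not the benchmark code used for these measurements: the function names, the power-of-two sweep, and the 4 MiB default upper bound are all assumptions.

```python
# Illustrative only: hypothetical helpers, not the benchmark used in this issue.

def message_sizes(min_bytes=1, max_bytes=1 << 22):
    """Return power-of-two message sizes from min_bytes up to max_bytes (assumed sweep)."""
    sizes = []
    n = min_bytes
    while n <= max_bytes:
        sizes.append(n)
        n *= 2
    return sizes


def pingpong_bandwidth(nbytes, elapsed_s, iterations):
    """One-way bandwidth in bytes/s for a timed ping-pong loop.

    Each iteration is one round trip (send + receive), so the message
    crosses the link twice per iteration.
    """
    if elapsed_s <= 0 or iterations <= 0:
        raise ValueError("elapsed_s and iterations must be positive")
    return 2 * nbytes * iterations / elapsed_s
```

With this convention, the value reported for a given message size is comparable to a unidirectional bandwidth test such as fi_bw, provided both use the same byte units.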

The MPICH version is 4.2.3, compiled with libfabric v1.22.0, and we set the following environment variables for the MPICH run:

export FI_PROVIDER="verbs;ofi_rxm"
export FI_HMEM_CUDA_USE_GDRCOPY=1
export FI_OFI_RXM_BUFFER_SIZE=256
export FI_OFI_RXM_SAR_LIMIT=256
export MPIR_CVAR_CH4_OFI_ENABLE_HMEM=1

SnaKyEyeS, Oct 23 '24 12:10

Thanks for reporting. I will look into the performance issue.

yfguo, Oct 24 '24 04:10