GPU Allreduce performance on Aurora

Open rithwiktom opened this issue 7 months ago • 4 comments

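I observe a performance regression with GPU Allreduce on Aurora: the 2048-byte osu_allreduce latency rises from 17.74 us with mpich/opt/4.2.3-intel to 61.02 us with mpich/opt/develop-git.6037a7a.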

~> mpiexec -n 6 -ppn 6 --cpu-bind list:2:15:28:54:67:80 /home/rtom_intel/sow/osu_xpmem_git/osu-benchmarks/mpi/collective/osu_allreduce -m 2048:2048 -i 10000 -x 200 -d ze D D

# OSU MPI-ZE Allreduce Latency Test v5.6.2
# Size       Avg Latency(us)
2048                   17.74

~> module load mpich/opt/develop-git.6037a7a 

The following have been reloaded with a version change:
  1) mpich/opt/4.2.3-intel => mpich/opt/develop-git.6037a7a

~> mpiexec -n 6 -ppn 6 --cpu-bind list:2:15:28:54:67:80 /home/rtom_intel/sow/osu_xpmem_git/osu-benchmarks/mpi/collective/osu_allreduce -m 2048:2048 -i 10000 -x 200 -d ze D D

# OSU MPI-ZE Allreduce Latency Test v5.6.2
# Size       Avg Latency(us)
2048                   61.02

The following environment variables and modules were used:

export EnableImplicitScaling=0
export NEOReadDebugKeys=1
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
export MPIR_CVAR_GPU_USE_IMMEDIATE_COMMAND_LIST=1
module load mpich-config/collective-tuning/1024

rithwiktom · Jun 06 '25 19:06

@rithwiktom Could you confirm that both are using the same collective algorithm?
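One way to check this is to rerun the same reproducer under both modules with the Allreduce algorithm pinned, so that any remaining gap cannot come from a different algorithm choice. This is a minimal sketch, assuming that `MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM` is honored on this code path in both builds (the collective-tuning module may otherwise drive selection), that `module` is available in the shell, and that the environment variables from the original report are already exported. The value `recursive_doubling` is an illustrative choice, not a recommendation.

```shell
#!/bin/bash
# Pin the Allreduce algorithm so both builds run the same one.
# The choice of recursive_doubling is illustrative only.
export MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM=recursive_doubling

OSU=/home/rtom_intel/sow/osu_xpmem_git/osu-benchmarks/mpi/collective/osu_allreduce

# Run the identical benchmark under each MPICH module in turn;
# loading the second module swaps out the first (same module family).
for mod in mpich/opt/4.2.3-intel mpich/opt/develop-git.6037a7a; do
    module load "$mod"
    echo "== $mod =="
    mpiexec -n 6 -ppn 6 --cpu-bind list:2:15:28:54:67:80 \
        "$OSU" -m 2048:2048 -i 10000 -x 200 -d ze D D
done
```

If the two latencies converge with the algorithm pinned, the regression is likely in algorithm selection rather than in the algorithm itself.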

hzhou · Jun 12 '25 19:06

I'll check it out.

rithwiktom · Jun 18 '25 19:06

I suspect the cause is a change in the threshold for MPIR_CVAR_GPU_FAST_COPY_MAX_SIZE. @rithwiktom, try setting it to 1024.
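A minimal sketch of that experiment: load the develop build, override the fast-copy threshold, and rerun the original reproducer unchanged. It assumes the same environment variables and tuning module as the original report; only the `MPIR_CVAR_GPU_FAST_COPY_MAX_SIZE` line is new.

```shell
#!/bin/bash
# Use the develop build that showed the higher latency.
module load mpich/opt/develop-git.6037a7a

# Override the GPU fast-copy threshold as suggested above.
export MPIR_CVAR_GPU_FAST_COPY_MAX_SIZE=1024

# Same reproducer as in the original report (2048-byte messages on ZE device buffers).
mpiexec -n 6 -ppn 6 --cpu-bind list:2:15:28:54:67:80 \
    /home/rtom_intel/sow/osu_xpmem_git/osu-benchmarks/mpi/collective/osu_allreduce \
    -m 2048:2048 -i 10000 -x 200 -d ze D D
```

Comparing the result against the 17.74 us and 61.02 us figures above shows whether the threshold accounts for the gap.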

hzhou · Aug 19 '25 15:08

FWIW, with https://github.com/pmodels/mpich/pull/7541, I got an average latency of 14.06 us.

hzhou · Aug 19 '25 17:08