GPU Allreduce performance on Aurora
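I observe a performance regression in GPU Allreduce on Aurora. For 2048-byte messages, the average latency rises from 17.74 us with mpich/opt/4.2.3-intel to 61.02 us with mpich/opt/develop-git.6037a7a: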
~> mpiexec -n 6 -ppn 6 --cpu-bind list:2:15:28:54:67:80 /home/rtom_intel/sow/osu_xpmem_git/osu-benchmarks/mpi/collective/osu_allreduce -m 2048:2048 -i 10000 -x 200 -d ze D D
# OSU MPI-ZE Allreduce Latency Test v5.6.2
# Size Avg Latency(us)
2048 17.74
~> module load mpich/opt/develop-git.6037a7a
The following have been reloaded with a version change:
1) mpich/opt/4.2.3-intel => mpich/opt/develop-git.6037a7a
~> mpiexec -n 6 -ppn 6 --cpu-bind list:2:15:28:54:67:80 /home/rtom_intel/sow/osu_xpmem_git/osu-benchmarks/mpi/collective/osu_allreduce -m 2048:2048 -i 10000 -x 200 -d ze D D
# OSU MPI-ZE Allreduce Latency Test v5.6.2
# Size Avg Latency(us)
2048 61.02
The following environment variables were used:
export EnableImplicitScaling=0
export NEOReadDebugKeys=1
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
export MPIR_CVAR_GPU_USE_IMMEDIATE_COMMAND_LIST=1
module load mpich-config/collective-tuning/1024
@rithwiktom Could you confirm that both are using the same collective algorithm?
I'll check it out
I suspect this is caused by the threshold change for MPIR_CVAR_GPU_FAST_COPY_MAX_SIZE. @rithwiktom, try setting it to 1024.
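For reference, a minimal sketch of that experiment, reusing the benchmark command from the original report. The value 1024 is only the suggestion made here, not a verified fix, and OSU_ALLREDUCE is a hypothetical helper variable introduced for readability:

```shell
# Suggested value from this thread (an assumption to test, not a confirmed fix).
export MPIR_CVAR_GPU_FAST_COPY_MAX_SIZE=1024

# Hypothetical helper variable; the path is the one used in the original report.
OSU_ALLREDUCE=/home/rtom_intel/sow/osu_xpmem_git/osu-benchmarks/mpi/collective/osu_allreduce

# Re-run the same 2048-byte GPU Allreduce test, only if the benchmark binary is present.
if [ -x "$OSU_ALLREDUCE" ]; then
  mpiexec -n 6 -ppn 6 --cpu-bind list:2:15:28:54:67:80 \
    "$OSU_ALLREDUCE" -m 2048:2048 -i 10000 -x 200 -d ze D D
fi
```

If the latency with develop-git.6037a7a drops back toward the 4.2.3 result, that would point at the threshold change; if it does not, the collective algorithm selection is the next thing to compare.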
FWIW, with https://github.com/pmodels/mpich/pull/7541, I got an average latency of 14.06 us.