MPICH cpu-binding possible issue / unexpected behavior
Summary
This is to report a possible issue / unexpected behavior with AMD-GPU-aware MPICH on a Supermicro AS-4124GQ-TNMI (2x AMD EPYC 7713 with 4 AMD Instinct MI250). @raffenet kindly built MPICH there and it works fine, but we noticed some odd behavior with the cpu-binding. I was trying to get each rank to bind to 16 consecutive cores, i.e. rank 0 to cores 0-15 and rank 1 to cores 16-31. From my understanding this should be mpirun -n 2 -bind-to user:0-15,16-31. When I tried this, however, the output of HYDRA_TOPO_DEBUG looks correct, but a code we've used before to check affinity (https://github.com/argonne-lcf/GettingStarted/blob/master/Examples/Theta/affinity/main.cpp) reports that the binding is wrong, and the result changes across runs. We use that affinity code at ALCF on many systems, so it's unlikely (but always possible!) that there is an issue with it. I was also able to confirm with htop that the ranks weren't running on their assigned cores.
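For reference, here is a minimal sketch of the kind of check the affinity code does (this is not the linked main.cpp itself; it just reads each rank's Linux affinity mask with sched_getaffinity() and assumes an MPI C++ wrapper such as mpicxx):

// affinity_check.cpp -- minimal sketch, not the ALCF affinity code
#include <mpi.h>
#include <sched.h>
#include <cstdio>
#include <string>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Query the affinity mask of the calling process (pid 0 = self).
    cpu_set_t mask;
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);

    // Build a printable list of the hardware threads this rank may run on.
    std::string cores;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            if (!cores.empty()) cores += ",";
            cores += std::to_string(cpu);
        }
    }

    char host[MPI_MAX_PROCESSOR_NAME];
    int len = 0;
    MPI_Get_processor_name(host, &len);
    std::printf("host=%s rank=%d allowed_cpus=%s\n", host, rank, cores.c_str());

    MPI_Finalize();
    return 0;
}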
Reproducer
(the module is specific to our system but it loads MPICH with AMD GPU support)
module load mpich/4.2.2-rocm6.1-gcc
wget https://raw.githubusercontent.com/argonne-lcf/GettingStarted/master/Examples/Theta/affinity/main.cpp
MPICH_CXX=hipcc mpicxx -fopenmp main.cpp
export OMP_NUM_THREADS=1
HYDRA_TOPO_DEBUG=1 mpirun -n 2 -bind-to user:0-15,16-31 ./a.out
HYDRA_TOPO_DEBUG=1 mpirun -n 2 -bind-to user:0-15,16-31 ./a.out
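If it helps with triage, an independent cross-check that avoids our affinity code entirely is to have each launched process print the CPU mask it inherits from the launcher (a sketch; same module and binding arguments as above):

mpirun -n 2 -bind-to user:0-15,16-31 sh -c 'grep Cpus_allowed_list /proc/self/status'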
Expected Output
We expect the affinity code to print list_cores= (0-15) for rank 0 and list_cores= (16-31) for rank 1, like:
[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_tDWry3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04 rnk= 0 tid= 0: list_cores= (0-15)
To affinity and beyond!! nname= amdgpu04 rnk= 1 tid= 0: list_cores= (16-31)
[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_tDWry3
[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_3uSoR3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04 rnk= 0 tid= 0: list_cores= (0-15)
To affinity and beyond!! nname= amdgpu04 rnk= 1 tid= 0: list_cores= (16-31)
[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_3uSoR3
Actual Output
As we can see below, on the first run rank 0 reports list_cores= (0-15) but rank 1 reports list_cores= (0-255). On the second run, rank 0 reports list_cores= (0-255) and rank 1 reports list_cores= (16-31).
[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_tDWry3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04 rnk= 0 tid= 0: list_cores= (0-15)
To affinity and beyond!! nname= amdgpu04 rnk= 1 tid= 0: list_cores= (0-255)
[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_tDWry3
[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_3uSoR3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04 rnk= 0 tid= 0: list_cores= (0-255)
To affinity and beyond!! nname= amdgpu04 rnk= 1 tid= 0: list_cores= (16-31)
[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_3uSoR3