
MPICH cpu-binding possible issue / unexpected behavior

colleeneb opened this issue · 8 comments

Summary

This is to report a possible issue / unexpected behavior with AMD-GPU-aware MPICH on a Supermicro AS-4124GQ-TNMI (2x AMD EPYC 7713 with 4x AMD Instinct MI250). @raffenet kindly built MPICH there and it works fine, but we noticed some odd behavior with the cpu-binding. I was trying to get each rank to bind to a consecutive block of 16 cores, i.e. rank 0 to cores 0-15 and rank 1 to cores 16-31. From my understanding this should be mpirun -n 2 -bind-to user:0-15,16-31. When I try this, the HYDRA_TOPO_DEBUG output looks correct, but a code we've used before to check affinity (https://github.com/argonne-lcf/GettingStarted/blob/master/Examples/Theta/affinity/main.cpp) reports that the binding is wrong, and the result changes between runs. We use this affinity code at ALCF on many systems, so it's unlikely (but always possible!) that the issue is in the code itself. I was also able to confirm with htop that the ranks weren't running on their assigned cores.

Reproducer

(the module is specific to our system but it loads MPICH with AMD GPU support)

module load mpich/4.2.2-rocm6.1-gcc

wget https://raw.githubusercontent.com/argonne-lcf/GettingStarted/master/Examples/Theta/affinity/main.cpp

MPICH_CXX=hipcc mpicxx -fopenmp main.cpp

export OMP_NUM_THREADS=1

HYDRA_TOPO_DEBUG=1 mpirun -n 2 -bind-to user:0-15,16-31 ./a.out

HYDRA_TOPO_DEBUG=1 mpirun -n 2 -bind-to user:0-15,16-31 ./a.out

Expected Output

We expect the affinity code to print list_cores= (0-15) for rank 0 and list_cores= (16-31) for rank 1 on both runs, like:

[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_tDWry3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04  rnk= 0  tid= 0: list_cores= (0-15)

To affinity and beyond!! nname= amdgpu04  rnk= 1  tid= 0: list_cores= (16-31)

[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_tDWry3
[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_3uSoR3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04  rnk= 0  tid= 0: list_cores= (0-15)

To affinity and beyond!! nname= amdgpu04  rnk= 1  tid= 0: list_cores= (16-31)

[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_3uSoR3

Actual Output

As shown below, in the first run rank 0 reports list_cores= (0-15) but rank 1 reports list_cores= (0-255), i.e. it is free to run on all 256 hardware threads (2 sockets x 64 cores x 2 SMT threads), as if it were unbound. In the second run, rank 0 reports list_cores= (0-255) and rank 1 reports list_cores= (16-31).

[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_tDWry3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04  rnk= 0  tid= 0: list_cores= (0-15)

To affinity and beyond!! nname= amdgpu04  rnk= 1  tid= 0: list_cores= (0-255)

[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_tDWry3
[proxy:0@amdgpu04] created hwloc xml file /tmp/hydra_hwloc_xmlfile_3uSoR3
process 0 binding: 11111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
process 1 binding: 00000000000000001111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000000000000
To affinity and beyond!! nname= amdgpu04  rnk= 0  tid= 0: list_cores= (0-255)

To affinity and beyond!! nname= amdgpu04  rnk= 1  tid= 0: list_cores= (16-31)

[proxy:0@amdgpu04] removed file /tmp/hydra_hwloc_xmlfile_3uSoR3

colleeneb · Jul 16 '24 20:07