flux-core

mpi: multiple brokers per node with multiple MPI tasks per node hangs with mvapich2-2.3.7-intel

Open • grondo opened this issue 9 months ago • 2 comments

While trying to run some of the MPI tests under a test instance, I noticed that MPI bootstrap with mvapich2-2.3.7-intel hangs when there are multiple tasks per node:

$ flux run -N4 -n4 t/mpi/hello
f5Cm644CF: completed MPI_Init in 1.110s. There are 4 tasks
f5Cm644CF: completed first barrier in 0.028s
f5Cm644CF: completed MPI_Finalize in 0.041s
$ flux run -N4 -n8 t/mpi/hello
[hangs]

This is with v0.60.0. It also reproduces on current master, but instead of a hang, a task gets a bus error:

$ flux run -N4 -n8 t/mpi/hello
[corona212:mpi_rank_2][error_sighandler] Caught error: Bus error (signal 7)

The bus error occurs in MPIDI_CH3I_CM_SHMEM_Sync(). The backtrace has no further details because debug symbols are not available.

grondo · Apr 25 '24 17:04

Hmm, is it a concern that this didn't get caught in our GitLab CI? Our logs from last night with the intel-classic compiler and mvapich2 show:

Running with intel-classic compiler and mvapich2 MPI
f28u15Vy
f28u15Vz
f28vV4nK
f28wy44f
f2bjg5Q3
f2bjg5Q4
f2bjg5Q5
f2bmA4gP
f2bjg5Q3: completed MPI_Init in 0.461s.  There are 4 tasks
f2bjg5Q3: completed first barrier in 0.000s
f2bjg5Q3: completed MPI_Finalize in 0.010s
Hello World from rank 1
Hello World from rank 0
Hello World from rank 3
Hello World from rank 2
MVAPICH2 Version      :	2.3.7
MVAPICH2 Release date :	Wed March 02 22:00:00 EST 2022
MVAPICH2 Device       :	ch3:mrail
MVAPICH2 configure    :	--prefix=/usr/tce/backend/installations/linux-rhel8-x86_64/intel-2021.6.0/mvapich2-2.3.7-2575ifqlr5fbj34wdlj2fo2tmqdrehia --enable-shared --enable-romio --disable-silent-rules --disable-new-dtags --enable-fortran=all --enable-threads=multiple --with-ch3-rank-bits=32 --enable-wrapper-rpath=yes --disable-alloca --enable-fast=all --disable-cuda --enable-registration-cache --with-pm=hydra --with-device=ch3:mrail --with-rdma=gen2 --disable-mcast --with-file-system=lustre+nfs+ufs --enable-llnl-site-specific-options --enable-debuginfo
MVAPICH2 CC           :	/usr/tce/spack/lib/spack/env/intel/icc    -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX          :	/usr/tce/spack/lib/spack/env/intel/icpc   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77          :	/usr/tce/spack/lib/spack/env/intel/ifort   -O2
MVAPICH2 FC           :	/usr/tce/spack/lib/spack/env/intel/ifort   -O2

That's from current master (or rather, whatever master was at 3AM today).

If so, I should open a second issue on flux-test-collective to see why this didn't get caught.

wihobbs · Apr 25 '24 17:04

Does the GitLab CI run in a multiple-brokers-per-node configuration? If not, I guess we could add that, because having it work does aid in testing, I suppose.
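One way to exercise this configuration is a Flux test instance, where `flux start --test-size=N` runs N brokers on the local node. Below is a minimal, hypothetical sketch of such a CI check. The script itself, the `TIMEOUT` value, and the `failed` counter are illustrative assumptions, not part of flux-test-collective. It runs the `-n4` and `-n8` cases from this issue under four brokers on one node and wraps each run in GNU coreutils `timeout` so that a hang is reported as a failure instead of leaving the job stuck.

```shell
#!/bin/sh
# Hypothetical CI check (sketch only) for "multiple brokers per node".
# Assumptions: run from the top of a flux-core build tree, t/mpi/hello
# is built against mvapich2, and GNU coreutils `timeout` is installed.
TIMEOUT=120   # seconds per run; arbitrary illustrative value
failed=0

if ! command -v flux >/dev/null 2>&1; then
    echo "flux not found in PATH; skipping MPI check"
else
    # -n4 puts one task on each broker; -n8 puts two tasks on each
    # broker, the case that hangs (v0.60.0) or hits a bus error (master).
    for ntasks in 4 8; do
        echo "flux run -N4 -n$ntasks t/mpi/hello (4 brokers, 1 node)"
        # --test-size=4 starts 4 brokers on this node; timeout exits
        # nonzero (124) if the job hangs instead of completing.
        if ! timeout "$TIMEOUT" flux start --test-size=4 \
                flux run -N4 -n"$ntasks" t/mpi/hello; then
            echo "FAILED: -N4 -n$ntasks"
            failed=$((failed + 1))
        fi
    done
fi

echo "failed configurations: $failed"
# Last command sets the script's exit status: nonzero if any run failed.
test "$failed" -eq 0
```

The final `test` avoids `set -e` so that both task counts always run and the log shows which one failed, while the script still exits nonzero for CI.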

grondo · Apr 25 '24 20:04