flux-core
mpi: multiple brokers per node with multiple MPI tasks per node hangs with mvapich2-2.3.7-intel
While trying to run some of the MPI tests under a test instance, I noticed that MPI bootstrap with mvapich2-2.3.7-intel hangs when there are multiple tasks per node:
$ flux run -N4 -n4 t/mpi/hello
f5Cm644CF: completed MPI_Init in 1.110s. There are 4 tasks
f5Cm644CF: completed first barrier in 0.028s
f5Cm644CF: completed MPI_Finalize in 0.041s
$ flux run -N4 -n8 t/mpi/hello
[hangs]
This is with v0.60.0. It also reproduces on current master, but instead of a hang, a task gets a bus error:
$ flux run -N4 -n8 t/mpi/hello
[corona212:mpi_rank_2][error_sighandler] Caught error: Bus error (signal 7)
The bus error occurs in MPIDI_CH3I_CM_SHMEM_Sync(). There are no details in the backtrace since debug symbols are not available.
Hmm, is it a concern that this didn't get caught in our GitLab CI? Our logs from last night with the intel-classic compiler and mvapich2 MPI show:
Running with intel-classic compiler and mvapich2 MPI
f28u15Vy
f28u15Vz
f28vV4nK
f28wy44f
f2bjg5Q3
f2bjg5Q4
f2bjg5Q5
f2bmA4gP
f2bjg5Q3: completed MPI_Init in 0.461s. There are 4 tasks
f2bjg5Q3: completed first barrier in 0.000s
f2bjg5Q3: completed MPI_Finalize in 0.010s
Hello World from rank 1
Hello World from rank 0
Hello World from rank 3
Hello World from rank 2
MVAPICH2 Version : 2.3.7
MVAPICH2 Release date : Wed March 02 22:00:00 EST 2022
MVAPICH2 Device : ch3:mrail
MVAPICH2 configure : --prefix=/usr/tce/backend/installations/linux-rhel8-x86_64/intel-2021.6.0/mvapich2-2.3.7-2575ifqlr5fbj34wdlj2fo2tmqdrehia --enable-shared --enable-romio --disable-silent-rules --disable-new-dtags --enable-fortran=all --enable-threads=multiple --with-ch3-rank-bits=32 --enable-wrapper-rpath=yes --disable-alloca --enable-fast=all --disable-cuda --enable-registration-cache --with-pm=hydra --with-device=ch3:mrail --with-rdma=gen2 --disable-mcast --with-file-system=lustre+nfs+ufs --enable-llnl-site-specific-options --enable-debuginfo
MVAPICH2 CC : /usr/tce/spack/lib/spack/env/intel/icc -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX : /usr/tce/spack/lib/spack/env/intel/icpc -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77 : /usr/tce/spack/lib/spack/env/intel/ifort -O2
MVAPICH2 FC : /usr/tce/spack/lib/spack/env/intel/ifort -O2
That's off current master (or, rather, whatever master was at 3AM today).
If so, I should open a second issue on flux-test-collective to see why this didn't get caught.
Does the GitLab CI run in a multiple-brokers-per-node configuration? If not, I guess we could add that, since having it working does aid in testing.