t_mpi aborts on Fedora Rawhide with mpich on s390x
Describe the bug
Test log for t_mpi
============================
*** Hint ***
You can use environment variable HDF5_PARAPREFIX to run parallel test files in a
different directory or to add file type prefix. e.g.,
HDF5_PARAPREFIX=pfs:/PFS/user/me
export HDF5_PARAPREFIX
*** End of Hint ***
===================================
MPI functionality tests
===================================
Abort(676932623) on node 2 (rank 2 in comm 0): Fatal error in internal_Barrier: Other MPI error, error stack:
internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
MPID_Barrier(167)..........................:
MPIDI_Barrier_allcomm_composition_json(132):
MPIDI_POSIX_mpi_bcast(219).................:
MPIDI_POSIX_mpi_bcast_release_gather(132)..:
MPIDI_POSIX_mpi_release_gather_release(218): message sizes do not match across processes in the collective routine: Received 0 but expected 1
Command exited with non-zero status 15
Expected behavior
No test failure
Platform (please complete the following information)
- HDF5 version branch hdf5_1_14 - 847cb427cb7100be88d78e954a02a70b10d0f5c4
- OS and version - Fedora Rawhide
- Compiler and version - gcc 13.2.1-4.fc40
- Build system (e.g. CMake, Autotools) and version - autotools
- Any configure options you specified
../configure --build=s390x-redhat-linux-gnu --host=s390x-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --runstatedir=/run --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-silent-rules --enable-fortran --enable-hl --enable-shared --with-szlib CC=mpicc CXX=mpicxx F9X=mpif90 'FCFLAGS=-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=z13 -mtune=z14 -fasynchronous-unwind-tables -fstack-clash-protection -I/usr/lib64/gfortran/modules -I/usr/lib64/gfortran/modules/mpich' --enable-parallel --exec-prefix=/usr/lib64/mpich --libdir=/usr/lib64/mpich/lib --bindir=/usr/lib64/mpich/bin --sbindir=/usr/lib64/mpich/sbin --includedir=/usr/include/mpich-s390x --datarootdir=/usr/lib64/mpich/share --mandir=/usr/lib64/mpich/share/man --with-default-plugindir=/usr/lib64/mpich/hdf5/plugin
- MPI library and version (parallel HDF5) - mpich 4.1.2-7.fc40
@jhendersonHDF, @lrknox - s390x is big-endian. Do we ever see t_mpi failures on our Power system? It's probably too late to investigate this for 1.14.3, but we could build a recent version of MPICH there and test for 1.14.4.
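To help narrow this down, a minimal standalone reproducer might be useful. The sketch below (the file name repro.c is just a placeholder, not anything from the HDF5 tree) only exercises MPI_Barrier and MPI_Bcast on MPI_COMM_WORLD, which is where the error stack above points. If it also aborts under the same MPICH 4.1.2 build on s390x, the failure would lie in MPICH's POSIX release_gather collectives rather than in t_mpi itself; if it passes, the trigger is probably something t_mpi does around the barrier.

/* repro.c - minimal barrier/bcast loop on MPI_COMM_WORLD
 * (a sketch for isolating the MPICH collective, not part of
 * the HDF5 test suite). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Loop a number of times: shared-memory collective state persists
     * across calls, so a single barrier may not trigger the mismatch. */
    for (int i = 0; i < 100; i++) {
        value = (rank == 0) ? i : -1;          /* root broadcasts i      */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);           /* the failing collective */
    }

    if (rank == 0)
        printf("barrier/bcast loop completed on %d ranks\n", size);

    MPI_Finalize();
    return 0;
}

Building with mpicc repro.c -o repro and running with mpiexec -n 6 ./repro (six ranks, which I believe is the HDF5 parallel test default) should exercise the same intra-node release_gather path on a single host.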
The test is still failing with the latest hdf5_1_14 branch, but apparently with a different error message:
make[4]: Entering directory '/builddir/build/BUILD/hdf5-hdf5_1_14/mpich/testpar'
============================
Testing: t_mpi
============================
Test log for t_mpi
============================
Command exited with non-zero status 15