t_pmulti_dset hangs on Fedora Rawhide aarch64 with mpich
Describe the bug
Trying to build the hdf5_1_14 branch in Fedora Rawhide on aarch64. The Koji builder seems to hang with:
make[4]: Leaving directory '/builddir/build/BUILD/hdf5-hdf5_1_14/mpich/testpar'
make[4]: Entering directory '/builddir/build/BUILD/hdf5-hdf5_1_14/mpich/testpar'
============================
Testing: t_pmulti_dset
The test's timeout alarm does not appear to go off either.
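For context on what "the alarm going off" involves, the sketch below shows a generic POSIX `alarm()`/`SIGALRM` timeout. It is a minimal, hypothetical illustration and not HDF5's actual test-harness code. The function name `wait_for_alarm` is made up for this example, and a call that blocks until a signal arrives stands in for a hung test. It illustrates the three steps such a timeout depends on: installing a handler, arming the alarm, and leaving `SIGALRM` unblocked while waiting. If any of these is missing in the process that hangs (for example, the alarm is never armed, or `SIGALRM` stays blocked), the timeout never fires.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t makes the write async-signal-safe. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/* Hypothetical helper: arm a SIGALRM timeout of `seconds`, then block
 * (standing in for a hung test) until it fires.
 * Returns 1 if the alarm fired, -1 if setup failed. */
int wait_for_alarm(unsigned seconds)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;

    alarm_fired = 0;

    /* Step 1: install the handler (remember the old one to restore it). */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGALRM while arming so it cannot slip in before we wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    /* Step 2: arm the timeout. */
    alarm(seconds);

    /* Step 3: wait with SIGALRM unblocked; other signals keep the
     * caller's mask. sigsuspend returns after a handler has run. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);
    while (!alarm_fired)
        sigsuspend(&wait_mask);

    /* Restore the caller's signal mask and SIGALRM disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    return 1;
}
```

Calling `wait_for_alarm(1)` should return 1 after about one second. Blocking the signal only while arming, and then waiting with `sigsuspend`, avoids the race in which the alarm fires between checking the flag and starting to wait.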
Platform (please complete the following information)
- HDF5 version: hdf5_1_14 branch as of Oct 20, 2023
- OS and version: Fedora Rawhide
- Compiler and version: GCC 13.2.1
- Build system (e.g. CMake, Autotools) and version: Autotools
- Any configure options you specified:
+ ../configure --build=aarch64-redhat-linux-gnu --host=aarch64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --runstatedir=/run --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-silent-rules --enable-fortran --enable-hl --enable-shared --with-szlib CC=mpicc CXX=mpicxx F9X=mpif90 'FCFLAGS=-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/lib64/gfortran/modules -I/usr/lib64/gfortran/modules/mpich' --enable-parallel --exec-prefix=/usr/lib64/mpich --libdir=/usr/lib64/mpich/lib --bindir=/usr/lib64/mpich/bin --sbindir=/usr/lib64/mpich/sbin --includedir=/usr/include/mpich-aarch64 --datarootdir=/usr/lib64/mpich/share --mandir=/usr/lib64/mpich/share/man --with-default-plugindir=/usr/lib64/mpich/hdf5/plugin
- MPI library and version (parallel HDF5): MPICH 4.1.2
I'm not seeing this with the latest hdf5_1_14 and the latest Fedora Rawhide.
I think this may be intermittent. Seen again now with HDF5 1.14.5 and MPICH 4.2.2.
Now seen once with HDF5 1.14.6 on ppc64le with MPICH 4.2.2.
Thanks for the report @opoplawski. Would it be possible to try with MPICH 4.3.0 to rule out an MPICH issue? Since you've now seen this with both 4.1.2 and 4.2.2, I'm assuming it's our issue, but it's good to be sure. Also, do you happen to see any warnings in the log when building that test? If so, could you either post them here or upload the log? Similar to #2510, this could be due to some assumptions in the test code.
#2510 is an allocation issue. I'll create a PR for the fix next week.