easybuild-easyconfigs
MPI jobs fail with intel toolchains after upgrade of EL8 Linux from 8.5 to 8.6
I'm testing the upgrade of our compute nodes from AlmaLinux 8.5 to 8.6 (a RHEL 8 clone, similar to Rocky Linux).
We have found that all MPI codes built with any of the Intel toolchains intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade. The codes fail also on login nodes, so the Slurm queue system is not involved. The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6, however.
My simple test uses the attached trivial MPI Hello World code running on a single node:
$ module load intel/2021b
$ mpicc mpi_hello_world.c
$ mpirun ./a.out
Now the mpirun command enters an infinite loop (running many minutes) and we see these processes with "ps":
/bin/sh /home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun ./a.out
mpiexec.hydra ./a.out
The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill it with 9/SIGKILL. I've tried to enable debugging output with
export I_MPI_HYDRA_DEBUG=1
export I_MPI_DEBUG=5
but nothing gets printed from this.
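Since the hang has to be stopped by hand, a small wrapper around the coreutils `timeout` command can make the test repeatable. This is only a sketch: the function name `run_with_hang_check` and the 5-second grace period are illustrative choices, not something from the original report.

```shell
# Run a command under a time limit and report whether it had to be stopped.
# Usage: run_with_hang_check SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of Intel MPI or EasyBuild)
run_with_hang_check() {
    local limit="$1"
    shift
    local rc=0
    # timeout sends SIGTERM after $limit seconds and, because of -k 5,
    # SIGKILL five seconds later if the command ignores SIGTERM.
    timeout -k 5 "$limit" "$@" || rc=$?
    case "$rc" in
        124|137)
            # 124: stopped by the time limit; 137: SIGKILL was involved (128+9)
            echo "HUNG after ${limit}s: $*"
            return 1
            ;;
        *)
            echo "finished with exit code ${rc}: $*"
            return 0
            ;;
    esac
}
```

For the test above this could be called as `run_with_hang_check 60 mpirun ./a.out`. An exit code of 137 can also mean the command was killed by something else, so the "HUNG" verdict is a heuristic.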
Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and mpiexec.hydra? Can you suggest how I may debug this issue?
OS information:
$ cat /etc/redhat-release
AlmaLinux release 8.6 (Sky Tiger)
$ uname -r
4.18.0-372.9.1.el8.x86_64
Quoting some discussion we've had on this in Slack:
no, it's not a glibc issue afaics. If you use a RHEL8.5 kernel (with an up-to-date RHEL8.6 system on the other side), intelmpi is working
Yikes...
@OleHolmNielsen Have you been in touch with Intel support on this?
@rscohn2 Any thoughts on this?
I didn't know that this issue is related to the updated RHEL 8.6 kernel, so I didn't contact Intel support yet. I've never been in touch with Intel compiler/libraries support before, so if someone else knows how to do that, could you kindly open an issue with them? Thanks, Ole
We ran into a silent hang issue several years ago too, details in https://github.com/hpcugent/vsc-mympirun/issues/74
Any luck w.r.t. getting output when using mpirun -d?
It seems that the NUMA info exposed by the kernel has changed (although nothing about this can be found in the kernel release notes).
intelmpi before version 2021.6.0 gets stuck.
Using pstack, one can see that the processes seem to hang in an infinite loop somewhere around ipl_detect_machine_topology.
That happens even before mpiexec.hydra tries to do anything with the binary to be launched (be it a.out or hostname).
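To tell a busy loop from a one-off snapshot, it can help to take several pstack samples and count which function sits in frame #0. The filter below is a sketch that assumes the usual gdb-style frame lines (`#0  0x... in func (...)`); the name `summarize_top_frames` is made up for this example.

```shell
# Summarise which function is at the top of the stack (frame #0) across
# several pstack samples read from standard input.
# (hypothetical helper; assumes gdb-style "#0 ..." frame lines)
summarize_top_frames() {
    awk '
        $1 == "#0" {
            # "#0  0x... in name (args)" or "#0  name (args) at file:line"
            name = ($3 == "in") ? $4 : $2
            count[name]++
        }
        END { for (n in count) print count[n], n }
    ' | sort -rn
}
```

With the PID of the stuck mpiexec.hydra in a variable `pid`, a possible use is `for i in 1 2 3 4 5; do pstack "$pid"; sleep 1; done | summarize_top_frames`. If the same one or two functions dominate every sample, that supports the "spinning in user space" reading.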
@boegel mpiexec.hydra does not know the -d parameter:
$> mpirun -d -np 2 hostname
[[email protected]] match_arg (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:91): unrecognized argument d
[[email protected]] Similar arguments:
[[email protected]] membind
[[email protected]] debug
[[email protected]] dac
[[email protected]] disable-x
[[email protected]] demux
[[email protected]] HYD_arg_parse_array (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:128): argument matching returned error
[[email protected]] mpiexec_get_parameters (../../../../../src/pm/i_hydra/mpiexec/mpiexec_params.c:1356): error parsing input array
[[email protected]] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1749): error parsing parameters
It does know --debug, but the only thing you see is the command being launched:
$> mpiexec.hydra --debug -np 2 hostname
[[email protected]] Launch arguments: /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/impi/2021.2.0-intel-compilers-2021.2.0/mpi/2021.2.0/bin//hydra_bstrap_proxy --upstream-host nrm095.hpc.itc.rwth-aachen.de --upstream-port 44829 --pgid 0 --launcher ssh --launcher-number 0 --base-path /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/impi/2021.2.0-intel-compilers-2021.2.0/mpi/2021.2.0/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/impi/2021.2.0-intel-compilers-2021.2.0/mpi/2021.2.0/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
Looking at https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/bug-mpiexec-segmentation-fault/m-p/1183364, you can influence this with
I_MPI_HYDRA_TOPOLIB=ipl
(Ha, look who is posting the last comment in that link)
using impi 2021.6.0, everything is working:
$> mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)
Copyright 2003-2022, Intel Corporation.
$> mpiexec.hydra -np 2 hostname
nrm095.hpc.itc.rwth-aachen.de
nrm095.hpc.itc.rwth-aachen.de
@ocaisa that doesn't change anything, looping around the same function
ahh, yes, I forgot, we have had an issue open with intel (case# 05472393) regarding the problem. Their first comment was as usual "are you trying the newest version?" I did not visit ISC this year, but some of my colleagues did and they talked to some intel guys directly. Outcome was, with RHEL 8.6 and newer, old intelmpi is not working anymore.
Is there any chance that Red Hat will accept a bug report for the older Intel MPI versions not working? This would require a deeper understanding of what the changes in the new kernel do that break Intel MPI, so documenting the bug might be a challenge...
@OleHolmNielsen kernel updates that break userspace are frowned upon, so you can try to open a bug report with Red Hat. They will at some point ask you for what they need, or point you to the release notes that say what has changed that broke this. They will probably blame Intel (and it sounds like Intel already fixed it, but doesn't want to backport it).
@stdweird Yes, but how do we get any error messages from mpiexec.hydra which can be reported to Red Hat?
@OleHolmNielsen the error you need to report is that an application is hanging since an upgrade to RHEL8.6 was done. you can already add what was said here (ie it works on 8.5, pstack points to the ipl thingie so they can have some idea in what direction to look).
@stdweird Thanks for the info. I have made this test:
$ module load iimpi/2021b
$ module list
Currently Loaded Modules:
  1) GCCcore/11.2.0                 5) numactl/2.0.14-GCCcore-11.2.0
  2) zlib/1.2.11-GCCcore-11.2.0     6) UCX/1.11.2-GCCcore-11.2.0
  3) binutils/2.37-GCCcore-11.2.0   7) impi/2021.4.0-intel-compilers-2021.4.0
  4) intel-compilers/2021.4.0       8) iimpi/2021b
$ which mpiexec
/home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpiexec
$ mpiexec.hydra --version
Now I can execute pstack on the process PID:
$ pstack 717906
#0 0x000000000045009a in ipl_get_exclude_mask (str=
Do we agree that this is the issue which I should report to Red Hat?
Thanks, Ole
@OleHolmNielsen the issue to report to Red Hat is that your application is hanging after an upgrade. RH has no knowledge about Intel MPI itself (and they will most likely not provide a solution, only an explanation).
I have created an issue in the Red Hat Bugzilla: [Bug 2095281] New: Intel MPI mpiexec.hydra hangs after upgrade to RHEL 8.6 This bug is unfortunately not accessible to others because it relates to the kernel.
AFAIK, you can add anyone (with their email) to the report, so that they can also read it...
If anyone would like their e-mail address to be added to Red Hat bug 2095281, you can ask me to do it.
If there's a regression in the RHEL kernel topology information, you may want to compare the output of lstopo before and after the upgrade.
@bgoglin I took an EL85 node and copied the output of lstopo to a file. Then I upgraded the node to EL86 and rebooted. The EL86 lstopo output is 100% identical to that of EL85.
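The before/after comparison described here can be scripted along these lines. It is a sketch only: the helper name `compare_snapshots` and the snapshot file names are assumptions for illustration.

```shell
# Compare two saved topology snapshots and say whether they differ.
# (hypothetical helper; prints "identical" or "different" plus a unified diff)
compare_snapshots() {
    local before="$1" after="$2"
    if diff -u "$before" "$after" > /dev/null; then
        echo "identical"
    else
        echo "different"
        diff -u "$before" "$after" || true
        return 1
    fi
}
```

One way to use it is to run `lstopo --of console > lstopo-el85.txt` before the upgrade, `lstopo --of console > lstopo-el86.txt` after the reboot, and then `compare_snapshots lstopo-el85.txt lstopo-el86.txt`. A clean diff only shows that the topology as hwloc sees it is unchanged; it does not rule out differences in how the kernel reports the sysfs files themselves, which is what Intel's analysis later in this thread points to.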
The Intel MPI Release Notes at https://www.intel.com/content/www/us/en/developer/articles/release-notes/mpi-library-release-notes-linux.html don't mention any bugs related to mpiexec.hydra; there's only a terse "Bug fixes" line.
I have not been able to locate the mentioned intel case# 05472393.
It would seem that going forward with EL8.6, we can no longer use the older Intel MPI libraries prior to 2021.6. So much for all the EasyBuild modules based on intel toolchains which we have already installed :-(
I received a response in Red Hat bug 2095281:
I agree that it looks like the kernel should be blamed too, but
this is not necessarily true.
Finally. In any case the application is buggy. It should not spin in the
infinite loop anyway. According to pstack it doesn't hang in syscall. And
this is what we need to investigate first, imo. Until then it is absolutely
unclear how can we find the root of the problem, if _if_ the kernel is wrong.
In short. IMO, this is user-space bug no matter what.
So the conclusion is that Intel MPI prior to 2021.6 is buggy. We cannot use older Intel MPI versions on EL 8.6 kernels then :-(
If no workaround is found, it seems that all EB modules iimpi/* prior to 2021.6 have to be discarded after we upgrade from EL 8.5 to 8.6.
Or the impi in the installed iimpi and intel toolchains is updated in place to 2021.6
(not happy with that workaround, but I see no better alternative).
Should only be done on a per-site initiative I think.
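For sites that want to find out which installed impi modules fall on the wrong side of 2021.6, a version check along these lines may help. It is a sketch that takes the "prior to 2021.6" boundary from this thread at face value; the function name `impi_is_affected` is made up, and GNU `sort -V` is assumed to be available.

```shell
# Succeeds (exit 0) if the given impi version is older than 2021.6.
# Accepts a plain version ("2021.4.0") or an EasyBuild directory name
# ("2021.4.0-intel-compilers-2021.4.0").
# (hypothetical helper; the 2021.6 boundary is taken from this thread)
impi_is_affected() {
    local version="${1%%-*}"   # drop any "-<toolchain>" suffix
    local fixed="2021.6"
    [ "$version" != "$fixed" ] || return 1
    local oldest
    # sort -V orders version strings; the first line is the older one
    oldest=$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)
    [ "$oldest" = "$version" ]
}
```

On a layout like the one shown earlier in this thread, a possible loop is `for d in /home/modules/software/impi/*; do impi_is_affected "$(basename "$d")" && echo "affected: $d"; done`. The software prefix is site-specific.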
For the record: When I load the module iimpi/2021b on an EL 8.6 node running kernel 4.18.0-372.9.1.el8.x86_64, the mpiexec.hydra enters an infinite loop while reading /sys/devices/system/node/node0/cpulist as seen by strace:
$ strace -f -e file mpiexec.hydra --version
(many lines deleted)
openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
openat(-1, "/sys/devices/system/cpu/possible", O_RDONLY) = 3
openat(AT_FDCWD, "/sys/devices/system/node/node0/cpulist", O_RDONLY) = 3
(Now I type Ctrl-C)
^C--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
strace: Process 4960 detached
After rebooting the node with the EL 8.5 kernel 4.18.0-348.23.1.el8_5.x86_64 the mpiexec.hydra works correctly.
I've now built the EB module iimpi/2022.05 which contains the latest Intel MPI module:
$ ml
Currently Loaded Modules:
  1) GCCcore/11.3.0                 5) numactl/2.0.14-GCCcore-11.3.0
  2) zlib/1.2.12-GCCcore-11.3.0     6) UCX/1.12.1-GCCcore-11.3.0
  3) binutils/2.38-GCCcore-11.3.0   7) impi/2021.6.0-intel-compilers-2022.1.0
  4) intel-compilers/2022.1.0       8) iimpi/2022.05
Running this module on the EL 8.6 node running kernel 4.18.0-372.9.1.el8.x86_64 the mpiexec.hydra works correctly (as observed by others):
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)
Copyright 2003-2022, Intel Corporation.
One additional piece of information concerns the Intel MKL library: I've built the latest EB module imkl/2022.1.0, which includes an HPL benchmark executable .../modules/software/imkl/2022.1.0/mkl/2022.1.0/benchmarks/linpack/xlinpack_xeon64
Running the MKL2022.1.0 xlinpack_xeon64 executable also results in multiple copies of mpiexec.hydra in infinite loops, just like with Intel MPI prior to 2021.6.
I think there exists a newer MKL 2022.2.0 but I don't know how to make an EB module with it for testing - can anyone help?
I see 2022.1.0 on https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#inpage-nav-9-7
This is the easyconfig that you've tested: https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/i/imkl/imkl-2022.1.0.eb - to update it you would change:
source_urls = ['https://registrationcenter-download.intel.com/akdlm/irc_nas/18721/']
sources = ['l_onemkl_p_%(version)s.223_offline.sh']
with the relevant source url and source for the offline Linux installer.
I have built the intel/2022a toolchain with EB 4.6.0, and I can confirm that with the new module impi/2021.6.0-intel-compilers-2022.1.0 the above issue with all previous Intel MPI versions has been resolved:
$ module load impi/2021.6.0-intel-compilers-2022.1.0
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)
Copyright 2003-2022, Intel Corporation.
Of course, we still face an issue with all software modules that use the Intel MPI module prior to 2021.6.0 being broken on EL8 systems running the latest kernel.
We got some feedback from Intel:
The issue was analyzed and the root cause was found. In RHEL8.6 and other OS with recent kernel versions, system files are reported to have 0 bytes size. In previous kernel versions ftell was reporting size == blocksize != 0.
Using size==0 lead to a memory leak with the known consequences.
I have written a small workaround library that can be used with LD_PRELOAD. This lib will use an "adapted" version of ftell for the startup of IMPI. Once the program is started there should be no issue. It is also possible to switch off LD_PRELOAD for the user mpi program.
If this form of workaround is acceptable and you are willing to test it I can attach it to this issue.
Preferred methodology is, however, to use the newest version of IMPI.
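Intel's explanation can be checked on a node without involving MPI at all, by comparing the size the kernel reports for a sysfs file with the number of bytes that can actually be read from it. The helper below is a sketch; the name `size_vs_content` is made up, and GNU `stat` is assumed.

```shell
# Print the size that stat reports for a file next to the number of bytes
# that can actually be read from it.
# (hypothetical helper for illustrating reported size vs. real content)
size_vs_content() {
    local file="$1"
    local reported actual
    reported=$(stat -c %s -- "$file")
    # Read through a pipe so that wc counts real bytes and never relies
    # on the reported file size.
    actual=$(cat -- "$file" | wc -c | tr -d ' ')
    echo "stat=${reported} read=${actual}"
}
```

Running `size_vs_content /sys/devices/system/node/node0/cpulist` on an EL 8.5 kernel and again on an EL 8.6 kernel should, if Intel's analysis is right, show a non-zero reported size on the former and 0 on the latter, while the bytes actually read stay the same. It also illustrates why code that sizes a buffer from the reported size, rather than reading until end of file, is fragile on sysfs.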