easybuild-easyconfigs
cannot find pmi2.h when compiling openmpi with slurm support
When I uncomment the SLURM specifics in the OpenMPI-2.0.2 easyconfig,
# to enable SLURM integration (site-specific)
configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'
I get the error
checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm/include... not found
checking for pmi.h in /usr/include/slurm/include/slurm... not found
checking for libpmi in /usr... not found
checking for libpmi in /usr/lib64... found
checking for PMI_Init in -lpmi... yes
checking for pmi2.h in /usr/include/slurm/include... not found
checking for pmi2.h in /usr/include/slurm/include/slurm... not found
checking for libpmi2 in /usr... not found
checking for libpmi2 in /usr/lib64... found
checking for PMI2_Init in -lpmi2... yes
checking can PMI support be built... no
configure: WARNING: PMI support requested (via --with-pmi) but neither pmi.h
configure: WARNING: nor pmi2.h were found under locations:
configure: WARNING: /usr/include/slurm
configure: WARNING: /usr/include/slurm/slurm
configure: WARNING: Specified path: /usr/include/slurm
configure: WARNING: OR neither libpmi nor libpmi2 were found under:
configure: WARNING: /usr/lib
configure: WARNING: /usr/lib64
configure: WARNING: Specified path: /usr
configure: error: Aborting
This is strange, because pmi2.h is actually there:
$ ls -l /usr/include/slurm/pmi2.h
-rw-r--r-- 1 root root 27152 Oct 25 2016 /usr/include/slurm/pmi2.h
Debug log is at https://gist.github.com/andreas-h/d810c667bc38efbbb7bf3ca02c0007df, any help is greatly appreciated.
Okay, so it seems to work with
# to enable SLURM integration (site-specific)
configopts += '--with-slurm --with-pmi=/usr --with-pmi-libdir=/usr'
So it would be user-friendly to update the comments in the OpenMPI easyconfigs accordingly.
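To make the difference between the two settings visible, here is a minimal sketch of the header search as it appears in the configure log above: the value given to --with-pmi is treated as a prefix, and the headers are looked up in `<prefix>/include` and `<prefix>/include/slurm`. The function names are hypothetical and the search order is only inferred from that log; other OpenMPI versions may search differently (the 3.1.1 log further down also checks the given directory itself).

```python
from pathlib import Path


def pmi_header_dirs(prefix):
    """Directories searched for pmi.h / pmi2.h, inferred from the log above.

    Assumption: the --with-pmi value is used as a prefix and only
    <prefix>/include and <prefix>/include/slurm are checked.
    """
    base = Path(prefix)
    return [base / "include", base / "include" / "slurm"]


def find_pmi_headers(prefix, headers=("pmi.h", "pmi2.h")):
    """Return {header: path} for every header found under the prefix."""
    found = {}
    for header in headers:
        for directory in pmi_header_dirs(prefix):
            candidate = directory / header
            if candidate.is_file():
                found[header] = candidate
                break
    return found


if __name__ == "__main__":
    # Compare the setting from the easyconfig comment with the one that worked.
    for prefix in ("/usr/include/slurm", "/usr"):
        searched = [d.as_posix() for d in pmi_header_dirs(prefix)]
        print(prefix, "->", searched, find_pmi_headers(prefix))
```

With --with-pmi=/usr/include/slurm the sketch looks in /usr/include/slurm/include, which does not exist on this system, whereas with --with-pmi=/usr it reaches /usr/include/slurm, where pmi2.h lives.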
@akesandgren Thoughts on this? Should we correct the comment to use --with-pmi=/usr?
Using --with-xxx=/usr is ALWAYS wrong. It messes up lib and include search paths for other things that you want to override from the OS.
If the configure thing doesn't find the required include files/libs then the configure script is broken and should be fixed.
Never ever do --with-xxx=/usr !!
I assume the above problem is caused by having slurm installed from an OS distro package? (something you really don't want to do)
But isn't this rather an issue with the OpenMPI configure script?
I mean, when I specify /usr, then it looks in /usr/include/slurm. So I don't see why SLURM should be to blame.
I'm not blaming slurm for the pmi problem. Just saying that you really want to use a very recent version of slurm, not whatever ancient version the OS distro has packaged.
It's SLURM from OpenHPC repos.
We got around this error for SLURM 17.11 and OpenMPI 2.1.2-GCC-6.4.0-2.28 by adding "--with-slurm --with-pmi" and installing the slurm-libpmi rpm locally on the build node.
I can confirm that the OpenMPI 3.1.1-GCC-7.3.0-2.30 builds properly with --with-slurm --with-pmi with locally installed slurm pmi rpms (SLURM 17.11.9 from the openhpc repository). Using the flags provided in the EasyConfig file did not work (same error as above with older OpenMPI).
Using --with-xxx=/usr is ALWAYS wrong. It messes up lib and include search paths for other things that you want to override from the OS.
If the configure thing doesn't find the required include files/libs then the configure script is broken and should be fixed.
Never ever do --with-xxx=/usr !!
I assume the above problem is caused by having slurm installed from a OS distro package? (something you really don't want to do)
Well, it really depends on the OS and the package management system. It is trivial to edit and use PKGBUILD files in archlinux to install and remove things from /usr.
I can confirm that the OpenMPI 3.1.1-GCC-7.3.0-2.30 builds properly with --with-slurm --with-pmi with locally installed slurm pmi rpms (SLURM 17.11.9 from the openhpc repository). Using the flags provided in the EasyConfig file did not work (same error as above with older OpenMPI)
Works for me also with OpenHPC 1.3.8 and slurm 18.08.7.
I also encountered this issue while building OpenMPI 3.1.1-GCC-7.3.0-2.30.
I'm using Ubuntu 18.04 and am trying to get it to work with packages from the standard repositories, which include SLURM 17.11.2. To get the header and library files I tried libpmi0-dev and libpmi2-0-dev. The packages provide these files:
libpmi0-dev:
/usr/include/slurm-wlm/pmi.h
/usr/lib/x86_64-linux-gnu/libpmi.a
/usr/lib/x86_64-linux-gnu/libpmi.la
/usr/lib/x86_64-linux-gnu/libpmi.so
libpmi2-0-dev:
/usr/include/slurm-wlm/pmi2.h
/usr/lib/x86_64-linux-gnu/libpmi2.a
/usr/lib/x86_64-linux-gnu/libpmi2.la
/usr/lib/x86_64-linux-gnu/libpmi2.so
So I used this line:
configopts = '--with-slurm --with-pmi=/usr/include/slurm-wlm --with-pmi-libdir=/usr/lib/x86_64-linux-gnu'
The configure-checks for PMI succeed:
libpmi0-dev:
checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm-wlm... found
checking pmi.h usability... yes
checking pmi.h presence... yes
checking for pmi.h... yes
checking for libpmi in /usr/lib/x86_64-linux-gnu... found
checking for PMI_Init in -lpmi... yes
checking for pmi2.h in /usr/include/slurm-wlm... not found
checking for pmi2.h in /usr/include/slurm-wlm/include... not found
checking for pmi2.h in /usr/include/slurm-wlm/include/slurm... not found
checking for libpmi2 in /usr/lib/x86_64-linux-gnu... not found
checking can PMI support be built... yes
libpmi2-0-dev:
checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm-wlm... not found
checking for pmi.h in /usr/include/slurm-wlm/include... not found
checking for pmi.h in /usr/include/slurm-wlm/include/slurm... not found
checking for libpmi in /usr/lib/x86_64-linux-gnu... not found
checking for pmi2.h in /usr/include/slurm-wlm... found
checking pmi2.h usability... yes
checking pmi2.h presence... yes
checking for pmi2.h... yes
checking for libpmi2 in /usr/lib/x86_64-linux-gnu... found
checking for PMI2_Init in -lpmi2... yes
checking can PMI support be built... yes
both:
checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm-wlm... found
checking pmi.h usability... yes
checking pmi.h presence... yes
checking for pmi.h... yes
checking for libpmi in /usr/lib/x86_64-linux-gnu... found
checking for PMI_Init in -lpmi... yes
checking for pmi2.h in /usr/include/slurm-wlm... found
checking pmi2.h usability... yes
checking pmi2.h presence... yes
checking for pmi2.h... yes
checking for libpmi2 in /usr/lib/x86_64-linux-gnu... found
checking for PMI2_Init in -lpmi2... yes
checking can PMI support be built... yes
However later the builds fail:
libpmi0-dev:
pmix_s1.c:29:10: fatal error: pmi.h: No such file or directory
#include <pmi.h>
^~~~~~~
compilation terminated.
Makefile:1903: recipe for target 'mca_pmix_s1_la-pmix_s1.lo' failed
libpmi2-0-dev:
pmix_s2.c:33:10: fatal error: pmi2.h: No such file or directory
#include <pmi2.h>
^~~~~~~~
compilation terminated.
Makefile:1915: recipe for target 'mca_pmix_s2_la-pmix_s2.lo' failed
both:
#include <pmi.h>
^~~~~~~
compilation terminated.
Makefile:1903: recipe for target 'mca_pmix_s1_la-pmix_s1.lo' failed
I don't have any idea why the headers are found while configuring and not found while building.
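One way to narrow this down is to list the directories in which configure reported a header or library as found, and then compare them with the -I and -L flags on the failing compile line. The sketch below only parses log lines of the form shown above; the function name and regular expression are my own, and it makes no claim about why the build step loses the path.

```python
import re

# Matches lines such as:
#   checking for pmi2.h in /usr/include/slurm-wlm... found
CHECK_RE = re.compile(
    r"^checking for (?P<name>\S+) in (?P<dir>\S+)\.\.\. (?P<result>found|not found)$"
)


def found_locations(configure_output):
    """Map each header/library name to the directories where it was found."""
    found = {}
    for line in configure_output.splitlines():
        match = CHECK_RE.match(line.strip())
        if match and match.group("result") == "found":
            found.setdefault(match.group("name"), []).append(match.group("dir"))
    return found


# Excerpt of the "both" configure output above.
SAMPLE_LOG = """\
checking for pmi.h in /usr/include/slurm-wlm... found
checking for pmi.h... yes
checking for libpmi in /usr/lib/x86_64-linux-gnu... found
checking for PMI_Init in -lpmi... yes
checking for pmi2.h in /usr/include/slurm-wlm... found
checking for libpmi2 in /usr/lib/x86_64-linux-gnu... found
"""

if __name__ == "__main__":
    for name, dirs in found_locations(SAMPLE_LOG).items():
        print(name, dirs)
```

If /usr/include/slurm-wlm shows up here but not as a -I flag in the command that compiles pmix_s1.c, the include path is being dropped somewhere between configure and the generated Makefile.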
Update:
Using pmix like this lets you compile OpenMPI (with the included pmix, maybe this line should replace the one in the easyconfig comment?):
configopts = '--with-slurm --with-pmix'
However, this is not usable (with srun --mpi=pmix) because pmix isn't included in the slurm plugins from the repository, only pmi2 is:
$ srun --mpi=list
srun: MPI types are...
srun: openmpi
srun: none
srun: pmi2
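The same check can be scripted before choosing between --with-pmi and --with-pmix. This is a minimal sketch that assumes the output format shown above (every line prefixed with `srun:`); other Slurm versions may print the list differently, and the function names are hypothetical.

```python
import subprocess


def parse_mpi_types(text):
    """Extract the MPI plugin names from `srun --mpi=list` output.

    Assumption: each entry is on its own line prefixed with "srun:",
    as in the output shown above.
    """
    types = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("srun:"):
            continue
        value = line[len("srun:"):].strip()
        # Skip the "MPI types are..." header line and empty entries.
        if not value or value.lower().startswith("mpi types are"):
            continue
        types.append(value)
    return types


def srun_mpi_types():
    """Run `srun --mpi=list` and parse whatever it prints."""
    result = subprocess.run(
        ["srun", "--mpi=list"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture both streams together
        universal_newlines=True,
        check=False,
    )
    return parse_mpi_types(result.stdout)


SAMPLE = """\
srun: MPI types are...
srun: openmpi
srun: none
srun: pmi2
"""

if __name__ == "__main__":
    available = parse_mpi_types(SAMPLE)
    print("pmix available:", "pmix" in available)
```

For the sample above the list contains openmpi, none and pmi2, so a build that relies on srun --mpi=pmix would not be launchable on this installation.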
Someone should contact SLURM support for this? @SethosII?
Remember that both pmi and pmi2 have been deprecated in Slurm and have been moved to the contribs section. Can't remember which version that was done in. To enable them one has to explicitly go into contribs and build them.
@boegel I don't have access to the paid SLURM support and I don't think that the compilation problem is related to SLURM. For the pmix part I created a bug report in Debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948008), which has already been processed and will eventually find its way into the distribution.
OpenMPI 4.1.4 on GCC-11.3.0 suffers from the same problem.
If I use configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr/lib/x86_64-linux-gnu'
as @SethosII mentions, it gets further, but then much later complains again that it can't find pmi.h.
If I use configopts += '--with-slurm --with-pmi'
without specifying the paths, it gets further, as it finds slurm's OS headers by itself, and the build finishes.
This is Ubuntu 20.04.
diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb
index 524393cf31..f2843fd774 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb
@@ -30,8 +30,8 @@ dependencies = [
('zlib', '1.2.12'),
('hwloc', '2.7.1'),
('libevent', '2.1.12'),
- ('libfabric', '1.15.1'),
+ # ('libfabric', '1.15.1'),
('PMIx', '4.1.2'),
('UCC', '1.0.0'),
]
@@ -54,9 +54,8 @@ preconfigopts = ' && '.join([
configopts = '--with-cuda=internal '
# disable MPI1 compatibility for now, see what breaks...
-# configopts += '--enable-mpi1-compatibility '
+configopts += '--enable-mpi1-compatibility '
# to enable SLURM integration (site-specific)
-# configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'
-
+configopts += '--with-slurm --with-pmi'
moduleclass = 'mpi'
I had the same problem, trying to build OpenMPI with a locally built pmi2 from the slurm contrib source and failing at the recipe mca_pmix_s2_la-pmix_s2.lo, where make complains about not finding pmi2.h. (PMIx was no problem, but our slurm was configured without PMIx support, so I needed PMI2). I think I found a solution:
In my case, my local pmi2 is located at $HOME/apps/pmi2. I've linked the .libs folder with the lib files to a visible pmi2/lib folder and I've copied the pmi2.h header to a pmi2/include folder. Adjust paths accordingly.
- In the Makefile at opal/mca/pmix/s2, locate the recipe for mca_pmix_s2_la-pmix_s2.lo and adjust the includes list by appending the pmi2 folder:
... $(INCLUDES) -I${HOME}/apps/pmi2/include ...
- In the same Makefile, locate the recipe for mca_pmix_s2.la and add a library path:
.. $(mca_pmix_s2_la_LIBADD) -L${HOME}/apps/pmi2/lib $(LIBS) ..
Somewhat fiddly, but works in my case. There's probably some easier way to just set the pmi2 path for all steps, but it's really just those two recipes that fail.
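The staging step described above (copy pmi2.h into pmi2/include, link the .libs folder as pmi2/lib) could be scripted roughly like this. It is only a sketch: the function name is made up, and the locations of the header and the .libs folder inside the contrib build tree are passed in explicitly because they depend on how pmi2 was built.

```python
import shutil
from pathlib import Path


def stage_pmi2(header_path, libs_dir, prefix):
    """Create <prefix>/include/pmi2.h and a <prefix>/lib symlink.

    header_path: path to the pmi2.h header (site-specific assumption)
    libs_dir:    path to the .libs folder holding the built libraries
    prefix:      target folder, e.g. $HOME/apps/pmi2
    Returns the -I and -L flags to append in the two Makefile recipes.
    """
    prefix = Path(prefix)
    include_dir = prefix / "include"
    include_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(header_path), str(include_dir / "pmi2.h"))

    lib_link = prefix / "lib"
    if lib_link.is_symlink():
        lib_link.unlink()  # replace a stale link from an earlier run
    lib_link.symlink_to(Path(libs_dir).resolve(), target_is_directory=True)

    return "-I{}".format(include_dir), "-L{}".format(lib_link)
```

Calling it with prefix set to $HOME/apps/pmi2 would yield the -I and -L values used in the two recipes above. Note that it raises an error if pmi2/lib already exists as a real directory rather than a symlink.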
Same issue for OpenMPI 4.1.5 and GCC 12.3.0.
It does not work if I specify the path with --with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr/lib64, and the build process complains with:
Making all in mca/pmix/s1
make[2]: Entering directory `/dev/shm/easybuild/OpenMPI/4.1.5/GCC-12.3.0/openmpi-4.1.5/opal/mca/pmix/s1'
CC mca_pmix_s1_la-pmix_s1_component.lo
CC mca_pmix_s1_la-pmix_s1.lo
pmix_s1.c:29:10: fatal error: pmi.h: No such file or directory
29 | #include <pmi.h>
| ^~~~~~~
compilation terminated.
make[2]: *** [mca_pmix_s1_la-pmix_s1.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/dev/shm/easybuild/OpenMPI/4.1.5/GCC-12.3.0/openmpi-4.1.5/opal/mca/pmix/s1'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/dev/shm/easybuild/OpenMPI/4.1.5/GCC-12.3.0/openmpi-4.1.5/opal'
make: *** [all-recursive] Error 1
(at easybuild/common/software/EasyBuild/4.7.0/lib/python2.7/site-packages/easybuild/tools/run.py:643 in parse_cmd_output)
However, it builds properly when setting just: configopts = ' --with-slurm --with-pmi '
I have found this behaviour on two different servers (RHEL 7.2 and 7.7), both running SLURM 18.08.