easybuild-easyconfigs

cannot find pmi2.h when compiling openmpi with slurm support

Open andreas-h opened this issue 6 years ago • 17 comments

When I uncomment the SLURM specifics in the OpenMPI-2.0.2 easyconfig,

# to enable SLURM integration (site-specific)
configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'

I get the error

checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm/include... not found
checking for pmi.h in /usr/include/slurm/include/slurm... not found
checking for libpmi in /usr... not found
checking for libpmi in /usr/lib64... found
checking for PMI_Init in -lpmi... yes
checking for pmi2.h in /usr/include/slurm/include... not found
checking for pmi2.h in /usr/include/slurm/include/slurm... not found
checking for libpmi2 in /usr... not found
checking for libpmi2 in /usr/lib64... found
checking for PMI2_Init in -lpmi2... yes
checking can PMI support be built... no
configure: WARNING: PMI support requested (via --with-pmi) but neither pmi.h
configure: WARNING: nor pmi2.h were found under locations:
configure: WARNING:     /usr/include/slurm
configure: WARNING:     /usr/include/slurm/slurm
configure: WARNING: Specified path: /usr/include/slurm
configure: WARNING: OR neither libpmi nor libpmi2 were found under:
configure: WARNING:     /usr/lib
configure: WARNING:     /usr/lib64
configure: WARNING: Specified path: /usr
configure: error: Aborting

This is strange, because pmi2.h is actually there:

$ ls -l /usr/include/slurm/pmi2.h 
-rw-r--r-- 1 root root 27152 Oct 25  2016 /usr/include/slurm/pmi2.h

Debug log is at https://gist.github.com/andreas-h/d810c667bc38efbbb7bf3ca02c0007df, any help is greatly appreciated.

andreas-h • Mar 12 '18 16:03

Okay, so it seems to work with

# to enable SLURM integration (site-specific)
configopts += '--with-slurm --with-pmi=/usr --with-pmi-libdir=/usr'

So it would be user-friendly to update the comments in the OpenMPI easyconfigs accordingly.
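
The configure log above suggests why `--with-pmi=/usr/include/slurm` fails: the value appears to be treated as an installation prefix, and the headers are looked up in `<prefix>/include` and `<prefix>/include/slurm` rather than in the directory itself. A minimal Python sketch of that lookup, modelled only on the "checking for pmi.h in ..." lines of this OpenMPI 2.0.2 log (the function name is made up for illustration, and this is not Open MPI's actual configure code):

```python
import posixpath


def candidate_header_dirs(pmi_prefix):
    """Directories probed for pmi.h/pmi2.h, as seen in the OpenMPI 2.0.2 log.

    Hypothetical helper: it only mirrors the "checking for pmi.h in ..."
    lines quoted in this issue, not the real configure macro.
    """
    return [
        posixpath.join(pmi_prefix, "include"),
        posixpath.join(pmi_prefix, "include", "slurm"),
    ]


# The value from the easyconfig comment never reaches the real header directory.
print(candidate_header_dirs("/usr/include/slurm"))
# With the prefix itself, /usr/include/slurm is among the probed directories.
print(candidate_header_dirs("/usr"))
```

Under this reading, the path given to `--with-pmi` has to be the prefix that contains `include/`, which matches the observation that `/usr` works while `/usr/include/slurm` does not.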

andreas-h • Mar 12 '18 16:03

@akesandgren Thoughts on this? Should we correct the comment to use --with-pmi=/usr?

boegel • Mar 12 '18 18:03

Using --with-xxx=/usr is ALWAYS wrong. It messes up lib and include search paths for other things that you want to override from the OS.

If the configure thing doesn't find the required include files/libs then the configure script is broken and should be fixed.

Never ever do --with-xxx=/usr !!

I assume the above problem is caused by having slurm installed from an OS distro package? (something you really don't want to do)

akesandgren • Mar 13 '18 07:03

> Using --with-xxx=/usr is ALWAYS wrong. It messes up lib and include search paths for other things that you want to override from the OS.
>
> If the configure thing doesn't find the required include files/libs then the configure script is broken and should be fixed.
>
> Never ever do --with-xxx=/usr !!
>
> I assume the above problem is caused by having slurm installed from an OS distro package? (something you really don't want to do)

But isn't this rather an issue with the OpenMPI configure script? I mean, when I specify /usr, then it looks in /usr/include/slurm. So I don't see why SLURM should be to blame.

andreas-h • Mar 13 '18 10:03

I'm not blaming slurm for the pmi problem. Just saying that you really want to use a very recent version of slurm, not whatever ancient version the OS distro has packaged.

akesandgren • Mar 13 '18 10:03

It's SLURM from OpenHPC repos.

andreas-h • Mar 13 '18 10:03

We got around this error for SLURM 17.11 and OpenMPI 2.1.2-GCC-6.4.0-2.28 by adding "--with-slurm --with-pmi" and installing the slurm-libpmi rpm locally on the build node.

martbhell • Apr 11 '18 14:04

I can confirm that OpenMPI 3.1.1-GCC-7.3.0-2.30 builds properly with --with-slurm --with-pmi and locally installed slurm pmi rpms (SLURM 17.11.9 from the openhpc repository). Using the flags provided in the EasyConfig file did not work (same error as above with older OpenMPI).

andreas-h • Dec 06 '18 11:12

> Using --with-xxx=/usr is ALWAYS wrong. It messes up lib and include search paths for other things that you want to override from the OS.
>
> If the configure thing doesn't find the required include files/libs then the configure script is broken and should be fixed.
>
> Never ever do --with-xxx=/usr !!
>
> I assume the above problem is caused by having slurm installed from an OS distro package? (something you really don't want to do)

Well, it really depends on the OS and the package management system. It is trivial to edit and use PKGBUILD files in Arch Linux to install and remove things from /usr.

HaoZeke • Feb 25 '19 23:02

> I can confirm that the OpenMPI 3.1.1-GCC-7.3.0-2.30 builds properly with --with-slurm --with-pmi with locally installed slurm pmi rpms (SLURM 17.11.9 from the openhpc repository). Using the flags provided in the EasyConfig file did not work (same error as above with older OpenMPI)

Works for me also with OpenHPC 1.3.8 and slurm 18.08.7.

omula • Sep 05 '19 16:09

I also encountered this issue while building OpenMPI 3.1.1-GCC-7.3.0-2.30.

I'm using Ubuntu 18.04 and am trying to get this to work with packages from the standard repositories, which include SLURM 17.11.2. To get the header and library files I tried libpmi0-dev and libpmi2-0-dev. The packages provide these files:

libpmi0-dev:

/usr/include/slurm-wlm/pmi.h
/usr/lib/x86_64-linux-gnu/libpmi.a
/usr/lib/x86_64-linux-gnu/libpmi.la
/usr/lib/x86_64-linux-gnu/libpmi.so

libpmi2-0-dev:

/usr/include/slurm-wlm/pmi2.h
/usr/lib/x86_64-linux-gnu/libpmi2.a
/usr/lib/x86_64-linux-gnu/libpmi2.la
/usr/lib/x86_64-linux-gnu/libpmi2.so

So I used this line:

configopts = '--with-slurm --with-pmi=/usr/include/slurm-wlm --with-pmi-libdir=/usr/lib/x86_64-linux-gnu'
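
The two paths in that line are simply the directories that hold the headers and the libraries in the package listings above. A small Python sketch of that derivation, assuming a subset of the listed files as input (the helper name `pmi_dirs` is made up for illustration):

```python
import posixpath

# File names copied from the two Ubuntu 18.04 package listings above.
PACKAGE_FILES = [
    "/usr/include/slurm-wlm/pmi.h",
    "/usr/lib/x86_64-linux-gnu/libpmi.so",
    "/usr/include/slurm-wlm/pmi2.h",
    "/usr/lib/x86_64-linux-gnu/libpmi2.so",
]


def pmi_dirs(paths):
    """Return (header_dirs, library_dirs) for the PMI files in `paths`."""
    header_dirs, library_dirs = set(), set()
    for path in paths:
        directory, name = posixpath.split(path)
        if name in ("pmi.h", "pmi2.h"):
            header_dirs.add(directory)
        elif name.startswith(("libpmi.", "libpmi2.")):
            library_dirs.add(directory)
    return sorted(header_dirs), sorted(library_dirs)


headers, libs = pmi_dirs(PACKAGE_FILES)
print("--with-pmi=%s --with-pmi-libdir=%s" % (headers[0], libs[0]))
```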

The configure-checks for PMI succeed:

libpmi0-dev:

checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm-wlm... found
checking pmi.h usability... yes
checking pmi.h presence... yes
checking for pmi.h... yes
checking for libpmi in /usr/lib/x86_64-linux-gnu... found
checking for PMI_Init in -lpmi... yes
checking for pmi2.h in /usr/include/slurm-wlm... not found
checking for pmi2.h in /usr/include/slurm-wlm/include... not found
checking for pmi2.h in /usr/include/slurm-wlm/include/slurm... not found
checking for libpmi2 in /usr/lib/x86_64-linux-gnu... not found
checking can PMI support be built... yes

libpmi2-0-dev:

checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm-wlm... not found
checking for pmi.h in /usr/include/slurm-wlm/include... not found
checking for pmi.h in /usr/include/slurm-wlm/include/slurm... not found
checking for libpmi in /usr/lib/x86_64-linux-gnu... not found
checking for pmi2.h in /usr/include/slurm-wlm... found
checking pmi2.h usability... yes
checking pmi2.h presence... yes
checking for pmi2.h... yes
checking for libpmi2 in /usr/lib/x86_64-linux-gnu... found
checking for PMI2_Init in -lpmi2... yes
checking can PMI support be built... yes

both:

checking if user requested PMI support... yes
checking for pmi.h in /usr/include/slurm-wlm... found
checking pmi.h usability... yes
checking pmi.h presence... yes
checking for pmi.h... yes
checking for libpmi in /usr/lib/x86_64-linux-gnu... found
checking for PMI_Init in -lpmi... yes
checking for pmi2.h in /usr/include/slurm-wlm... found
checking pmi2.h usability... yes
checking pmi2.h presence... yes
checking for pmi2.h... yes
checking for libpmi2 in /usr/lib/x86_64-linux-gnu... found
checking for PMI2_Init in -lpmi2... yes
checking can PMI support be built... yes

However later the builds fail:

libpmi0-dev:

pmix_s1.c:29:10: fatal error: pmi.h: No such file or directory
 #include <pmi.h>
          ^~~~~~~
compilation terminated.
Makefile:1903: recipe for target 'mca_pmix_s1_la-pmix_s1.lo' failed

libpmi2-0-dev:

pmix_s2.c:33:10: fatal error: pmi2.h: No such file or directory
 #include <pmi2.h>
          ^~~~~~~~
compilation terminated.
Makefile:1915: recipe for target 'mca_pmix_s2_la-pmix_s2.lo' failed

both:

 #include <pmi.h>
          ^~~~~~~
compilation terminated.
Makefile:1903: recipe for target 'mca_pmix_s1_la-pmix_s1.lo' failed

I don't have any idea why the headers are found while configuring and not found while building.
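
One way to narrow this down is to compare the `-I` flags on the failing compile line with the directory that configure reported as containing the header. The sketch below assumes the full compiler command is available (for an automake build with silent rules, `make V=1` usually prints it); the helper names and the example command are hypothetical, and the check covers only explicit `-I` directories, not the compiler's default search path:

```python
import os
import shlex


def include_dirs(compile_command):
    """Extract the -I directories from a compiler command line."""
    tokens = shlex.split(compile_command)
    dirs = []
    for index, token in enumerate(tokens):
        if token == "-I" and index + 1 < len(tokens):
            dirs.append(tokens[index + 1])   # form: -I dir
        elif token.startswith("-I") and len(token) > 2:
            dirs.append(token[2:])           # form: -Idir
    return dirs


def header_visible(compile_command, header):
    """True if `header` exists in one of the command's explicit -I directories."""
    return any(os.path.isfile(os.path.join(d, header))
               for d in include_dirs(compile_command))


# Hypothetical command line, for illustration only.
example = "gcc -I../../../opal/include -I /usr/include/slurm-wlm -c pmix_s1.c"
print(include_dirs(example))
```

If the header directory found by configure is missing from that list, the problem lies in how the include path is propagated to the Makefile rather than in the SLURM packages.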

Update:

Using pmix like this lets you compile OpenMPI (with the included pmix, maybe this line should replace the one in the easyconfig comment?):

configopts = '--with-slurm --with-pmix'

However, this is not usable (with srun --mpi=pmix) because pmix isn't included in the SLURM plugins from the repository; only pmi2 is:

$ srun --mpi=list
srun: MPI types are...
srun: openmpi
srun: none
srun: pmi2
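
The same check can be scripted before choosing between `--with-pmi` and `--with-pmix`. This is a minimal sketch that only understands the output format shown above (other SLURM versions may print it differently); `parse_mpi_types` is a made-up helper name:

```python
def parse_mpi_types(srun_output):
    """Return the plugin names from `srun --mpi=list` output of the form shown above."""
    types = []
    for line in srun_output.splitlines():
        line = line.strip()
        if not line.startswith("srun: "):
            continue
        value = line[len("srun: "):]
        if value.startswith("MPI types are"):
            continue  # skip the header line
        types.append(value)
    return types


SAMPLE = """srun: MPI types are...
srun: openmpi
srun: none
srun: pmi2
"""

available = parse_mpi_types(SAMPLE)
print(available)            # ['openmpi', 'none', 'pmi2']
print("pmix" in available)  # False
```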

SethosII • Jan 02 '20 14:01

Should someone contact SLURM support about this? @SethosII?

boegel • Jan 18 '20 09:01

Remember that both pmi and pmi2 have been deprecated in Slurm and moved to the contribs section. I can't remember which version that was done in. To enable them, one has to explicitly go into contribs and build them.

akesandgren • Jan 18 '20 16:01

@boegel I don't have access to the paid SLURM support and I don't think that the compilation problem is related to SLURM. For the pmix part I created a bug report in Debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948008), which has already been processed and will eventually find its way into the distribution.

SethosII • Jan 20 '20 06:01

4.1.4 on GCC-11.3.0 suffers from the same problem.

If I use configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr/lib/x86_64-linux-gnu' as @SethosII mentions, it gets further, but much later it complains again that it can't find pmi.h.

If I use configopts += '--with-slurm --with-pmi' without specifying the paths, it finds SLURM's OS headers by itself and the build finishes.

This is Ubuntu 20.04.

diff --git a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb
index 524393cf31..f2843fd774 100644
--- a/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb
+++ b/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-11.3.0.eb
@@ -30,8 +30,8 @@ dependencies = [
     ('zlib', '1.2.12'),
     ('hwloc', '2.7.1'),
     ('libevent', '2.1.12'),
-    ('libfabric', '1.15.1'),
+    # ('libfabric', '1.15.1'),
     ('PMIx', '4.1.2'),
     ('UCC', '1.0.0'),
 ]
@@ -54,9 +54,8 @@ preconfigopts = ' && '.join([
 configopts = '--with-cuda=internal '
 
 # disable MPI1 compatibility for now, see what breaks...
-# configopts += '--enable-mpi1-compatibility '
+configopts += '--enable-mpi1-compatibility '
 
 # to enable SLURM integration (site-specific)
-# configopts += '--with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr'
-
+configopts += '--with-slurm --with-pmi'
 moduleclass = 'mpi'
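
Easyconfig files use Python syntax, so whether the SLURM line is written with `+=` or `=` changes the final configure options. The sketch below replays the `configopts` lines from the diff above as plain Python (the variables `appended` and `assigned` exist only for this illustration); whether the difference matters depends on what a given easyconfig sets earlier:

```python
# Stand-in for the configopts lines of the easyconfig shown in the diff above.
configopts = '--with-cuda=internal '
configopts += '--enable-mpi1-compatibility '
configopts += '--with-slurm --with-pmi'
appended = configopts

# Same final line, but with plain assignment as in some snippets in this thread.
configopts = '--with-cuda=internal '
configopts += '--enable-mpi1-compatibility '
configopts = '--with-slurm --with-pmi'
assigned = configopts

print(appended)  # all three option groups survive
print(assigned)  # only the SLURM options are left
```

The trailing space at the end of each appended string is what keeps consecutive options from running together.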

surak • Jul 26 '23 16:07

I had the same problem, trying to build OpenMPI with a locally built pmi2 from the slurm contrib source and failing at the recipe mca_pmix_s2_la-pmix_s2.lo, where make complains about not finding pmi2.h. (PMIx was no problem, but our slurm was configured without PMIx support, so I needed PMI2). I think I found a solution:

In my case, my local pmi2 is located at $HOME/apps/pmi2. I've linked the .libs folder with the lib files to a visible pmi2/lib folder and I've copied the pmi2.h header to a pmi2/include folder. Adjust paths accordingly.

  1. In the Makefile at opal/mca/pmix/s2, locate the recipe for mca_pmix_s2_la-pmix_s2.lo and append the pmi2 include folder to the includes list:
... $(INCLUDES) -I${HOME}/apps/pmi2/include ...
  2. In the same Makefile, locate the recipe for mca_pmix_s2.la and add the library path:
... $(mca_pmix_s2_la_LIBADD) -L${HOME}/apps/pmi2/lib $(LIBS) ...

Somewhat fiddly, but works in my case. There's probably some easier way to just set the pmi2 path for all steps, but it's really just those two recipes that fail.

firsc • Nov 17 '23 17:11

Same issue for OpenMPI 4.1.5 and GCC 12.3.0. It does not work if I specify the paths with --with-slurm --with-pmi=/usr/include/slurm --with-pmi-libdir=/usr/lib64; the build fails with:

Making all in mca/pmix/s1
make[2]: Entering directory `/dev/shm/easybuild/OpenMPI/4.1.5/GCC-12.3.0/openmpi-4.1.5/opal/mca/pmix/s1'
  CC       mca_pmix_s1_la-pmix_s1_component.lo
  CC       mca_pmix_s1_la-pmix_s1.lo
pmix_s1.c:29:10: fatal error: pmi.h: No such file or directory
   29 | #include <pmi.h>
      |          ^~~~~~~
compilation terminated.
make[2]: *** [mca_pmix_s1_la-pmix_s1.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/dev/shm/easybuild/OpenMPI/4.1.5/GCC-12.3.0/openmpi-4.1.5/opal/mca/pmix/s1'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/dev/shm/easybuild/OpenMPI/4.1.5/GCC-12.3.0/openmpi-4.1.5/opal'
make: *** [all-recursive] Error 1
 (at easybuild/common/software/EasyBuild/4.7.0/lib/python2.7/site-packages/easybuild/tools/run.py:643 in parse_cmd_output)

However, it builds properly when setting just: configopts = ' --with-slurm --with-pmi '. I have seen this behaviour on two different servers (RHEL 7.2 and 7.7), both running SLURM 18.08.

qjornet • Mar 20 '24 11:03