Cannot build `--with-pmix` on Fedora 36 with distro pmix package

Open dalcinl opened this issue 3 years ago • 5 comments

I'm trying to configure --with-pmix on Fedora 36 using the distro pmix[-devel] RPM package

$ rpm -ql pmix-devel
/usr/include/pmix.h
/usr/include/pmix_common.h
/usr/include/pmix_deprecated.h
/usr/include/pmix_extend.h
/usr/include/pmix_server.h
/usr/include/pmix_tool.h
/usr/include/pmix_version.h
/usr/lib64/libpmix.so
/usr/lib64/pkgconfig/pmix.pc
/usr/share/pmix/pmix-valgrind.supp

However, it does not work:

./configure --with-pmix ...
...
configure: error: No PMI client interface was configured, consider --enable-pmi-simple or --with-pmi

Looking at the generated shell code in configure, it looks like libpmix.* is searched for in /usr/lib/ but not in /usr/lib64/, so PMIx is not found.
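For illustration, here is a minimal shell sketch of a probe that would find the library in either layout. The function name pmix_find_libdir is hypothetical and is not taken from the SOS or Open MPI configury; it only shows the lib64-then-lib lookup that the generated code appears to lack.

```shell
# Hypothetical helper (not part of SOS or Open MPI): print the first
# directory under PREFIX that contains a libpmix.* file.
# lib64 is probed before lib so that Fedora-style layouts are found.
pmix_find_libdir() {
    prefix=$1
    for sub in lib64 lib; do
        for f in "$prefix/$sub"/libpmix.*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$prefix/$sub"
                return 0
            fi
        done
    done
    return 1
}
```

With the package contents listed above, `pmix_find_libdir /usr` should print /usr/lib64. Debian-style multiarch directories (for example /usr/lib/x86_64-linux-gnu) are not covered by this sketch.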

dalcinl avatar Nov 08 '22 12:11 dalcinl

After a manual search-and-replace of $pmix_ext_install_dir/lib -> lib64 in configure, I got it working, but only if I explicitly pass --with-pmi=external. A bare --with-pmix did not work.

dalcinl avatar Nov 08 '22 12:11 dalcinl

Thanks for the report. The SOS m4 that picks up pmix comes from upstream (OpenMPI), and it looks like there are relatively recent changes to those files. We can try incorporating these changes to see if this problem has been fixed. If not, we'll create an issue upstream.

davidozog avatar Nov 09 '22 17:11 davidozog

@davidozog there are some issues with SOS using the OpenMPI pmix checking macros. The main issue is that pmix is required in OpenMPI, so you cannot pass --without-pmix even though SOS supports many different process managers. This also leads to the awkward distinction between external and internal pmix providers. For SOS the provider would always be external, so specifying only --with-pmix, without external or a directory, can't work. The other issue is that the pmix checker is hardcoded to scan for libraries in the lib directory, whereas on Fedora the libraries are located in lib64. Preferably this should be modernized to just use pkg-config if available.

I propose that --with-pmix should behave in the following ways:

  1. internal, external, and unspecified should behave the same
  2. pmix checker should check pkg-config by default. Specifying DIR does not make sense because the header and library locations differ considerably between Debian-based and Red Hat-based distributions, and the old /usr/lib and /usr/include structure cannot be assumed anymore. We could keep the old assumption in the codebase for backwards compatibility.
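
As a rough sketch of item 2, the lookup order could be expressed in plain shell as below. The function pmix_detect and its PMIX_CFLAGS/PMIX_LIBS output format are hypothetical; the module name pmix is an assumption based on the pmix.pc file shipped in pmix-devel, and PKG_CONFIG follows the usual convention of letting the caller override the pkg-config binary.

```shell
# Sketch only: prefer pkg-config, then fall back to the legacy
# PREFIX/include and PREFIX/{lib64,lib} layout for backwards compatibility.
# pmix_detect is a hypothetical name, not an existing SOS macro.
pmix_detect() {
    prefix=${1:-/usr}
    pkgconf=${PKG_CONFIG:-pkg-config}

    # Preferred path: ask pkg-config for the flags (module name assumed: pmix).
    if command -v "$pkgconf" >/dev/null 2>&1 && "$pkgconf" --exists pmix; then
        printf 'PMIX_CFLAGS=%s\n' "$("$pkgconf" --cflags pmix)"
        printf 'PMIX_LIBS=%s\n' "$("$pkgconf" --libs pmix)"
        return 0
    fi

    # Legacy fallback: assume a single prefix holding headers and libraries.
    for sub in lib64 lib; do
        if [ -f "$prefix/include/pmix.h" ] && [ -e "$prefix/$sub/libpmix.so" ]; then
            printf 'PMIX_CFLAGS=-I%s\n' "$prefix/include"
            printf 'PMIX_LIBS=-L%s -lpmix\n' "$prefix/$sub"
            return 0
        fi
    done
    return 1
}
```

In configure.ac itself, the pkg-config step would more naturally use the PKG_CHECK_MODULES macro from pkg.m4; the shell version above only illustrates the intended order of checks.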

markbrown314 avatar Feb 25 '25 17:02 markbrown314

As a side note, I think the --without-pmix issue is closely related to issue #1071... but technically different?

As a possibly simpler alternative to that problem... could we just add a switch in SOS to enable/disable PMIx, then still leverage the upstream PMIx support? Here's a sketch if that's more clear:

diff --git a/configure.ac b/configure.ac
index b92aad4f..600ad749 100644
--- a/configure.ac
+++ b/configure.ac
@@ -48,8 +48,6 @@ AS_IF([test "$opal_cv___attribute__deprecated" = 1],
       [SHMEM_AC_HAVE_ATTRIBUTE_DEPRECATED=0])
 AC_SUBST(SHMEM_AC_HAVE_ATTRIBUTE_DEPRECATED)

-OPAL_CHECK_PMIX
-
+# Add option to disable PMIx detection in SOS because OPAL configury requires PMIx:
+AC_ARG_ENABLE([pmix],
+    [AS_HELP_STRING([--disable-pmix], [Disable PMIx detection (default: enabled)])],
+)
+
+if test "$enable_pmix" != "no" ; then
+    OPAL_CHECK_PMIX
+fi
+

However, I think I see the separate lib64 problem you mention... the help string says lib64 is supported but it's not? I can't test/verify it at the moment. At a glance, I don't think any recent upstream updates have fixed it... but I'm unsure.

@markbrown314 - are you saying you've tested it and lib64 still doesn't work, even after PR #1130?

Does @dalcinl still have eyes on this? Is it easy to verify on your end?

I'm curious if supporting PMIx is a requirement at all (hard/soft/none?) for SOS... It must be soft if it's now marked low-priority?

davidozog avatar Feb 25 '25 18:02 davidozog

@davidozog yes, I tried this on a Fedora 41 machine and it fails as described in this ticket. I tested on the tip of the SOS tree, so it includes your latest configury updates. The current Open MPI codebase has an opal_config_pmix.m4 that does use pkg-config, which is quite different from the OPAL_CHECK_PMIX function in opal_config_pmi.m4. However, I think the Open MPI opal_config_pmix.m4 is overkill, since it has that internal/external pmix logic and assumes pmix is a requirement.

markbrown314 avatar Feb 26 '25 01:02 markbrown314