
Extract the install prefix from the shared library.

Open bosilca opened this issue 9 months ago • 10 comments

This is part of a multi-project effort; a similar PR will be created in OpenPMIX and OMPI. The goal of each of these changes is the same: instead of using a build-time generated prefix that ignores a project rebase, take the prefix from the shared library of each project and derive the necessary paths from it. The user can, however, override this using environment variables and configuration files.
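
A minimal sketch of the lookup order described above, written in Python for brevity (the real projects are C, where the library path would presumably come from something like `dladdr`). The function names, the `<prefix>/lib/<library>` layout, and the fallback to a build-time prefix are assumptions for illustration, not the PR's actual code; configuration files are left out.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional


def prefix_from_library(lib_path: str) -> str:
    """Derive an install prefix from the path of a loaded shared library.

    Assumes the conventional <prefix>/lib/<library> layout; installs that
    use lib64 or a custom libdir would need extra handling.
    """
    # <prefix>/lib/libfoo.so -> <prefix>/lib -> <prefix>
    return str(PurePosixPath(lib_path).parent.parent)


def resolve_prefix(env: Mapping[str, str], env_var: str,
                   lib_path: Optional[str], build_prefix: str) -> str:
    """Pick a prefix: user override, then the library, then build time."""
    override = env.get(env_var)
    if override:
        # An explicit user setting always wins.
        return override
    if lib_path:
        # Follows the library wherever the install was moved to.
        return prefix_from_library(lib_path)
    # Hypothetical last resort: the value baked in at configure time.
    return build_prefix
```

For example, `resolve_prefix({}, "OPAL_PREFIX", "/opt/moved/ompi/lib/libopen-pal.so", "/usr/local")` yields `/opt/moved/ompi`, while setting `OPAL_PREFIX` in the mapping overrides it.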

Open PMIX PR PRRTE PR

To test it: build OMPI with a prefix, then move the install directory somewhere else. Set your $PATH and $LD_LIBRARY_PATH (or $DYLD_LIBRARY_PATH on OSX) to point to the new locations, then run mpirun -np 1 nothing (yes, run a nonexistent application). If everything works, you should see a proper error message, like this:

mpirun -np 1 nothing
--------------------------------------------------------------------------
prterun was unable to find the specified executable file, and therefore did
not launch the job.  This error was first reported for process rank
0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a prterun command
   line parameter option (remember that prterun interprets the first
   unrecognized command line token as the executable).

   Node:       XXX
   Executable: nothing
--------------------------------------------------------------------------

If it doesn't, then you will get something like:

--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    no-plugins
But I couldn't open the help file:
   .../share/prte/help-pmix-runtime.txt: No such file or directory.  Sorry!
--------------------------------------------------------------------------
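
The check above could be automated roughly as follows. This is only a sketch: the helper names are made up, the classification relies solely on the two messages quoted above, and whether mpirun writes them to stdout or stderr is not assumed (both are captured).

```python
import subprocess
from typing import Mapping


def classify_output(output: str) -> str:
    """Classify mpirun output using the two messages quoted above."""
    if "couldn't open the help file" in output:
        # The help files were looked up under a stale prefix.
        return "help-files-not-found"
    if "unable to find the specified executable file" in output:
        # The expected error for the nonexistent application.
        return "prefix-ok"
    return "unknown"


def run_relocation_check(env: Mapping[str, str]) -> str:
    """Run `mpirun -np 1 nothing` in env and classify what it prints.

    env should already have PATH and LD_LIBRARY_PATH (or
    DYLD_LIBRARY_PATH) pointing at the relocated install. Raises
    FileNotFoundError if mpirun itself cannot be found.
    """
    proc = subprocess.run(["mpirun", "-np", "1", "nothing"],
                          env=dict(env), capture_output=True, text=True)
    return classify_output(proc.stdout + proc.stderr)
```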

bosilca avatar Mar 17 '25 02:03 bosilca

I'm wondering if this is actually turning out to be all that useful. I have to set LD_LIBRARY_PATH anyway, which means we still require the user to provide prefix values for PRRTE and PMIx (for PRRTE) and for app's PMIx (along with a flag to tell us not to apply any prefix). All this does is let me not include the prefix itself in the environment - but that's trivial and not worth the bother.

The goal was to avoid having the user provide us prefix values, but I don't see any way out of it. Do you?

rhc54 avatar Mar 17 '25 16:03 rhc54

Correct, users always have to set PATH and LD_LIBRARY_PATH, because otherwise mpirun will not start due to missing shared libraries. But that is all they have to set, because now we can extract the prefix from the shared library and infer the prefix.

bosilca avatar Mar 17 '25 17:03 bosilca

> Correct, users always have to set PATH and LD_LIBRARY_PATH, because otherwise mpirun will not start due to missing shared libraries. But that is all they have to set, because now we can extract the prefix from the shared library and infer the prefix.

I'm afraid that isn't quite correct - you forget that the reason for prefix is that the libs on the remote nodes are in a different location. So you cannot set PATH and LD_LIBRARY_PATH for those locations where mpirun is located or mpirun will fail to start. Instead, you have to pass the prefix - and then I set the paths for the backend.

So this really doesn't do anything for us that I can see - we still need the user to specify the prefix values so we can locate the backend libs.

rhc54 avatar Mar 17 '25 18:03 rhc54

Just a heads-up: I added this to PRRTE and it broke the show-help subsystem - couldn't find the help files. This was on a Mac with no special configure options and no envar setting.

rhc54 avatar Mar 18 '25 01:03 rhc54

I'm not sure I understand your comment regarding PATH and LD_LIBRARY_PATH (especially the part regarding mpirun on the remote nodes). In any case, if you set the PREFIX it will take precedence and things will go back to the way they were before.

What this does is that if I move the install directory and update the local PATH and LD_LIBRARY_PATH, I don't have to set the prefix to get things working properly.

How did it break the help subsystem? How can I reproduce it?

bosilca avatar Mar 18 '25 14:03 bosilca

So here is how the system works. If the user has installed PRRTE/PMIx in a different location on the remote nodes, then we have to point the PATH and LD_LIBRARY_PATH at those locations when we start the daemons. This typically happens when the mount point for a shared file system is different on the login vs compute nodes - a not uncommon situation. The user cannot update PATH/LD_LIBRARY_PATH locally as that would break mpirun itself (you wouldn't find the executable or libprrte/libpmix), which is one reason why we created the prefix capability.

So when you give us a prefix (cmd line or envar), we overwrite the local PATH and LD_LIBRARY_PATH with the prefixed version prior to calling ssh or srun or whatever. Thus, it is not the local value of those variables that gets to the backend - it is the prefixed value. We also insert the prefix into the environment.
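
As a rough illustration of that step (not the actual PRRTE code, which is C): the sketch below assumes "the prefixed version" means prepending `<prefix>/bin` and `<prefix>/lib`, and uses `PRTE_PREFIX` as the variable inserted into the environment. The function names are hypothetical.

```python
from typing import Dict, Mapping, Optional


def _prepend(new_dir: str, old: Optional[str]) -> str:
    """Put new_dir in front of an existing colon-separated search path."""
    return f"{new_dir}:{old}" if old else new_dir


def backend_launch_env(local_env: Mapping[str, str], prefix: str,
                       prefix_var: str = "PRTE_PREFIX") -> Dict[str, str]:
    """Build the environment handed to ssh/srun when a prefix is given."""
    env = dict(local_env)  # never modify the caller's environment
    # The backend must find prted under the prefix, not the local install.
    env["PATH"] = _prepend(f"{prefix}/bin", env.get("PATH"))
    # Likewise for libprrte/libpmix when the daemon is loaded.
    env["LD_LIBRARY_PATH"] = _prepend(f"{prefix}/lib",
                                      env.get("LD_LIBRARY_PATH"))
    # The prefix itself also travels in the environment.
    env[prefix_var] = prefix
    return env
```

The local `PATH` and `LD_LIBRARY_PATH` stay untouched, so mpirun itself keeps working on the login node while the daemons see the backend locations.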

Point being: if someone has moved the install, or has put the install into a different location on the backend (which to us looks just like a relocation), then they have to give us a prefix so we know how to set the PATH (to find the prted) and the LD_LIBRARY_PATH (to load the base library - e.g., libprrte and libpmix for PRRTE, libopen-pal for OMPI). Otherwise the proc won't even start, and you'll never get to the installdirs code.

So if they have to set the prefix anyway, it isn't clear to me that this code actually does anything for us. The point was to try and avoid having the user provide us with a prefix, but I don't see a way to do it.

I wouldn't worry about the show-help problem until we determine this is actually worth doing. All I did was commit https://github.com/openpmix/prrte/pull/2167 and do a vanilla install, then prterun --help. I've reverted it and things are back to normal, so no worries there.

rhc54 avatar Mar 18 '25 14:03 rhc54

I think you have misunderstood the scope of this PR. It was never to get rid of the PREFIX, it was to not require the 3 PREFIXes in some very precise cases.

Specifically: the software was relocated but is mounted in the same location across the entire cluster, the user has updated their PATH and LD_LIBRARY_PATH before mpirun (maybe even via a module load), and the environment is propagated by the batch scheduler.

Until a few days ago, things would have started remotely, but access to the help files would not have worked, not even when the user set their own PREFIX. With the fix you put in, things will now work nicely and the user will get all the error messages correctly, as long as they set 3 prefixes: OPAL, PRTE and PMIX.

With this PR, under the same conditions the app will run correctly, including the right modules and help files, even if they don't set three prefixes, because we will extract the prefix from the corresponding shared library. Of course, the shared library would have to be loaded; the premise is that LD_LIBRARY_PATH is correct across the entire setup.

bosilca avatar Mar 18 '25 20:03 bosilca

Errr...if that's the case, then I think we all misunderstood things. We don't require 3 prefixes in that very limited scenario you describe. It works just fine with only one. If the user sets OPAL_PREFIX, your mpirun wrapper automatically sets PRTE_PREFIX and PMIX_PREFIX to that same value. The change I made (without any other changes) will propagate those values correctly and everything works fine.

The exceptions to this are many. For example, the above only works for a monolith - i.e., it won't work if the user is working with an external PRRTE or PMIx. It also doesn't work for those systems where backend locations are different from frontend locations (e.g., login vs compute nodes). Or for containers where the PMIx used by the app is different from the one used by PRRTE. Etc.

But if the one prefix goal for the homogeneous location case is what we are striving to achieve here - we already have that solution courtesy of the general solution. Given we require the general solution anyway, it isn't clear to me what problem this addition solves.

rhc54 avatar Mar 18 '25 20:03 rhc54

This entire discussion started because the prefixes were not propagated (i.e., they were actually manually removed), and now you claim we all misunderstood and that we don't require 3 prefixes.

Anyway, it didn't work when all this discussion started and it doesn't work in the latest stable either. But let's hope it works today, maybe once all the branches get synched and pulled into the stable (or main) version.

bosilca avatar Mar 19 '25 21:03 bosilca

It won't work in your main branch because you are still pointed at the PRRTE fork, which is broken. It is working just fine right now in your v5.0 branch because we updated PRRTE there, and it works fine in the upstream PRRTE master branch.

My point was that OMPI, when operating as a monolith, only needs the OPAL_PREFIX because you propagate that value into PRTE_PREFIX and PMIX_PREFIX in the mpirun wrapper - outside the view of the user. Internally, we still use three prefixes to support the three layers.

The monolith solution is a limited special case. Your current code breaks anyone using external PMIx or PRRTE because it incorrectly sets prefix values for those packages. I'm not sure how we could adjust the wrapper, though, to handle those cases. Might just need to tell people not to use that wrapper unless they are working with the monolith case. Or maybe adjust the wrapper so it doesn't overwrite PRTE_PREFIX and/or PMIX_PREFIX if the user already set those in their environment?
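
The last option could look something like the sketch below. It is a Python illustration of the logic only: the real wrapper is not Python, and the function name is made up. Only the three variable names come from this thread.

```python
from typing import Dict, Mapping


def propagate_opal_prefix(env: Mapping[str, str]) -> Dict[str, str]:
    """Copy OPAL_PREFIX into PRTE_PREFIX/PMIX_PREFIX only where unset."""
    out = dict(env)
    opal = out.get("OPAL_PREFIX")
    if opal:
        # setdefault leaves any value the user already exported alone,
        # so an external PRRTE or PMIx keeps its own prefix.
        out.setdefault("PRTE_PREFIX", opal)
        out.setdefault("PMIX_PREFIX", opal)
    return out
```

In the monolith case, only `OPAL_PREFIX` is set and both derived variables receive its value, as today. A user with an external PMIx exports `PMIX_PREFIX` themselves and it is preserved. Note that a variable set to the empty string counts as set here.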

rhc54 avatar Mar 19 '25 21:03 rhc54