Singleton init mode, MPI_Comm_spawn, and OPAL_PREFIX
I'm working on producing a binary Open MPI Python wheel to allow for pip install openmpi. I had to add a few hacks here and there to overcome issues strictly related to how wheel files are stored, how they are installed in Python virtual environments, the lack of post-install hooks, and the expectation that things work without activating the environment. All this of course requires relocating the Open MPI installation. And at that point I found a minor issue that I don't know how to overcome.
First, a clarification: I'm using the internal PMIx and PRRTE. Supposedly, the OPAL_PREFIX env var is all that is needed for things to work out of the box when relocating an Open MPI install. However, I think I came across a corner case. If using singleton init mode, then I believe OPAL_PREFIX is simply ignored, and if the tools cannot be located via $PATH, then things do not work.
Looking at the spawn code, I see a function start_dvm in ompi/dpm/dpm.c with the following code:
/* find the prte binary using the install_dirs support - this also
* checks to ensure that we can see this executable and it *is* executable by us
*/
cmd = opal_find_absolute_path("prte");
However, I believe opal_find_absolute_path() does not care at all about the OPAL_PREFIX env var; it only searches PATH. The comment "find the prte binary using the install_dirs support" is simply not true.
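To make my expectation concrete, here is a minimal sketch (not actual Open MPI code; the helper name and the OPAL_PREFIX handling are my own illustration) of a lookup that honors OPAL_PREFIX before falling back to the PATH search:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Illustration only: try $OPAL_PREFIX/bin/prte first, then fall back to
 * the existing PATH-based search (opal_find_absolute_path("prte")). */
static char *find_prte_honoring_prefix(void)
{
    const char *prefix = getenv("OPAL_PREFIX");
    if (NULL != prefix && '\0' != prefix[0]) {
        char candidate[PATH_MAX];
        snprintf(candidate, sizeof(candidate), "%s/bin/prte", prefix);
        if (0 == access(candidate, X_OK)) {
            return strdup(candidate);  /* relocated install found */
        }
    }
    return NULL;  /* caller would fall back to the PATH-based search here */
}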
@hppritcha Your input is much appreciated here. I do have a reproducer, but so far it is based on Python and intermediate binary assets I'm generating locally. I hope the description above is enough for you to see the issue.
I'll take a look at adjusting the path search. Having enough problems with #12307 at the moment.
Sorry! :cry:
@dalcinl Just to be clear, is this a problem on the main branch as well?
@wenduwan I discovered this issue testing with the 5.0.2 tarball. I cannot assert anything about the main branch, but a quick glance at the code seems to indicate that the root issue is also in there.
Adding main label so we don't ignore it for e.g. 5.1.x
@jsquyres Would Open MPI accept platform-specific ways to derive OPAL_PREFIX automatically, alleviating users from the need to set it explicitly?
All that is needed is to translate OPAL_PREFIX to PRTE_PREFIX in mpirun - but you can only do that if you are using the internal PRRTE. Same is true for PMIX_PREFIX.
If this is about singleton comm-spawn, then you need to put the translation into the bottom of the dpm.c file before we fork/exec prte. Again, with external PRRTE/PMIx protected.
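Roughly, it would look something like this at that spot (a sketch only, not the actual dpm.c code; the guard macro name is illustrative and would need to match whatever configure defines for the internal builds):

#include <stdlib.h>

/* Sketch: before fork/exec'ing prte for singleton comm-spawn, forward
 * OPAL_PREFIX to the prefix variables PRRTE and PMIx understand.
 * Only valid when the internal PRRTE/PMIx are being used. */
static void translate_prefix_env(void)
{
#if OMPI_USING_INTERNAL_PRRTE   /* illustrative guard name */
    const char *prefix = getenv("OPAL_PREFIX");
    if (NULL != prefix) {
        setenv("PRTE_PREFIX", prefix, 0);  /* don't clobber an explicit value */
        setenv("PMIX_PREFIX", prefix, 0);
    }
#endif
}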
@rhc54 Aren't you oversimplifying things a bit? opal_find_absolute_path("prte") will keep ignoring any XXX_PREFIX variable you may set, as it only looks in PATH.
I know almost nothing about the ompi codebase, but it looks to me that you have to, after the environment var translation you described, somehow replicate the logic of find_prterun() in ompi/tools/mpirun/main.c [link]. I may be saying nonsense; in that case, just ignore me.
I wasn't oversimplifying things - just pointing out that this isn't a difficult problem. Sure, you need to update the dpm code to duplicate that in mpirun, but that's a copy/paste operation. The code in mpirun is actually doing the "right thing" - it's just the dpm that is out of date.
Try #12363 - it took longer to compile than to do 🤷‍♂️
This issue is losing steam. @dalcinl I wonder if the issue is covered by mpi4py CI? The ompi mpi4py CI looks green now. Is this still a problem?
@wenduwan I believe this issue is already fixed in main, at least in my local testing. I'm still getting some warnings, but these are unrelated:
[kw61149:2141245] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.kw61149.1000/jf.0/1892286464/shared_mem_cuda_pool.kw61149 could be created.
[kw61149:2141245] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
And no, the mpi4py CI tests do not specifically test for this issue. After all the work @jsquyres put into it, adding a new workflow should be quite easy if you really want to have a quick test for relocation via OPAL_PREFIX+LD_LIBRARY_PATH.
The warnings are real. @edgargabriel and @lrbison both observed them in singleton mode.
I think adding a minimal relocation test is reasonable. IIUC the test will look like the following:
- Install Open MPI as normal
- Install mpi4py as normal
- Move Open MPI installation to another location, e.g.
mv /original/install /new/install
export OPAL_PREFIX=/new/install/...
export LD_LIBRARY_PATH=/new/install/lib:$LD_LIBRARY_PATH
- Run tests with
/new/install/bin/mpirun ... -x LD_LIBRARY_PATH -x OPAL_PREFIX
Please correct me if I'm wrong.
Not sure I understand this report. Are you saying running mpirun generates a warning? That isn't a singleton issue, so what precisely are you saying is wrong?
No, it's the other way around. With mpirun there is no warning, only with singletons.
Thanks - and the warning is still there on head of current main?
I think so; I saw it a few days ago. I can double check in a few days and look into it. It's coming from the btl/smcuda component, I think.
You see this smcuda warning any time the MPI process doesn't have a PRRTE PMIx server to talk to. So singletons and jobs direct-launched with srun (and likely palsrun) also give the warning, one instance per node of the job as I recall. You don't see this with btl/sm because it has a fallback if there doesn't appear to be a PMIx-server-created session dir.
Yeah, this has nothing to do with moving the installation. I had posted a patch to silence this warning, but it ran into a bunch of nit-picking and so I deleted it. It's a trivial fix to silence it, if someone else wants to try to get it committed.
@wenduwan #12526
@dalcinl Thank you very much!