Integrate `mpi4py-mpich` into `mpi4py` or other options?
I'm a contributor to `mpi4py-mpich`. It's a fork of mpi4py that includes binaries of MPICH so that users don't need to install MPICH as a system dependency. We have found it to be a convenient option for users who face download restrictions or are unsure what MPI is. Since pip, unlike Conda, has no notion of system packages, we went with this approach.
I was planning to update mpi4py-mpich to use the latest version of mpi4py (3.1.5). Before I do so, I wanted to ask: would you be interested in taking ownership of this project, or providing some option to install or package mpi4py with mpich?
I saw on https://github.com/mpi4py/mpi4py-publish that Intel MPI can now be installed via a Python package like `python -m pip install impi-rt`. I'd imagine something similar could work, either integrated or separate from mpi4py. While we mostly use MPICH variants, having an Open MPI build would be useful as well.
If this is not viable, what do you think the path forward should be?
> I'm a contributor of the mpi4py-mpich.
I've already raised my concerns in https://github.com/mpi4py/mpi4py/issues/260#issuecomment-1260710149 about the approach followed by mpi4py-mpich. On the one hand, you have the maintenance issue: the package has not been kept up to date with upstream mpi4py releases. But even more importantly, the approach is IMHO totally wrong. Looks like mpi4py-mpich uses auditwheel to make a manylinux package, therefore the MPICH library is EMBEDDED within the build. This embedding prevents the package from being used with an EXTERNAL, pre-installed MPICH library as you would usually find in a super-computing facility. In my opinion, this is unacceptable. The mpi4py-mpich package took the quick and cheap approach to solving the binary distribution problem. This may have been an appropriate solution for those who published the package, but it is by far not the proper solution for general use cases.
> I saw on https://github.com/mpi4py/mpi4py-publish
And that's my "solution" for the binary distribution issue. It was a lot of work. Much, much more work than just creating a manylinux build with auditwheel to embed the MPICH library. And yet, as far as I know, this new stuff I did has not seen much production use, so I'm hesitant to make these binaries the "default" and publish them to PyPI. A remaining problem is that these packages ultimately cover only MPICH and Open MPI derivatives. Perhaps this is good enough. Moreover, the upcoming MPI standard will feature a new MPI ABI that would allow a single mpi4py binary to work with any and all MPI implementations.
> Intel MPI can now be installed via a Python package like `python -m pip install impi-rt`. I'd imagine something similar could work, either integrated or separate from mpi4py
That's because the Intel MPI folks did the work of making the installation relocatable, so the MPI binaries can all be installed in any prefix location. I guess something similar could be done with MPICH and also Open MPI, such that we could do `pip install mpich`. Actually, in the MPICH case, maybe a couple of packages are needed: `mpich-ofi` and `mpich-ucx`, depending on the backend device. And yet, there is the minor detail of setting things up such that the MPICH library can use an external libfabric or UCX. In any case, as you see, all this work can be done outside of mpi4py.
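As a rough illustration of what "relocatable" means here, a pip-installed MPI package can derive all of its paths from its own install location rather than a hard-coded build prefix. A minimal sketch (package name and layout are hypothetical, not the actual impi-rt mechanism):

```python
# Sketch: a relocatable package computes every path relative to its own
# installed location instead of a hard-coded prefix such as /usr or /opt.
# The package name and layout below are hypothetical.
import os

def package_prefix(package_file):
    """Directory containing the installed package; acts as the prefix."""
    return os.path.dirname(os.path.abspath(package_file))

def mpiexec_path(package_file):
    """Locate bin/mpiexec relative to the package, wherever pip put it."""
    return os.path.join(package_prefix(package_file), "bin", "mpiexec")
```

The same logic works unchanged whether the wheel lands in a virtual environment, a user site, or the system site-packages, which is exactly what a hard-coded prefix cannot do.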
> If this is not viable, what do you think the path forward should be?
IMHO, the path forward would be to exhaustively test in production the binaries created via https://github.com/mpi4py/mpi4py-publish and published at https://anaconda.org/mpi4py/mpi4py. This infrastructure I've built currently allows for Open MPI and MPICH (and vendor variants), but always assuming that MPI has to be externally installed.
After that, we need to have a community discussion to weigh whether we can really commit ourselves to putting these wheels on PyPI for others to consume. Maybe we could use an alternative PyPI package name, say, `mpi4py-mpiabi`.
Finally, and somewhat orthogonal to mpi4py, there is the publishing on PyPI of MPICH and Open MPI packages that are pip-installable, pretty much as Intel MPI is on Linux and Windows.
cc @leofang
> On the one hand, you have the maintenance issue: the package has not been kept up to date with upstream mpi4py releases.
Yes, we don't have enough resources to always keep it up to date with upstream mpi4py, particularly recently. I'm also concerned about the potential upcoming release of mpi4py 4, which may break the embedding process. That's why I'm trying to open up the conversation.
> Looks like mpi4py-mpich uses auditwheel to make a manylinux package, therefore the MPICH library is EMBEDDED within the build. This embedding prevents the package from being used with an EXTERNAL, pre-installed MPICH library as you would usually find in a super-computing facility
Yes, that was the intention. Sure, it may not be the best solution, but it is the most user-friendly one for downstream dependents of this project. As a general Python developer, if you install mpi4py, it's very likely you have some understanding of what MPI is and that you'll need a system dependency. But if you install a package that depends on another package that depends on MPI, you may not read the line in the docs that tells you to install a system MPI. Or you may not be experienced enough to set up a system MPI installation.
This may be the quick and cheap approach, but for us, it's better than the alternative of not using it.
> That's because Intel MPI folks did the work of making the installation relocatable, so the MPI binaries can all be installed in any prefix location.
And it's good that they did. If only they supported macOS, then that would be a potential solution for us (obviously they won't; just surmising).
> IMHO, the path forward would be to exhaustively test in production the binaries created via https://github.com/mpi4py/mpi4py-publish and published at https://anaconda.org/mpi4py/mpi4py. This infrastructure I've built currently allows for Open MPI and MPICH (and vendor variants), but always assuming that MPI has to be externally installed.
And that is what we primarily use and recommend others to use. Not because binaries are available on Conda vs. source distributions on PyPI, but because there are packages for Open MPI and MPICH. And if we want to use a system distribution of MPI, we can use the external build variants for compatibility.
I understand the concerns you have with mpi4py-mpich, and share some of them. But I don't think you understand or appreciate the problem this package is attempting to solve. It does seem like making a mpich-rt package is the preferred approach. But I don't think either mpi4py or the MPI for Python community is interested in helping make that available. For the time being, I am going to update mpi4py-mpich, and maybe look into what it would take to create a wheel distribution of MPICH, but I can't guarantee anything beyond that.
> But I don't think you understand or appreciate the problem this package is attempting to solve.
After countless hours of work to set up all the stuff in https://github.com/mpi4py/mpi4py-publish, your statement is really heartbreaking. I do understand the problem, I do understand the pain, and I'm taking steps to address the issue. However, in the process of improving things, I WILL NOT compromise doing things the right way for the sake of making things trivial for novices.
> It does seem like making a `mpich-rt` package is the preferred approach.
The `-rt` suffix is not really needed. A full `mpich` package with headers and libraries would be totally fine. No need to split into `-devel` and `-rt` as Intel did.
> But I don't think either `mpi4py` or the MPI for Python community is interested in helping make that available.
Why do you think that? The only problem is resources: I don't have infinite time to work on all aspects of the problem. FWIW, I do contribute to the MPICH project as well, and I believe MPICH developers will be sympathetic to any change we may propose to make MPICH installs binary-relocatable.
> For the time being, I am going to update `mpi4py-mpich`,
Are you doing this work in some GitHub repository? Generating wheels with cibuildwheel under GitHub Actions is trivial, therefore you could automate the whole thing. Moreover, you could grab stuff from what I did in the mpi4py-publish repository.
Additionally, you would still embed MPICH, but in a different way, such that users would be able to OVERRIDE the location of the MPICH library, either via LD_LIBRARY_PATH or LD_PRELOAD. To this end, you only need to "patch" the auditwheel tool to not rename libmpi.so.12 but still ship it in the wheel file.
In other words, my main objection to the embedding of the MPICH library is that it is done in a way that hinders the possibility of runtime-replacing it with an external MPICH installation.
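The effect of that soname detail can be sketched with a toy model of the dynamic loader's search order (LD_LIBRARY_PATH directories first, then the wheel's bundled `.libs` directory via RPATH). The directory layout and the mangled file name below are hypothetical:

```python
# Sketch: why auditwheel's soname mangling blocks runtime replacement.
# This models ld.so's lookup of a DT_NEEDED entry: LD_LIBRARY_PATH
# directories are searched before the wheel's bundled .libs directory.
# Paths and the mangled name "libmpi-3f2a.so.12" are hypothetical.
import os

def resolve(needed, ld_library_path, filesystem):
    """Return the first path providing `needed`, mimicking loader order."""
    for directory in ld_library_path + ["/site-packages/mpi4py/.libs"]:
        candidate = os.path.join(directory, needed)
        if candidate in filesystem:
            return candidate
    return None

filesystem = {
    # External MPICH from the computing facility, standard soname:
    "/opt/mpich/lib/libmpi.so.12",
    # Copies bundled inside the wheel:
    "/site-packages/mpi4py/.libs/libmpi.so.12",       # shipped unmangled
    "/site-packages/mpi4py/.libs/libmpi-3f2a.so.12",  # auditwheel-mangled
}
```

If the extension module needs the plain `libmpi.so.12`, setting LD_LIBRARY_PATH to `/opt/mpich/lib` substitutes the external library; if it needs the mangled name, no external installation can ever match it, so the embedded copy always wins.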
> and maybe look into what it would take to create a wheel distribution of MPICH, but I can't guarantee anything beyond that.
This would be the definitive solution, indeed. An Open MPI package would also be welcome I guess.
It took a while, but I managed to build an MPICH package the proper way: https://anaconda.org/mpi4py/mpich/files There are a few caveats, though:
- No Fortran bindings support; the `mpifort` compiler wrapper IS NOT available.
- No C++ bindings support; the `mpicxx` compiler wrapper IS available for C++ codes using the C bindings of MPI.
- Only the ch4:ofi device, with the libfabric code embedded in the MPI library (see pmodels/mpich#6905).
Infrastructure is mostly in place to eventually publish mpich-ofi and mpich-ucx packages that would allow overriding the OFI/UCX shared libraries with an external installation. However, at that point, I would argue it would be best to just override the whole MPICH with an external installation instead of installing mpich with pip.
To test mpi4py+MPICH, run the following on either Linux or macOS:
```sh
python -m venv /tmp/venv
source /tmp/venv/bin/activate
python -m pip install mpi4py mpich \
    -i https://pypi.anaconda.org/mpi4py/simple
command -v mpiexec
command -v python
python -m mpi4py --prefix
python -m mpi4py --version
python -m mpi4py --mpi-std-version
python -m mpi4py --mpi-lib-version
mpiexec -n 5 python -m mpi4py.bench helloworld
```
And now we have Open MPI: https://anaconda.org/mpi4py/openmpi/files
To test mpi4py+Open MPI, run the following on either Linux or macOS:
```sh
python -m venv /tmp/venv
source /tmp/venv/bin/activate
python -m pip install mpi4py openmpi \
    -i https://pypi.anaconda.org/mpi4py/simple
command -v mpiexec
command -v python
python -m mpi4py --prefix
python -m mpi4py --version
python -m mpi4py --mpi-std-version
python -m mpi4py --mpi-lib-version
mpiexec -n 5 python -m mpi4py.bench helloworld
```
@srilman We have not heard back from you. Did you have a chance to try all the stuff I've put together after a lot of work?
@srilman We were advised to file a PyPI ticket and take over the mpi4py-mpich project name on PyPI according to PEP 541, as it's causing confusion to users and we are unable to offer any support for this package as-is for reasons discussed above. Also note that we're adding fair play rules with regard to the mpi4py project name (https://github.com/mpi4py/mpi4py/pull/512).
However, we'd still like to work with you and want to give you a heads-up before filing a ticket. Please kindly respond by this Friday. Thank you.
@leofang Thanks for the heads up, I've not had a chance to look into this since the last time. I understand the fair play rule disallowing other packages from using the mpi4py name. But it looks like there is also a fair play rule that forbids others from publishing their own copy of the mpi4py binaries on PyPI. Would that apply in this case? So even if we renamed mpi4py-mpich to something not referring to mpi4py on our end, it would still break the rules?
> So even if we renamed mpi4py-mpich to something not referring to mpi4py on our end, it would still break the rules?
Yes. In its current form, a mpi4py-mpich user still does `import mpi4py` at run time, right? So renaming the package would not help. You can see how bad it is: users can pip-install both mpi4py and mpi4py-mpich (or any name X) side by side, and the two installations would corrupt each other depending on the installation order. pip simply has no mechanism to guard against re-publication of a project.
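A toy simulation of that clobbering, under the assumption that both wheels ship files under the same top-level `mpi4py/` directory (file names are illustrative):

```python
# Sketch: two distributions shipping the same top-level `mpi4py/` package
# overwrite each other file-by-file. pip records installed files per
# *project*, so after the second install neither project's RECORD matches
# what is actually on disk. File names below are illustrative.

def install(site_packages, files):
    """Simulate installing a wheel: write its files, return its RECORD."""
    record = []
    for path, content in files.items():
        site_packages[path] = content  # silently clobbers any other project
        record.append(path)
    return sorted(record)

site = {}
record_upstream = install(site, {"mpi4py/MPI.so": "upstream build"})
record_fork = install(site, {"mpi4py/MPI.so": "embedded-MPICH build"})
```

Whichever package is installed last wins on disk, while both projects' metadata still claim ownership of the very same files, so uninstalling either one breaks the other.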
If the package is renamed and the module is placed under another module namespace, say, `from xxxxx import mpi4py`, this would be an acceptable form as per Rule No.1.
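For illustration, a hypothetical wheel layout that would satisfy this rule (keeping the `xxxxx` placeholder from above) might look like:

```
xxxxx/                 # the distribution's own top-level package
    __init__.py
    mpi4py/            # vendored copy, hidden under the namespace
        __init__.py
        MPI.so         # illustrative extension module name
```

With such a layout, users write `from xxxxx import mpi4py`, and a side-by-side install of upstream mpi4py is never shadowed or overwritten.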
@srilman friendly nudge 🙂 I see two paths forward without involving PyPI admins:
- You start testing out @dalcinl's solution (https://github.com/mpi4py/mpi4py/issues/463#issuecomment-1940458020) and work with us to improve the `mpi4py-mpich` offering. Note that Lisandro's main concern is (https://github.com/mpi4py/mpi4py/issues/463#issuecomment-1920770882):
> In other words, my main objection to the embedding of the MPICH library is that it is done in a way that hinders the possibility of runtime-replacing it with an external MPICH installation.
- You kindly transfer the ownership to us (Lisandro and me), and let us take it from here (we'll be the final decision makers going forward). I just realized this was actually what you originally planned too (https://github.com/mpi4py/mpi4py/issues/463#issue-2108626335):
> Before I do so, I wanted to ask: would you be interested in taking ownership of this project, or providing some option to install or package mpi4py with mpich?
so all we're asking is to revisit your original plan. Thank you.
@leofang I'm not sure if we'll have enough time soon to test @dalcinl's solution, so we're leaning towards transferring ownership to you two. But if you don't mind, could you keep us updated, preferably in this thread, if you choose to significantly change or delete mpi4py-mpich? That way, we can track whether any downstream libraries need to be updated. Thanks in advance!