
MPICH -- mpi returns myrank = 0 for all processors

Open cjoycd opened this issue 1 year ago • 14 comments

MPICH has been installed in Kubuntu 24.04 LTS as below

sudo apt-get update
sudo apt install build-essential
sudo apt install mpich libmpich-dev

After installation, a simple test program fails to return the correct rank for each process (every process reports rank 0). Only MPICH is installed; no other MPI implementation, such as Open MPI, is installed alongside it. How can I fix this? Any suggestions?

cjoycd avatar Jul 14 '24 02:07 cjoycd

Sorry for the neglect. The issue is due to running the mpiexec from Open MPI. Ubuntu defaults to Open MPI for MPI dependencies, so it is easy to mix the two up.
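One way to check which launcher `mpiexec` resolves to is to look at its `--version` output. The following is a minimal sketch, not an official tool. It assumes that Hydra's output mentions "HYDRA", that Open MPI's mentions "Open MPI" or "OpenRTE", and that PRRTE's mentions "PRRTE" or "prte"; the function names and returned labels are made up for illustration.

```python
import shutil
import subprocess


def classify_launcher(version_text):
    """Guess which MPI launcher printed this `--version` text.

    The substrings checked here are assumptions about typical output,
    not a documented interface of any launcher.
    """
    text = version_text.lower()
    if "hydra" in text:
        return "mpich-hydra"
    if "open mpi" in text or "openrte" in text:
        return "openmpi"
    if "prrte" in text or "prte" in text:
        return "prrte"
    return "unknown"


def installed_launcher(command="mpiexec"):
    """Run `<command> --version` and classify whatever it prints."""
    path = shutil.which(command)
    if path is None:
        return "not-found"
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some launchers print version details to stderr, so inspect both streams.
    return classify_launcher(result.stdout + result.stderr)
```

Calling `installed_launcher()` on the affected machine and getting anything other than `"mpich-hydra"` would suggest that the `mpiexec` on `PATH` is not the launcher the MPICH library expects.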

hzhou avatar Aug 21 '24 04:08 hzhou

Thank you for your email. I have fixed this problem with a fresh installation of MPICH built from source.

Regards


cjoycd avatar Aug 21 '24 04:08 cjoycd

Just tested Ubuntu 24.04 using Docker. Indeed it is broken. The issue is that they build MPICH with --with-pmix=/usr/lib/x86_64-linux-gnu/pmix2, which disables launching through Hydra, so jobs need to be launched with OpenPMIx's or Open MPI's launcher instead. We need to reach out to the package maintainer to fix this.

hzhou avatar Aug 21 '24 04:08 hzhou

Ok, thanks a lot for the update.


cjoycd avatar Aug 21 '24 05:08 cjoycd

FYI we were informed that the configuration issue would be fixed in the next update to the mpich package. https://bugs.launchpad.net/ubuntu/+source/mpich/+bug/2072338. I have not yet seen the update in the package repos, though.

raffenet avatar Sep 11 '24 15:09 raffenet

Awesome, thanks for the update!


cjoycd avatar Sep 11 '24 15:09 cjoycd

I'm confused by this bug; that is, I'm confused by MPICH's intentions with respect to PMIx.

The problem showed up again in the Debian packages, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1102612. PMIx support had been reactivated in 4.3.0-2 (currently deactivated again in 4.3.0-6 in Debian experimental).

mpich offers pmix support (--with-pmix) but the discussion here suggests that is incompatible with hydra. Should hydra be considered deprecated, and no longer used to launch mpich jobs? Or is it instead recommended to not enable pmix support in mpich?

drew-parsons avatar Apr 15 '25 19:04 drew-parsons

mpich offers pmix support (--with-pmix) but the discussion here suggests that is incompatible with hydra. Should hydra be considered deprecated, and no longer used to launch mpich jobs? Or is it instead recommended to not enable pmix support in mpich?

The default MPICH configuration is to use the Hydra process manager (mpiexec) and its own internal PMI library. Hydra is actively developed and we have no plans to deprecate it.

MPICH also supports linking with external PMIx libraries and using an alternative process manager for launch. This is a requirement for certain HPC systems and also for developers who wish to utilize some features in PMIx that are not in MPICH's internal PMI library.

For distro packages, I would strongly discourage use of PMIx with MPICH. Unfortunately, we have so far not gotten traction with maintainers to set this straight. Perhaps we need to ensure that Hydra is not built when configuring with an external PMIx, so that any resulting installation has no mpiexec.

raffenet avatar Apr 15 '25 20:04 raffenet

For distro packages, I would strongly discourage use of PMIx with MPICH. Unfortunately, we have so far not gotten traction with maintainers to set this straight. Perhaps we need to ensure that Hydra is not built when configuring with an external PMIx, so that any resulting installation has no mpiexec.

Another possibility: if Hydra can detect an incompatible PMI client library at launch time, we can report an error to the user rather than execute a bunch of size=1 jobs. I recently saw this with another MPI library when doing some experiments.
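Until a launcher reports this itself, the "bunch of size=1 jobs" symptom can be spotted from the user side. The following is a minimal sketch and not how Hydra would detect it. It assumes a hypothetical test program in which each process prints one line of the form `rank R of N`; the line format and the function name are illustrative assumptions.

```python
import re

# Assumed output format of a hypothetical test program: "rank 2 of 4".
RANK_LINE = re.compile(r"rank\s+(\d+)\s+of\s+(\d+)")


def looks_like_singletons(output, requested):
    """Return True if every launched process thinks it is alone.

    `requested` is the process count passed to the launcher (e.g. -n 4).
    A healthy run prints ranks 0..requested-1, each with size == requested.
    The broken case prints "rank 0 of 1" once per process.
    """
    pairs = [(int(r), int(n)) for r, n in RANK_LINE.findall(output)]
    if requested <= 1 or len(pairs) != requested:
        return False
    return all(rank == 0 and size == 1 for rank, size in pairs)
```

For example, four copies of `rank 0 of 1` from a run launched with `-n 4` would be flagged, while `rank 0 of 4` through `rank 3 of 4` would not.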

raffenet avatar Apr 15 '25 20:04 raffenet

It could indeed help avoid confusion if the build configuration refuses to build hydra when PMIx support is requested. Package maintainers (or users) may wonder where hydra (mpiexec) disappeared to, but the incompatibility can be documented and they can make the choice which one to request.

The alternative of having hydra give a runtime error would also get the message across.

I would suggest combining the two ideas by having the build configuration give an error if both hydra and pmix support are requested, rather than silently not building hydra.
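The proposed check could look roughly like the sketch below. It is written in Python purely for illustration (the real check would live in MPICH's configure script) and assumes that Hydra is selected through `--with-pm=...` with Hydra as the default, and that `--with-pm=no` or `--with-pm=none` disables building a process manager. The function name and error text are hypothetical.

```python
def check_configure_args(args):
    """Raise ValueError if both Hydra and external PMIx are requested.

    Assumptions (illustrative, not taken from MPICH's configure script):
    - "--with-pmix" or "--with-pmix=DIR" requests an external PMIx library;
    - "--with-pm=NAME[:NAME...]" selects process managers, default "hydra";
    - "--with-pm=no" / "--with-pm=none" builds no process manager.
    """
    wants_pmix = False
    pm = "hydra"  # default process manager when no --with-pm is given
    for arg in args:
        if arg == "--with-pmix":
            wants_pmix = True
        elif arg.startswith("--with-pmix="):
            wants_pmix = arg.split("=", 1)[1] != "no"
        elif arg.startswith("--with-pm="):
            pm = arg.split("=", 1)[1]

    builds_hydra = "hydra" in pm.split(":")
    if wants_pmix and builds_hydra:
        raise ValueError(
            "Hydra cannot launch binaries built against an external PMIx; "
            "configure with --with-pm=no or drop --with-pmix"
        )
```

Failing loudly at configure time makes the packager choose one of the two options explicitly, instead of discovering later that `mpiexec` is missing or that it starts size=1 jobs.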

I'm curious to know: do Hydra and the internal PMI have any advantages over an external PMIx? Are there reasons for the distro packages to prefer keeping Hydra rather than switching to PMIx, apart from managing the mpiexec compatibility?

drew-parsons avatar Apr 15 '25 21:04 drew-parsons

I'm curious to know: do Hydra and the internal PMI have any advantages over an external PMIx? Are there reasons for the distro packages to prefer keeping Hydra rather than switching to PMIx, apart from managing the mpiexec compatibility?

As a project, we value the simplicity of Hydra and libpmi. They are much smaller codebases than OpenPMIx/PRRTE, and that allows us to debug and add new features quickly without being bound to others' timelines. New features are probably the main advantage that in the end would be user-visible.

raffenet avatar Apr 16 '25 18:04 raffenet

That makes sense. Hydra continues to have strategic value.

drew-parsons avatar Apr 16 '25 21:04 drew-parsons

There's a lot of value in working out of the box. The present situation is really disruptive and does not have a simple/discoverable workaround. I think the distro version should use hydra.

It'll be worth having a design discussion about MPI in Debian generally once MPI-5 ABI is available from both MPICH and Open MPI. That should allow unifying what are now (ABI-incompatible) foo-openmpi and foo-mpich into one foo-mpi that can be launched using either mpiexec.mpich or mpiexec.openmpi (names TBD).

jedbrown avatar Apr 22 '25 05:04 jedbrown

A common ABI will help. That will bring MPI support in line with BLAS support, where different BLAS implementations are interchangeable.

I'd like to flag another challenge we have to face, which is GPU support in MPI. mpich 4.3 added GPU support. We tried activating it in the Debian package 4.3.0-3 but it was disruptive so we had to deactivate it. It was giving an error MPII_init_gpu(51)....: gpu_init failed when run under "normal" conditions with no gpu available.

drew-parsons avatar Apr 22 '25 07:04 drew-parsons