Changes in v1.3.1 cause inconsistent MPI errors.

Open JuanPedroGHM opened this issue 8 months ago • 1 comments

Hi,

for the heat library, we use both mpi4py and the setup-mpi action. Firstly, thanks for the great work!

We use setup-mpi to run our tests on GitHub, but we have not been able to update the action to v1.3.1 because our tests fail with it, as you can see in this PR. As far as we can tell, the MPI version does not change between the runs using setup-mpi@v1.2.0 and v1.3.1: both use OpenMPI 1.4.6, and the output of ompi_info --all is identical.

We have not been able to reproduce the errors on our own systems, even after installing the same MPI version and the other dependencies and running our tests, so we have not been able to debug them properly.

Here are links to one pipeline running on v1.3.1 and one running on v1.2.0.

Failing CI with 1.3.1

Working CI with 1.2.0

We would really appreciate your input into solving this issue. Let us know if you need any further information.

Best, Juan

JuanPedroGHM, Apr 14 '25 08:04

I have no clue what's going on. This does not seem to be related to the mpi4py/setup-mpi action, but rather to one of those nasty bugs in Open MPI v4.x that lead to non-reproducible failures.

The only suggestion I have is to add env: {OMPI_MCA_pml: ob1} to your test step, or add a previous step with run: echo OMPI_MCA_pml=ob1 >> "$GITHUB_ENV". That's what I usually do for my own CI runs, for example here.
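
For reference, the two options could look roughly like this in a workflow file. This is only a sketch: the job name, runner image, step names, and test command are placeholders and are not taken from the heat repository, and only one of the two options is needed.

```yaml
# Hypothetical workflow fragment; names and the test command are placeholders.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: mpi4py/setup-mpi@v1
        with:
          mpi: openmpi

      # Option A: export the variable for all later steps in this job.
      - name: Select the ob1 PML for Open MPI
        run: echo "OMPI_MCA_pml=ob1" >> "$GITHUB_ENV"

      # Option B: set the variable on the test step only.
      - name: Run tests
        run: mpirun -n 2 python -m pytest   # placeholder test command
        env:
          OMPI_MCA_pml: ob1
```

Open MPI reads any environment variable named OMPI_MCA_<param> as an MCA parameter, so either form asks it to use the ob1 point-to-point messaging layer instead of picking one automatically.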

dalcinl, Apr 14 '25 14:04

@JuanPedroGHM Any update? Can we close this issue?

dalcinl, Nov 27 '25 17:11