HDF5.jl

Crash with system-provided OpenMPI and HDF5_jll v1.14

Open mfsch opened this issue 1 year ago • 6 comments

When I set up a simple project with the latest MPI and HDF5 packages and configure it to use the system-provided OpenMPI installation, the call to MPI.Init() crashes with “orte_init failed” errors. I am observing this issue on both Ubuntu 18.04 (OpenMPI 3.1.2) and 20.04 (OpenMPI 4.0.3). Downgrading to HDF5_jll v1.12 fixes the issue.

Steps to reproduce:

  • create a new folder and launch Julia with julia --project=.
  • install dependencies with ]add MPI HDF5
  • run using MPI; MPI.MPIPreferences.use_system_binary()
  • attempt to run mpirun -n 4 julia --project -e "using MPI, HDF5; MPI.Init()" (or mpiexecjl), observe crash
  • downgrade with ]add HDF5_jll@1.12, rerun without crash

On Ubuntu 18.04, the error includes the line mca_base_component_repository_open: unable to open mca_pmix_pmix3x: /home/user/.julia/artifacts/f9744710560ba3ddc00cd9df62ac7dfcd18c8649/lib/openmpi/mca_pmix_pmix3x.so: undefined symbol: opal_envar_t_class, in case this is helpful.

mfsch avatar Jun 14 '23 12:06 mfsch

Ah, I've seen something similar! The problem appears to be that we're opening two different MPI libraries: the system one (from MPI.jl) and the JLL one (from HDF5_jll).
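One way to check this diagnosis is to list the shared libraries loaded into the Julia process and look for more than one MPI library. A minimal sketch using the `Libdl` standard library; the helper name `loaded_mpi_libraries` and the `libmpi` file-name prefix are assumptions for illustration and are not part of MPI.jl or HDF5.jl:

```julia
using Libdl

# Hypothetical helper: keep only paths whose file name looks like an MPI library.
# The "libmpi" prefix is an assumption; adjust it for your MPI distribution.
loaded_mpi_libraries(paths = Libdl.dllist()) =
    filter(p -> startswith(basename(p), "libmpi"), paths)

# Usage, in the affected environment and before calling MPI.Init():
#   using MPI, HDF5
#   foreach(println, loaded_mpi_libraries())
```

If this prints entries from two different prefixes (for instance one system path and one under ~/.julia/artifacts), that would be consistent with two MPI libraries being opened in the same process.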

Easy workarounds:

  • use a system HDF5 (see HDF5.jl docs)
  • cap HDF5_jll at 1.12 (set the compat entry HDF5_jll = "~1.12").
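The second workaround can be sketched from the Julia REPL roughly as follows. This assumes a Julia version recent enough to provide `Pkg.compat`, and that HDF5_jll is added as a direct dependency of the project so that a compat entry can be recorded for it:

```julia
using Pkg

Pkg.activate(".")                             # the project from the reproduction steps
Pkg.add(name = "HDF5_jll", version = "1.12")  # direct dependency, resolved to a 1.12.x release
Pkg.compat("HDF5_jll", "~1.12")               # records HDF5_jll = "~1.12" under [compat]
```

Editing the [compat] section of Project.toml by hand and then running ]resolve should have the same effect.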

In the longer term we need a better fix. @giordano @eschnett any suggestions on how we can deal with this?

simonbyrne avatar Jun 14 '23 16:06 simonbyrne

I thought HDF5_jll.jl would use the MPI library chosen by MPIPreferences.jl

giordano avatar Jun 14 '23 16:06 giordano

Yeah, I don't quite get why it's pulling in OpenMPI_jll?

simonbyrne avatar Jun 14 '23 16:06 simonbyrne

Ah, I see.

It augments the platform based on the value of the MPI ABI: https://github.com/JuliaBinaryWrappers/HDF5_jll.jl/blob/b96de8ada558f8d70e27b5561d4f5df815b01ebf/.pkg/platform_augmentation.jl#L13

But the augmentation for abi = "openmpi" always loads OpenMPI_jll: https://github.com/JuliaBinaryWrappers/HDF5_jll.jl/blob/main/src/wrappers/x86_64-linux-gnu-libgfortran5-cxx03-mpi%2Bopenmpi.jl#L9

simonbyrne avatar Jun 14 '23 16:06 simonbyrne

My approach, of course, would be to use the Julia-provided MPItrampoline as the MPI implementation, and to use the system MPI via MPItrampoline...

eschnett avatar Jun 15 '23 16:06 eschnett

Would it be possible to print a warning if a system-provided MPI installation is detected but no system-provided HDF5 is?
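A rough sketch of what such a check could look like. The function name `warn_on_mixed_mpi` is hypothetical, and the idea that a system HDF5 shows up as a "libhdf5" preference of HDF5.jl is an assumption on my part, not something confirmed in this thread:

```julia
# Hypothetical check: returns true (and warns) when MPI.jl is configured for a
# system MPI while no system HDF5 library has been configured.
function warn_on_mixed_mpi(mpi_binary, libhdf5)
    mixed = mpi_binary == "system" && libhdf5 === nothing
    if mixed
        @warn "System MPI selected, but HDF5 comes from HDF5_jll; " *
              "the two may load different MPI libraries."
    end
    return mixed
end

# Usage sketch (needs MPIPreferences, Preferences and HDF5 installed;
# the "libhdf5" preference key is an assumption):
#   using MPIPreferences, Preferences, HDF5
#   warn_on_mixed_mpi(MPIPreferences.binary, load_preference(HDF5, "libhdf5", nothing))
```

Keeping the decision logic free of package imports means it can be exercised without an MPI installation.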

JoshuaLampert avatar Nov 07 '23 14:11 JoshuaLampert