Cannot `using Trixi` on a cluster (RAMSES)
I'm on a node of the UoC's cluster RAMSES, and I get this:
```julia
julia> using Trixi
slurmstepd: error: mpi/pmi2: invalid pmi1 command received: 'init'
```
It just freezes there.
This is coming from `init_mpi()` in `__init__()`.
Trixi.jl is not usable on this system, as it doesn't get past the initialization.
For some reason, it now works on the same machine. Closing for now.
Update: It's back.
This sounds like something you need to discuss with your cluster admin?
Does this reproduce with just `MPI.Init()`? Please also post the output of `MPI.versioninfo()`.
You're right:
```julia
julia> using MPI

julia> MPI.versioninfo()
MPIPreferences:
  binary:  MPICH_jll
  abi:     MPICH

Package versions
  MPI.jl: 0.20.22
  MPIPreferences.jl: 0.1.11
  MPICH_jll: 4.3.0+1

Library information:
  libmpi: /scratch/efaulha2/.julia/artifacts/05d8c79b270470018e9de8dd24ddb6d7954aff9d/lib/libmpi.so
  libmpi dlpath: /scratch/efaulha2/.julia/artifacts/05d8c79b270470018e9de8dd24ddb6d7954aff9d/lib/libmpi.so
  MPI version: 4.1.0
  Library version:
    MPICH Version:      4.3.0
    MPICH Release date: Mon Feb 3 09:09:47 AM CST 2025
    MPICH ABI:          17:0:5
    MPICH Device:       ch3:nemesis
    MPICH configure:    --build=x86_64-linux-musl --disable-dependency-tracking --disable-doc --enable-fast=ndebug,O3 --enable-static=no --host=x86_64-linux-gnu --prefix=/workspace/destdir --with-device=ch3 --with-hwloc=/workspace/destdir
    MPICH CC:           cc -DNDEBUG -DNVALGRIND -O3
    MPICH CXX:          c++ -DNDEBUG -DNVALGRIND -O3
    MPICH F77:          gfortran -O3
    MPICH FC:           gfortran -O3
  MPICH features:

julia> MPI.Init()
slurmstepd: error: mpi/pmi2: invalid pmi1 command received: 'init'
```
It is likely that Slurm is doing some shenanigans and notices that you are trying to use a non-Slurm MPI: e.g., `MPICH_jll` uses the PMI1 protocol instead of the PMI2 protocol to initialize computations, which matches the `invalid pmi1 command` error from `slurmstepd`.
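One way to check this is to ask Slurm which PMI plugins it actually supports; the available plugins depend on the site configuration, and `my_script.jl` below is just a hypothetical placeholder for your own script:

```shell
# List the MPI/PMI plugins this Slurm installation supports
srun --mpi=list

# If pmi2 is listed, you can request it explicitly for a job step
srun --mpi=pmi2 julia --project my_script.jl
```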
I would recommend using the system MPI directly. https://juliaparallel.org/MPI.jl/latest/configuration/#using_system_mpi
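The steps from the linked documentation can be sketched roughly as follows (the library path is a hypothetical example; use the one provided by your cluster's module system, e.g. after `module load openmpi`):

```julia
using MPIPreferences

# Switch MPI.jl from the bundled MPICH_jll to the cluster's own MPI library.
# With no arguments, use_system_binary() searches the default library paths;
# here an explicit (made-up) path is passed for illustration.
MPIPreferences.use_system_binary(;
    library_names = ["/opt/cluster/mpi/lib/libmpi.so"])
```

After this, restart Julia so the new preference takes effect, and verify with `MPI.versioninfo()` that the system library is picked up.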