Enzyme.jl
Crash with `MPI.Reduce!`
The following code crashes with:
call: %23 = call i32 @PMPI_Reduce(i64 noundef -1, i64 %19, i32 %12, i32 %20, i32 %21, i32 noundef 0, i32 %22) #10 [ "jl_roots"({} addrspace(10)* addrspacecast ({}* inttoptr (i64 140562307388096 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140562307248336 to {}*) to {} addrspace(10)*), {} addrspace(10)* %0) ], !dbg !65
unhandled mpi_allreduce op: %21 = load i32, i32 addrspace(11)* addrspacecast (i32* inttoptr (i64 140562307248336 to i32*) to i32 addrspace(11)*), align 16, !dbg !68, !tbaa !21
The complete log is attached.
using MPI
using Enzyme

function foo(x::Vector{Float64})
    # in-place sum reduction onto rank 0
    MPI.Reduce!(x, MPI.SUM, 0, MPI.COMM_WORLD)
    return nothing
end

MPI.Init()

# the primal call runs fine
x = ones(10)
foo(x)

# the reverse-mode call through Enzyme triggers the crash above
x = ones(10)
dx = zeros(10)
autodiff(foo, Duplicated(x, dx))
MPI.Finalize()
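For reference, a rough sketch of the reverse-mode semantics one would expect for a SUM `Reduce!`: the reduced value on the root is the sum of every rank's contribution, so the adjoint of each rank's input is the root's adjoint, i.e. a broadcast from rank 0 back to all ranks. This is only a hand-written illustration of the expected behaviour, not how Enzyme handles it internally, and the name `reduce_sum_pullback!` is made up for the example.

using MPI

# Illustration only: pullback for the out-of-place view y = Reduce(x, +, root = 0).
# Forward:  y (on rank 0) = sum over all ranks of x
# Reverse:  dx on every rank += dy from rank 0, i.e. a Bcast of the root's adjoint.
function reduce_sum_pullback!(dx::Vector{Float64}, dy::Vector{Float64}, comm::MPI.Comm)
    buf = copy(dy)             # dy only carries meaningful values on the root
    MPI.Bcast!(buf, 0, comm)   # send the root's adjoint to every rank
    dx .+= buf                 # accumulate into each rank's shadow
    return nothing
end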
What variant of MPI are you using?
cc @vchuravy
The artifact one; the package status is below. I also tried a system MPICH and OpenMPI, and I don't think it works with any MPI library. Let me know if it does for you.
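(If it helps to double-check which MPI library a given Julia session actually loaded, something like the following should work; it only relies on Libdl, so it is independent of how MPI.jl was built, and the "mpi" filter string is just a heuristic:)

using MPI, Libdl

# After `using MPI` the chosen libmpi should already be loaded; list the
# MPI-related shared libraries so the path shows artifact vs. system install.
filter(contains("mpi"), Libdl.dllist())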
(jlScratch) pkg> st
Status `/scratch/mschanen/git/jlScratch/Project.toml`
[052768ef] CUDA v3.12.0
[7da242da] Enzyme v0.10.4 `/scratch/mschanen/julia_depot/dev/Enzyme`
[da04e1cc] MPI v0.19.2
[91a5bcdd] Plots v1.31.4
[de0858da] Printf
[10745b16] Statistics
julia> Pkg.build("MPI"; verbose=true)
Building MPI → `/scratch/mschanen/julia_depot/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/d56a80d8cf8b9dc3050116346b3d83432b1912c0/build.log`
[ Info: using system MPI
┌ Info: Using implementation
│ libmpi = "libmpi"
│ mpiexec_cmd = `mpiexec`
└ MPI_LIBRARY_VERSION_STRING = "MPICH Version:\t3.3a2\nMPICH Release date:\tSun Nov 13 09:12:11 MST 2016\nMPICH Device:\tch3:nemesis\nMPICH configure:\t--build=x86_64-linux-gnu --prefix=/usr --includedir=\${prefix}/include --mandir=\${prefix}/share/man --infodir=\${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=\${prefix}/lib/x86_64-linux-gnu --libexecdir=\${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --with-libfabric --enable-shared --prefix=/usr --enable-fortran=all --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr CPPFLAGS= CFLAGS= CXXFLAGS= FFLAGS= FCFLAGS=\nMPICH CC:\tgcc -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2\nMPICH CXX:\tg++ -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2\nMPICH F77:\tgfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2\nMPICH FC:\tgfortran -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2\n"
┌ Info: MPI implementation detected
│ impl = MPICH::MPIImpl = 1
│ version = v"3.3.0-a2"
└ abi = "MPICH"
Oh sorry, I did not notice that it tries to use the system MPI even when the environment variable JULIA_MPI_BINARY is not set. Anyhow, I tried with JULIA_MPI_BINARY="", which falls back to MPI_jll:
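(In Julia that amounts to something like the sketch below; JULIA_MPI_BINARY is MPI.jl 0.19's build-time switch, and an empty value selects the bundled jll.)

ENV["JULIA_MPI_BINARY"] = ""    # empty => use the default MPI_jll artifact
using Pkg
Pkg.build("MPI"; verbose=true)  # rebuild (output below), then restart Julia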
julia> Pkg.build("MPI"; verbose=true)
Building MPI → `/scratch/mschanen/julia_depot/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/d56a80d8cf8b9dc3050116346b3d83432b1912c0/build.log`
[ Info: using default MPI jll
It gives the same error.