
Crash with `MPI.Reduce!`

michel2323 opened this issue 2 years ago · 3 comments

The following code crashes with:

call:   %23 = call i32 @PMPI_Reduce(i64 noundef -1, i64 %19, i32 %12, i32 %20, i32 %21, i32 noundef 0, i32 %22) #10 [ "jl_roots"({} addrspace(10)* addrspacecast ({}* inttoptr (i64 140562307388096 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140562307248336 to {}*) to {} addrspace(10)*), {} addrspace(10)* %0) ], !dbg !65
 unhandled mpi_allreduce op:   %21 = load i32, i32 addrspace(11)* addrspacecast (i32* inttoptr (i64 140562307248336 to i32*) to i32 addrspace(11)*), align 16, !dbg !68, !tbaa !21

The complete log is attached.

using MPI
using Enzyme

# In-place sum-reduce of x onto rank 0.
function foo(x::Vector{Float64})
    MPI.Reduce!(x, MPI.SUM, 0, MPI.COMM_WORLD)
    return nothing
end

MPI.Init()

# The primal call runs fine.
x = ones(10)
foo(x)

# Reverse-mode differentiation of the same call triggers the crash.
x = ones(10)
dx = zeros(10)
autodiff(foo, Duplicated(x, dx))

MPI.Finalize()

out1.log
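
For context, here is what a derivative rule for this call has to compute: the forward op is an elementwise sum across ranks into the root's buffer, so the reverse op is a broadcast of the root's adjoint back to every rank. Below is a minimal hand-written sketch of that pullback, assuming MPI.jl's MPI.Comm_rank and MPI.Bcast!; the function name is illustrative, not an Enzyme API.

using MPI

# Reverse pass for an in-place sum-reduce `MPI.Reduce!(x, MPI.SUM, root, comm)`:
# the root's adjoint is broadcast and accumulated into every rank's adjoint.
function reduce_sum_pullback!(dx::Vector{Float64}, root::Integer, comm::MPI.Comm)
    dy = MPI.Comm_rank(comm) == root ? copy(dx) : zero(dx)
    MPI.Bcast!(dy, root, comm)
    # The root's input buffer was overwritten in the forward pass,
    # so its old adjoint is consumed rather than kept.
    MPI.Comm_rank(comm) == root && fill!(dx, 0.0)
    dx .+= dy
    return nothing
end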

michel2323 commented Jul 26 '22

What variant of MPI are you using?

cc @vchuravy

wsmoses commented Jul 27 '22

The artifact one; see the package status below. I also tried a system MPICH and OpenMPI, so it should fail the same way with any MPI library. Let me know if it works with one.

(jlScratch) pkg> st
Status `/scratch/mschanen/git/jlScratch/Project.toml`
  [052768ef] CUDA v3.12.0
  [7da242da] Enzyme v0.10.4 `/scratch/mschanen/julia_depot/dev/Enzyme`
  [da04e1cc] MPI v0.19.2
  [91a5bcdd] Plots v1.31.4
  [de0858da] Printf
  [10745b16] Statistics
julia> Pkg.build("MPI"; verbose=true)
    Building MPI → `/scratch/mschanen/julia_depot/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/d56a80d8cf8b9dc3050116346b3d83432b1912c0/build.log`
[ Info: using system MPI
┌ Info: Using implementation
│   libmpi = "libmpi"
│   mpiexec_cmd = `mpiexec`
└   MPI_LIBRARY_VERSION_STRING = "MPICH Version:\t3.3a2\nMPICH Release date:\tSun Nov 13 09:12:11 MST 2016\nMPICH Device:\tch3:nemesis\nMPICH configure:\t--build=x86_64-linux-gnu --prefix=/usr --includedir=\${prefix}/include --mandir=\${prefix}/share/man --infodir=\${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=\${prefix}/lib/x86_64-linux-gnu --libexecdir=\${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --with-libfabric --enable-shared --prefix=/usr --enable-fortran=all --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr CPPFLAGS= CFLAGS= CXXFLAGS= FFLAGS= FCFLAGS=\nMPICH CC:\tgcc  -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security  -O2\nMPICH CXX:\tg++  -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2\nMPICH F77:\tgfortran  -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2\nMPICH FC:\tgfortran  -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3~a2=. -fstack-protector-strong -O2\n"
┌ Info: MPI implementation detected
│   impl = MPICH::MPIImpl = 1
│   version = v"3.3.0-a2"
└   abi = "MPICH"
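
Incidentally, the same detection can be run directly in a session. Assuming MPI.jl 0.19, where MPI.identify_implementation is available, it should report the same pair shown in the build log above:

using MPI

# Expected to match the build log, e.g. MPICH and v"3.3.0-a2" here.
impl, version = MPI.identify_implementation()
@show impl version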

michel2323 commented Jul 27 '22

Oh sorry, I did not notice that it tries to use the system MPI even when the environment variable JULIA_MPI_BINARY is not set. Anyhow, I tried with JULIA_MPI_BINARY="", which falls back to MPI_jll:

julia> Pkg.build("MPI"; verbose=true)
    Building MPI → `/scratch/mschanen/julia_depot/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/d56a80d8cf8b9dc3050116346b3d83432b1912c0/build.log`
[ Info: using default MPI jll

It gives the same error.
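
For completeness, the jll fallback can also be forced from within a session; a sketch, assuming MPI.jl 0.19's build-time configuration where an empty JULIA_MPI_BINARY selects the default jll:

using Pkg

# An empty JULIA_MPI_BINARY makes the build script pick the bundled jll
# instead of searching for a system libmpi.
ENV["JULIA_MPI_BINARY"] = ""
Pkg.build("MPI"; verbose=true)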

michel2323 commented Jul 27 '22