Check for both MPI and PMPI versions

Open vchuravy opened this issue 1 month ago • 7 comments

@wsmoses, on Julia 1.10 I am seeing:

; Function Attrs: nofree norecurse nosync nounwind willreturn
declare i32 @PMPI_Wait(i64, i64) local_unnamed_addr #13

; Function Attrs: nofree norecurse nosync nounwind willreturn
declare i32 @MPI_Comm_rank(i32, i64) local_unnamed_addr #16

; Function Attrs: nofree norecurse nosync nounwind willreturn
declare i32 @MPI_Comm_size(i32, i64) local_unnamed_addr #17

; Function Attrs: nofree norecurse nosync nounwind willreturn
declare i32 @MPI_Irecv(i64, i32, i32, i32, i32, i32, i64) local_unnamed_addr #18

; Function Attrs: nofree norecurse nosync nounwind willreturn
declare i32 @MPI_Isend(i64, i32, i32, i32, i32, i32, i64) local_unnamed_addr #1

So when the PMPI_Wait handling goes looking for MPI_Isend, it fails to look at the right one: it searches for PMPI_Isend, while the module only declares MPI_Isend.

If this looks kosher to you, I can also go and fix all the other users of getRenamedPerCallingConv.

vchuravy • Nov 06 '25 20:11

This is the wrong fix overall. getRenamedPerCallingConv("PMPI_...", "MPI_x") should give PMPI_x, which ought to resolve?

wsmoses • Nov 06 '25 20:11

> This is the wrong fix overall. getRenamedPerCallingConv("PMPI_...", "MPI_x") should give PMPI_x, which ought to resolve?

No, that's the crux of the issue. A module may contain a mix of PMPI and MPI names, likely because Enzyme.jl tries to backsolve the name from a function pointer and the symbol it finds can be either the PMPI or the MPI one.

The full module is here. https://gist.github.com/vchuravy/6f5edc18764db407a019294eba7f39e5

We are running on PMPI_Wait and looking for MPI_Isend, which we normalize to PMPI_Isend and then fail to find in the module.
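A minimal sketch of the "check for both MPI and PMPI versions" idea, using a plain set of names as a stand-in for the module's symbol table. It is not the Enzyme code: `ToyModule`, `baseMPIName`, and `findMPIOrPMPI` are hypothetical names made up for illustration, and returning an empty string for "not found" is an assumption of the sketch.

```cpp
#include <set>
#include <string>

// Toy stand-in for a module: just the set of declared function names.
using ToyModule = std::set<std::string>;

// Hypothetical helper: map "PMPI_x" to "MPI_x"; leave other names unchanged.
inline std::string baseMPIName(const std::string &name) {
  if (name.rfind("PMPI_", 0) == 0) // name starts with "PMPI_"
    return name.substr(1);
  return name;
}

// Hypothetical helper: look for either spelling of an MPI routine.
// The spelling that matches the caller's prefix is tried first, and the
// other spelling is used as a fallback. Returns "" if neither is declared.
inline std::string findMPIOrPMPI(const ToyModule &mod,
                                 const std::string &caller,
                                 const std::string &callee) {
  const std::string mpi = baseMPIName(callee);
  const std::string pmpi = "P" + mpi;
  const bool callerIsPMPI = caller.rfind("PMPI_", 0) == 0;
  const std::string &first = callerIsPMPI ? pmpi : mpi;
  const std::string &second = callerIsPMPI ? mpi : pmpi;
  if (mod.count(first))
    return first;
  if (mod.count(second))
    return second;
  return std::string();
}
```

With the declarations listed above (PMPI_Wait next to MPI_Isend), a lookup from PMPI_Wait for MPI_Isend would first try PMPI_Isend, miss, and then fall back to MPI_Isend instead of asserting.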

julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/Utils.cpp:1804: llvm::Function* getOrInsertDifferentialMPI_Wait(llvm::Module&, llvm::ArrayRef<llvm::Type*>, llvm::Type*, llvm::StringRef): Assertion `isendfn' failed.

[1123] signal (6.-6): Aborted
in expression starting at /__w/Enzyme.jl/Enzyme.jl/test/integration/MPI/nonblocking_halo.jl:50
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7ee131fcb81a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
getOrInsertDifferentialMPI_Wait at /workspace/srcdir/Enzyme/enzyme/Enzyme/Utils.cpp:1804
handleMPI at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:429
handleKnownCallDerivatives at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:2254
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6405
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4505
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:688
EnzymeCreatePrimalAndGradient at /__w/Enzyme.jl/Enzyme.jl/src/api.jl:270
jfptr_EnzymeCreatePrimalAndGradient_24158 at /root/.julia/compiled/v1.10/Enzyme/G1p5n_64aGk.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-7/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-7/julialang/julia-release-1-dot-10/src/gf.c:3077
macro expansion at /__w/Enzyme.jl/Enzyme.jl/src/compiler.jl:2639 [inlined]
macro expansion at /root/.julia/packages/LLVM/iza6e/src/base.jl:97 [inlined]
enzyme! at /__w/Enzyme.jl/Enzyme.jl/src/compiler.jl:2512

https://github.com/EnzymeAD/Enzyme.jl/pull/518#discussion_r2500642642

vchuravy • Nov 06 '25 21:11

I mean, the bigger issue is that we should never have a mix of MPI/PMPI?

Since I assume Julia never gives us a mix. So we should never generate a mix?

wsmoses • Nov 06 '25 22:11

The issue, IMO, is that this lookup https://github.com/EnzymeAD/Enzyme/blob/6b1848d8582e57dd57c0bb5d0c373c5cb1c1bbfb/enzyme/Enzyme/Utils.cpp#L1803 needs to become a getOrInsertFunction, not a getFunction.

wsmoses • Nov 06 '25 22:11

We have the args, so we ought to be able to form the function type.
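A minimal sketch of the difference between the two lookups, assuming a toy symbol table rather than LLVM itself. `ToyFunctionType` and `ToyModuleTable` are hypothetical; they only mirror the shape of `getFunction` (pure lookup, null when missing) and `getOrInsertFunction` (declare on demand from a known type).

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a function type: return type plus parameter types.
struct ToyFunctionType {
  std::string ret;
  std::vector<std::string> params;
};

// Toy stand-in for a module's table of function declarations.
struct ToyModuleTable {
  std::map<std::string, ToyFunctionType> decls;

  // Pure lookup: returns nullptr if the name was never declared.
  const ToyFunctionType *getFunction(const std::string &name) const {
    auto it = decls.find(name);
    return it == decls.end() ? nullptr : &it->second;
  }

  // Get-or-insert: if the name is missing, add a declaration with the
  // given type; an existing declaration is returned unchanged.
  const ToyFunctionType &getOrInsertFunction(const std::string &name,
                                             const ToyFunctionType &ty) {
    return decls.emplace(name, ty).first->second;
  }
};
```

In this model, a module that only declares MPI_Isend yields null for a pure lookup of PMPI_Isend, while the get-or-insert variant produces a declaration built from the argument types already in hand.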

wsmoses • Nov 06 '25 22:11

> Since I assume Julia never gives us a mix. So we should never generate a mix?

Enzyme.jl is to blame: it tries to invert a function pointer from Julia back to a symbol name, and which name it ends up with is a 50/50.

> Since I assume Julia never gives us a mix. So we should never generate a mix?

See the module: Julia gives us precisely such a mix, and that is the issue I am trying to fix.

vchuravy • Nov 07 '25 19:11

ah bleh

wsmoses • Nov 07 '25 19:11