Cannot deduce type of copy `call void @llvm.memcpy.p10i8.p0i8.i64`
Reproducer:
git clone https://github.com/vchuravy/WaterLily.jl
cd WaterLily.jl/examples
git checkout vc/enzyme
# instantiate local project
julia +1.10 --project=. -e 'using Pkg; Pkg.instantiate()'
julia +1.10 --project=. TandemFoilOptim.jl
ERROR: LoadError: Enzyme execution failed.
Enzyme cannot deduce type
Current scope:
; Function Attrs: mustprogress willreturn
define internal fastcc void @preprocess_julia__make_foils_1_2261([6 x {} addrspace(10)*]* noalias nocapture nofree noundef nonnull writeonly sret([6 x {} addrspace(10)*]) align 8 dereferenceable(48) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,16]:Pointer, [-1,32]:Pointer}" %0, float "enzyme_type"="{[-1]:Float@float}" "enzymejl_parmtype"="138083780338720" "enzymejl_parmtype_ref"="0" %1) unnamed_addr #42 !dbg !725 {
; ...
Cannot deduce type of copy call void @llvm.memcpy.p10i8.p0i8.i64(i8 addrspace(10)* noundef align 1 dereferenceable(7) %newstruct31.sroa.3.sroa.2.0.newstruct31.sroa.3.0..sroa_raw_idx.sroa_raw_idx, i8* noundef nonnull align 1 dereferenceable(7) %newstruct31.sroa.3.sroa.2.1.newstruct24.sroa.3.0.sroa_idx.sroa_idx, i64 noundef 7, i1 noundef false) #44, !dbg !85
Caused by:
Stacktrace:
[1] Simulation
@ ~/src/WaterLily/src/WaterLily.jl:65
[2] #make_foils#1
@ ~/src/WaterLily/examples/TandemFoilOptim.jl:24
Full log: https://gist.github.com/vchuravy/8e70c7ff38fd150f941fef6a7af6cc92
The problem is that the type tree of this type has no info for the bytes between offsets 16 and 24 (there are no entries for offsets 17 through 23):
%box34 = call noalias nonnull dereferenceable(240) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Float@float, [-1,40]:Float@double, [-1,48]:Integer, [-1,49]:Integer, [-1,50]:Integer, [-1,51]:Integer, [-1,52]:Integer, [-1,53]:Integer, [-1,54]:Integer, [-1,55]:Integer, [-1,56]:Float@double, [-1,64]:Integer, [-1,65]:Integer, [-1,66]:Integer, [-1,67]:Integer, [-1,68]:Integer, [-1,69]:Integer, [-1,70]:Integer, [-1,71]:Integer, [-1,72]:Integer, [-1,73]:Integer, [-1,74]:Integer, [-1,75]:Integer, [-1,76]:Integer, [-1,77]:Integer, [-1,78]:Integer, [-1,79]:Integer, [-1,80]:Integer, [-1,81]:Integer, [-1,82]:Integer, [-1,83]:Integer, [-1,84]:Integer, [-1,85]:Integer, [-1,86]:Integer, [-1,87]:Integer, [-1,88]:Integer, [-1,89]:Integer, [-1,90]:Integer, [-1,91]:Integer, [-1,92]:Integer, [-1,93]:Integer, [-1,94]:Integer, [-1,95]:Integer, [-1,96]:Integer, [-1,97]:Integer, [-1,98]:Integer, [-1,99]:Integer, [-1,100]:Integer, [-1,101]:Integer, [-1,102]:Integer, [-1,103]:Integer, [-1,104]:Integer, [-1,105]:Integer, [-1,106]:Integer, [-1,107]:Integer, [-1,108]:Integer, [-1,109]:Integer, [-1,110]:Integer, [-1,111]:Integer, [-1,112]:Integer, [-1,113]:Integer, [-1,114]:Integer, [-1,115]:Integer, [-1,116]:Integer, [-1,117]:Integer, [-1,118]:Integer, [-1,119]:Integer, [-1,120]:Integer, [-1,121]:Integer, [-1,122]:Integer, [-1,123]:Integer, [-1,124]:Integer, [-1,125]:Integer, [-1,126]:Integer, [-1,127]:Integer, [-1,128]:Integer, [-1,136]:Integer, [-1,137]:Integer, [-1,138]:Integer, [-1,139]:Integer, [-1,140]:Integer, [-1,141]:Integer, [-1,142]:Integer, [-1,143]:Integer, [-1,144]:Float@float, [-1,152]:Float@double, [-1,160]:Integer, [-1,161]:Integer, [-1,162]:Integer, [-1,163]:Integer, [-1,164]:Integer, 
[-1,165]:Integer, [-1,166]:Integer, [-1,167]:Integer, [-1,168]:Float@double, [-1,176]:Integer, [-1,177]:Integer, [-1,178]:Integer, [-1,179]:Integer, [-1,180]:Integer, [-1,181]:Integer, [-1,182]:Integer, [-1,183]:Integer, [-1,184]:Integer, [-1,185]:Integer, [-1,186]:Integer, [-1,187]:Integer, [-1,188]:Integer, [-1,189]:Integer, [-1,190]:Integer, [-1,191]:Integer, [-1,192]:Integer, [-1,193]:Integer, [-1,194]:Integer, [-1,195]:Integer, [-1,196]:Integer, [-1,197]:Integer, [-1,198]:Integer, [-1,199]:Integer, [-1,200]:Integer, [-1,201]:Integer, [-1,202]:Integer, [-1,203]:Integer, [-1,204]:Integer, [-1,205]:Integer, [-1,206]:Integer, [-1,207]:Integer, [-1,208]:Integer, [-1,209]:Integer, [-1,210]:Integer, [-1,211]:Integer, [-1,212]:Integer, [-1,213]:Integer, [-1,214]:Integer, [-1,215]:Integer, [-1,216]:Integer, [-1,217]:Integer, [-1,218]:Integer, [-1,219]:Integer, [-1,220]:Integer, [-1,221]:Integer, [-1,222]:Integer, [-1,223]:Integer, [-1,224]:Integer, [-1,225]:Integer, [-1,226]:Integer, [-1,227]:Integer, [-1,228]:Integer, [-1,229]:Integer, [-1,230]:Integer, [-1,231]:Integer, [-1,232]:Integer, [-1,233]:Integer, [-1,234]:Integer, [-1,235]:Integer, [-1,236]:Integer, [-1,237]:Integer, [-1,238]:Integer, [-1,239]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 240, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137776915697616 to {}*) to {} addrspace(10)*)) #46, !dbg !757
%35 = bitcast {} addrspace(10)* %box34 to i8 addrspace(10)*, !dbg !757
julia> obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}
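The integer passed to `obj` here is the constant that appears in the IR as `inttoptr (i64 137776915697616 to {}*)`, and `obj` (defined in the logs that follow) turns such an address back into a Julia object. A minimal sketch of that round trip, assuming a hypothetical mutable type `Marker` that exists only to provide an object with an address; this is only meaningful inside the same Julia process and while the object is alive:

```julia
# Same helper as in the logs: reinterpret an integer address as a Julia object.
obj(x) = Base.unsafe_pointer_to_objref(Base.reinterpret(Ptr{Cvoid}, x))

# Hypothetical mutable type, used only so that we have an object with an address.
mutable struct Marker
    id::Int
end

m = Marker(42)

# Keep `m` rooted while its raw address is in use.
same = GC.@preserve m begin
    addr = UInt(pointer_from_objref(m))  # integer form, like the constant in the IR
    obj(addr) === m                      # the address maps back to the same object
end
println(same)
```

An address copied from a log is only valid in the session that produced it, which is why the logs below were captured in one go.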
Okay, I'm deeply confused by this memcpy of 7 bytes. Why is this happening? Where does it come from?
Relevant logs, so we don't need to rerun:
julia> obj(x) = Base.unsafe_pointer_to_objref(Base.reinterpret(Ptr{Cvoid}, x))
obj (generic function with 1 method)
julia> obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}
julia> T =obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}
julia> fieldtypes(T)
(WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}})
julia> fieldoffsets(T)
ERROR: UndefVarError: `fieldoffsets` not defined
Stacktrace:
[1] top-level scope
@ REPL[10]:1
julia> fieldoffset(T, 1)
0x0000000000000000
julia> fieldoffset(T, 2)
0x0000000000000080
julia> T = fieldtypes(T)[1]
WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}
julia> fieldtypes(T)
(Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}})
julia> fieldtypes(T, 1)
ERROR: MethodError: no method matching fieldtypes(::Type{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, ::Int64)
Closest candidates are:
fieldtypes(::Type)
@ Base reflection.jl:919
Stacktrace:
[1] top-level scope
@ REPL[15]:1
julia> fieldoffset(T, 1)
0x0000000000000000
julia> fieldoffset(T, 2)
0x0000000000000008
julia> fieldoffset(T, 3)
0x0000000000000010
julia> fieldoffset(T, 4)
ERROR: BoundsError: attempt to access DataType at index [4]
Stacktrace:
[1] fieldoffset(x::DataType, idx::Int64)
@ Base ./reflection.jl:779
[2] top-level scope
@ REPL[19]:1
julia> S = fieldtype(T, 3)
var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}
julia> size(S)
ERROR: MethodError: no method matching size(::Type{var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}})
Closest candidates are:
size(::LLVM.FunctionBlockSet)
@ LLVM ~/.julia/packages/LLVM/6cDbl/src/core/function.jl:129
size(::BitVector)
@ Base bitarray.jl:104
size(::BitVector, ::Integer)
@ Base bitarray.jl:107
...
Stacktrace:
[1] top-level scope
@ REPL[21]:1
julia> sizeof(S)
112
julia> using LLVM
│ Package LLVM not found, but a package named LLVM is available from a registry.
│ Install package?
│ (examples) pkg> add LLVM
└ (y/n/o) [y]: y
Resolving package versions...
Updating `~/git/Enzyme.jl/WaterLily.jl/examples/Project.toml`
[929cbde3] + LLVM v7.2.1
No Changes to `~/git/Enzyme.jl/WaterLily.jl/examples/Manifest.toml`
Precompiling project...
✗ GLMakie
74 dependencies successfully precompiled in 59 seconds. 296 already precompiled.
3 dependencies had output during precompilation:
┌ WaterLily → WaterLilyWriteVTKExt
│ ┌ Warning:
│ │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│ │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│ └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└
┌ WaterLily → WaterLilyCUDAExt
│ ┌ Warning:
│ │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│ │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│ └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└
┌ WaterLily → WaterLilyReadVTKExt
│ ┌ Warning:
│ │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│ │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│ └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└
1 dependency errored.
For a report of the errors see `julia> err`. To retry use `pkg> precompile`
julia> ctx = LLVM.Context()
LLVM.Context(0x0000000005bc7470, typed ptrs)
julia> tt(T) = string(Enzyme.typetree(T, ctx, ""))
tt (generic function with 1 method)
julia> tt(S)
"{[0]:Integer, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Float@float, [24]:Float@double, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer, [40]:Float@double, [48]:Integer, [49]:Integer, [50]:Integer, [51]:Integer, [52]:Integer, [53]:Integer, [54]:Integer, [55]:Integer, [56]:Integer, [57]:Integer, [58]:Integer, [59]:Integer, [60]:Integer, [61]:Integer, [62]:Integer, [63]:Integer, [64]:Integer, [65]:Integer, [66]:Integer, [67]:Integer, [68]:Integer, [69]:Integer, [70]:Integer, [71]:Integer, [72]:Integer, [73]:Integer, [74]:Integer, [75]:Integer, [76]:Integer, [77]:Integer, [78]:Integer, [79]:Integer, [80]:Integer, [81]:Integer, [82]:Integer, [83]:Integer, [84]:Integer, [85]:Integer, [86]:Integer, [87]:Integer, [88]:Integer, [89]:Integer, [90]:Integer, [91]:Integer, [92]:Integer, [93]:Integer, [94]:Integer, [95]:Integer, [96]:Integer, [97]:Integer, [98]:Integer, [99]:Integer, [100]:Integer, [101]:Integer, [102]:Integer, [103]:Integer, [104]:Integer, [105]:Integer, [106]:Integer, [107]:Integer, [108]:Integer, [109]:Integer, [110]:Integer, [111]:Integer}"
julia> fieldtypes(S)
(Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}})
julia> fieldoffset(S, 1)
0x0000000000000000
julia> fieldoffset(S, 2)
0x0000000000000008
julia> fieldoffset(S, 3)
0x0000000000000010
julia> fieldoffset(S, 4)
0x0000000000000018
julia> Int(fieldoffset(S, 4))
24
julia> Int(fieldoffset(S, 3))
16
julia> fieldtypes(S)[3]
Float32
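The session above probes the layout one call at a time, and trips over a few names that do not exist (`fieldoffsets`, `fieldtypes(T, i)`, `size` on a type). A small sketch of a helper that collects the same information in one pass, using only `fieldname`, `fieldtype`, `fieldoffset`, `fieldcount` and `sizeof` from Base. `Probe` is a hypothetical stand-in that mirrors only the first four field types of `S`, and `layout` is an illustrative name, not an existing function:

```julia
# Hypothetical stand-in with the first four field types of `S` from the log.
struct Probe
    a::Bool
    b::Int64
    c::Float32
    d::Float64
end

# One row per field: (name, type, byte offset, byte size).
function layout(::Type{T}) where {T}
    return [(fieldname(T, i),
             fieldtype(T, i),
             Int(fieldoffset(T, i)),
             sizeof(fieldtype(T, i))) for i in 1:fieldcount(T)]
end

for row in layout(Probe)
    println(row)
end
```

For `Probe` the offsets come out as 0, 8, 16 and 24, matching the `fieldoffset(S, 1)` through `fieldoffset(S, 4)` results above.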
Why is this happening? Where does it come from?
This is likely LLVM optimizing a copy loop, but I do not know why it is 7 bytes and not 9.
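One observation that may bear on the 7, offered as a guess rather than a diagnosis: in the layout probed above, a `Bool` field is followed by 7 padding bytes before the next 8-byte-aligned field, and the type tree has no entries for those bytes. The sketch below lists such gaps for hypothetical stand-in structs (`MapLike`, `SdfLike`, `CompLike`) that mirror only the leading field types and their order from the `fieldtypes` output. `padding_gaps` is an illustrative helper, not an Enzyme or Base function, and it assumes every field is an isbits type stored inline.

```julia
# Hypothetical stand-ins; only field types and order follow the log.
struct MapLike
    a::Bool
    b::Int64
    c::Float32
    d::Float64
end

struct SdfLike
    n::Int64
end

struct CompLike
    flag::Bool
    sdf::SdfLike
    map::MapLike
end

# Return (first_byte, last_byte) ranges, as absolute offsets, that no leaf
# field of T covers. Assumes all fields are isbits and stored inline.
function padding_gaps(::Type{T}, base::Int=0) where {T}
    gaps = Tuple{Int,Int}[]
    n = fieldcount(T)
    for i in 1:n
        FT = fieldtype(T, i)
        start = Int(fieldoffset(T, i))
        # Recurse into nested inline structs so their internal padding is found too.
        if isbitstype(FT) && fieldcount(FT) > 0
            append!(gaps, padding_gaps(FT, base + start))
        end
        stop = start + sizeof(FT)  # one past the last byte of this field
        next = i < n ? Int(fieldoffset(T, i + 1)) : sizeof(T)
        if next > stop
            push!(gaps, (base + stop, base + next - 1))
        end
    end
    return gaps
end

println(padding_gaps(MapLike))
println(padding_gaps(CompLike))
```

For `CompLike` one of the reported gaps covers offsets 17 through 23, the same 7-byte hole that sits between `[-1,16]` and `[-1,24]` in the type tree of the boxed object. Whether the 7-byte memcpy actually copies that padding is not confirmed by anything in this thread.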
Okay, I've fixed the actual issue at hand.
However, now it... segfaults.
This is now resolved on main, both the original error and the segfault. The full example still doesn't run, however, because Enzyme's cache algorithm gets confused:
(base) wmoses-macbookpro2:examples wmoses$ julia --project=. TandemFoilOptim.jl
┌ Warning:
│ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
└ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
ERROR: LoadError: Enzyme compilation failed.
Current scope:
; Function Attrs: mustprogress nofree willreturn
define internal fastcc void @preprocess_julia___kern_421_131_9980({ [2 x i64], {} addrspace(10)* } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Pointer, [-1,16,-1]:Pointer}" "enzymejl_parmtype"="5475090832" "enzymejl_parmtype_ref"="1" %0, {} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %1, i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="4855344304" "enzymejl_parmtype_ref"="0" %2, {} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, 
[-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %3) unnamed_addr #22 !dbg !265 {
top:
%4 = call {}*** @julia.get_pgcstack() #26
%ptls_field72 = getelementptr inbounds {}**, {}*** %4, i64 2
%5 = bitcast {}*** %ptls_field72 to i64***
%ptls_load7374 = load i64**, i64*** %5, align 8, !tbaa !12
%6 = getelementptr inbounds i64*, i64** %ptls_load7374, i64 2
%safepoint = load i64*, i64** %6, align 8, !tbaa !16
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(i64* %safepoint) #26, !dbg !266
fence syncscope("singlethread") seq_cst
%7 = getelementptr inbounds { [2 x i64], {} addrspace(10)* }, { [2 x i64], {} addrspace(10)* } addrspace(11)* %0, i64 0, i32 0, i64 1, !dbg !267
%unbox2 = load i64, i64 addrspace(11)* %7, align 8, !dbg !271, !tbaa !16, !alias.scope !64, !noalias !65, !enzyme_inactive !0
%8 = add i64 %unbox2, -1, !dbg !271
%9 = call i64 @llvm.smax.i64(i64 %8, i64 noundef 1) #26, !dbg !273
%10 = icmp ult i64 %9, 2, !dbg !276
br i1 %10, label %L208, label %L36.preheader, !dbg !280
L36.preheader: ; preds = %top
%11 = getelementptr inbounds { [2 x i64], {} addrspace(10)* }, { [2 x i64], {} addrspace(10)* } addrspace(11)* %0, i64 0, i32 0, i64 0, !dbg !267
%unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !271, !tbaa !16, !alias.scope !64, !noalias !65
%12 = add i64 %unbox, -1, !dbg !271
%13 = call i64 @llvm.smax.i64(i64 %12, i64 noundef 1) #26, !dbg !281
%14 = icmp ult i64 %13, 2
%.not76 = icmp eq i64 %2, 1
%.not77 = icmp eq i64 %2, 2
%15 = select i1 %.not77, i64 -2, i64 -1
%.phi.trans.insert64 = addrspacecast {} addrspace(10)* %3 to {} addrspace(10)* addrspace(11)*
%arraysize_ptr.phi.trans.insert = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %.phi.trans.insert64, i64 3
%.phi.trans.insert65 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr.phi.trans.insert to i64 addrspace(11)*
%arraysize_ptr31.phi.trans.insert = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %.phi.trans.insert64, i64 4
%.phi.trans.insert68 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr31.phi.trans.insert to i64 addrspace(11)*
%16 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*
%17 = add i64 %2, -1
%18 = addrspacecast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(11)*
%arraysize_ptr47 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 3
%19 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr47 to i64 addrspace(11)*
%arraysize_ptr50 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 4
%20 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr50 to i64 addrspace(11)*
%21 = addrspacecast {} addrspace(10)* %1 to float addrspace(13)* addrspace(11)*
%22 = select i1 %.not76, i64 2, i64 3
%23 = add nsw i64 %13, -2
br label %L36, !dbg !282
L36: ; preds = %L187, %L36.preheader
%iv = phi i64 [ %iv.next, %L187 ], [ 0, %L36.preheader ]
%24 = shl nuw i64 %iv, 1, !dbg !282
%25 = add i64 %24, 2, !dbg !282
%26 = add nuw i64 %iv, 2, !dbg !282
%iv.next = add nuw nsw i64 %iv, 1, !dbg !282
br i1 %14, label %L187, label %L47.lr.ph, !dbg !282
L47.lr.ph: ; preds = %L36
%27 = shl nuw i64 %26, 1
%28 = add i64 %27, -2
%29 = add i64 %27, %15
%.not79 = icmp sgt i64 %28, %29
%30 = add i64 %27, -3
%value_phi16 = select i1 %.not79, i64 %30, i64 %29
%31 = icmp sgt i64 %28, %value_phi16
%arraysize.pre = load i64, i64 addrspace(11)* %.phi.trans.insert65, align 8, !enzyme_inactive !0
%arraysize32.pre = load i64, i64 addrspace(11)* %.phi.trans.insert68, align 16, !enzyme_inactive !0
%arrayptr.pre80 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %16, align 16
%32 = mul i64 %arraysize32.pre, %17
%33 = add i64 %32, -1
%arraysize48 = load i64, i64 addrspace(11)* %19, align 8, !enzyme_inactive !0
%34 = add nsw i64 %26, -1
%arraysize51 = load i64, i64 addrspace(11)* %20, align 16, !enzyme_inactive !0
%arrayptr5482 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %21, align 16
%35 = mul i64 %arraysize51, %17
%reass.add85 = add i64 %34, %35
%reass.mul86 = mul i64 %reass.add85, %arraysize48
br label %L66, !dbg !283
L66: ; preds = %L178, %L47.lr.ph
%iv1 = phi i64 [ %iv.next2, %L178 ], [ 0, %L47.lr.ph ]
%36 = shl nuw i64 %iv1, 1, !dbg !284
%37 = add i64 %36, 2, !dbg !284
%iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !284
%38 = shl nuw i64 %iv1, 1, !dbg !287
%39 = add nuw i64 %38, 2, !dbg !295
%40 = add nuw i64 %22, %38, !dbg !295
%.not78 = icmp sgt i64 %39, %40, !dbg !298
%41 = or i64 %38, 1, !dbg !300
%value_phi15 = select i1 %.not78, i64 %41, i64 %40, !dbg !300
%42 = icmp sgt i64 %39, %value_phi15, !dbg !306
%not. = or i1 %31, %42, !dbg !309
br i1 %not., label %L178, label %L130.outer.preheader, !dbg !292
L130.outer.preheader: ; preds = %L66
br label %L130.outer, !dbg !310
L130.outer: ; preds = %L130.outer.preheader, %L148
%iv3 = phi i64 [ 0, %L130.outer.preheader ], [ %iv.next4, %L148 ]
%value_phi30.ph = phi float [ %48, %L148 ], [ 0.000000e+00, %L130.outer.preheader ]
%43 = add i64 %25, %iv3
%iv.next4 = add nuw nsw i64 %iv3, 1
%reass.add = add i64 %33, %43
%reass.mul = mul i64 %reass.add, %arraysize.pre
%44 = add i64 %reass.mul, -1
br label %L130, !dbg !310
L130: ; preds = %L130, %L130.outer
%iv5 = phi i64 [ %iv.next6, %L130 ], [ 0, %L130.outer ]
%value_phi30 = phi float [ %48, %L130 ], [ %value_phi30.ph, %L130.outer ]
%45 = add i64 %37, %iv5, !dbg !313
%iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !313
%46 = add i64 %44, %45, !dbg !313
%47 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80, i64 %46, !dbg !313
%arrayref = load float, float addrspace(13)* %47, align 4, !dbg !313, !tbaa !134, !alias.scope !31, !noalias !34
%48 = fadd fast float %arrayref, %value_phi30, !dbg !316
%49 = add i64 %45, 1, !dbg !317
%50 = icmp sgt i64 %39, %49, !dbg !319
%51 = icmp sgt i64 %49, %value_phi15, !dbg !319
%52 = or i1 %50, %51, !dbg !310
%53 = icmp eq i64 %45, %value_phi15
%or.cond = or i1 %53, %52, !dbg !310
br i1 %or.cond, label %L148, label %L130, !dbg !310
L148: ; preds = %L130
%54 = add i64 %43, 1, !dbg !322
%55 = icmp sle i64 %28, %54, !dbg !325
%56 = icmp sle i64 %54, %value_phi16, !dbg !325
%57 = and i1 %55, %56, !dbg !329
%58 = icmp ne i64 %43, %value_phi16, !dbg !328
%extract.t = and i1 %58, %57, !dbg !330
br i1 %extract.t, label %L130.outer, label %L178.loopexit, !dbg !312
L178.loopexit: ; preds = %L148
br label %L178, !dbg !331
L178: ; preds = %L178.loopexit, %L66
%value_phi46 = phi float [ 0.000000e+00, %L66 ], [ %48, %L178.loopexit ]
%59 = fmul fast float %value_phi46, 5.000000e-01, !dbg !331
%60 = add i64 %iv.next2, %reass.mul86, !dbg !333
%61 = getelementptr inbounds float, float addrspace(13)* %arrayptr5482, i64 %60, !dbg !333
store float %59, float addrspace(13)* %61, align 4, !dbg !333, !tbaa !134, !alias.scope !31, !noalias !335
%exitcond.not = icmp eq i64 %iv1, %23, !dbg !338
br i1 %exitcond.not, label %L187.loopexit, label %L66, !dbg !283, !llvm.loop !339
L187.loopexit: ; preds = %L178
br label %L187, !dbg !340
L187: ; preds = %L187.loopexit, %L36
%62 = add nuw i64 %26, 1, !dbg !340
%63 = icmp slt i64 %62, 2, !dbg !344
%64 = icmp sgt i64 %62, %9, !dbg !344
%65 = icmp eq i64 %26, %9, !dbg !347
%not.not.84 = or i1 %63, %64, !dbg !347
%narrow83 = or i1 %65, %not.not.84, !dbg !347
br i1 %narrow83, label %L208.loopexit, label %L36, !dbg !343
L208.loopexit: ; preds = %L187
br label %L208, !dbg !270
L208: ; preds = %L208.loopexit, %top
ret void, !dbg !270
}
Illegal replace ficticious phi for: %unbox_replacementA = phi i64 , !dbg !21 of %unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !28, !tbaa !16, !alias.scope !34, !noalias !37
; Function Attrs: mustprogress nofree willreturn
define internal fastcc void @diffejulia___kern_421_131_9980({ [2 x i64], {} addrspace(10)* } addrspace(11)* nocapture nofree readonly align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Pointer, [-1,16,-1]:Pointer}" "enzymejl_parmtype"="5475090832" "enzymejl_parmtype_ref"="1" %0, {} addrspace(10)* nocapture nofree readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %1, {} addrspace(10)* nocapture nofree align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" 
"enzymejl_parmtype_ref"="2" %"'", i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="4855344304" "enzymejl_parmtype_ref"="0" %2, {} addrspace(10)* nocapture nofree readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %3, {} addrspace(10)* nocapture nofree align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %"'1", { i64, i64, i64*, i64** } %tapeArg) unnamed_addr #22 !dbg !631 {
top:
%4 = call {}*** @julia.get_pgcstack() #26
%ptls_field72_replacementA = phi {}***
%_replacementA14 = phi i64***
%ptls_load7374_replacementA = phi i64**
%_replacementA13 = phi i64**
%safepoint_replacementA = phi i64*
%_replacementA12 = phi i64 addrspace(11)* , !dbg !632
%unbox2_replacementA = phi i64 , !dbg !636
%_replacementA = phi i64 , !dbg !636
%5 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !638
%6 = icmp ult i64 %5, 2, !dbg !641
br i1 %6, label %L208, label %L36.preheader, !dbg !645
L36.preheader: ; preds = %top
%_replacementA21 = phi i64 addrspace(11)* , !dbg !632
%unbox_replacementA = phi i64 , !dbg !636
%_replacementA20 = phi i64 , !dbg !636
%7 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !646
%8 = icmp ult i64 %7, 2
%.not76 = icmp eq i64 %2, 1
%.not77 = icmp eq i64 %2, 2
%9 = select i1 %.not77, i64 -2, i64 -1
%.phi.trans.insert64_replacementA = phi {} addrspace(10)* addrspace(11)*
%arraysize_ptr.phi.trans.insert_replacementA = phi {} addrspace(10)* addrspace(11)*
%arraysize_ptr31.phi.trans.insert_replacementA = phi {} addrspace(10)* addrspace(11)*
%.phi.trans.insert68_replacementA = phi i64 addrspace(11)*
%"'ipc26" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*
%10 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*
%_replacementA19 = phi i64
%_replacementA18 = phi {} addrspace(10)* addrspace(11)*
%arraysize_ptr47_replacementA = phi {} addrspace(10)* addrspace(11)*
%_replacementA17 = phi i64 addrspace(11)*
%arraysize_ptr50_replacementA = phi {} addrspace(10)* addrspace(11)*
%_replacementA16 = phi i64 addrspace(11)*
%"'ipc" = addrspacecast {} addrspace(10)* %"'" to float addrspace(13)* addrspace(11)*
%_replacementA15 = phi float addrspace(13)* addrspace(11)*
%11 = select i1 %.not76, i64 2, i64 3
%12 = add i64 %7, -2
%13 = add nsw i64 %5, -2, !dbg !647
%14 = add nuw nsw i64 %5, 1, !dbg !647
%smax = call i64 @llvm.smax.i64(i64 %14, i64 3), !dbg !647
%15 = add nsw i64 %smax, -3, !dbg !647
%umin = call i64 @llvm.umin.i64(i64 %13, i64 %15), !dbg !647
%16 = add nuw i64 %umin, 1, !dbg !647
%17 = add nuw i64 %12, 1, !dbg !647
%18 = mul nuw nsw i64 %17, %16, !dbg !647
%19 = mul nuw i64 %18, 8, !dbg !647
%20 = call noalias nonnull i8* @malloc(i64 %19), !dbg !647, !enzyme_cache_alloc !648
%loopLimit_malloccache = bitcast i8* %20 to i64*, !dbg !647
store i64* %loopLimit_malloccache, i64** %loopLimit_cache, align 8, !dbg !647, !invariant.group !650
store i64 %7, i64* %_cache81, align 8, !dbg !647, !invariant.group !651
store i64 %unbox_replacementA, i64* %unbox_cache, align 8, !dbg !647, !tbaa !16, !invariant.group !652
%21 = mul nuw i64 %18, 8, !dbg !647
%22 = call noalias nonnull i8* @malloc(i64 %21), !dbg !647, !enzyme_cache_alloc !653
%loopLimit_malloccache3 = bitcast i8* %22 to i64**, !dbg !647
store i64** %loopLimit_malloccache3, i64*** %loopLimit_cache2, align 8, !dbg !647, !invariant.group !655
%23 = mul nuw i64 %18, 8, !dbg !647
%24 = mul nuw i64 %16, 8, !dbg !647
br label %L36, !dbg !647
L36: ; preds = %L187, %L36.preheader
%iv = phi i64 [ %iv.next, %L187 ], [ 0, %L36.preheader ]
%iv.next = add nuw nsw i64 %iv, 1, !dbg !647
%25 = shl nuw i64 %iv, 1, !dbg !647
%26 = add i64 %25, 2, !dbg !647
%27 = add nuw i64 %iv, 2, !dbg !647
br i1 %8, label %L187, label %L47.lr.ph, !dbg !647
L47.lr.ph: ; preds = %L36
%28 = shl nuw i64 %27, 1
%29 = add i64 %28, -2
%30 = add i64 %28, %9
%.not79 = icmp sgt i64 %29, %30
%31 = add i64 %28, -3
%value_phi16 = select i1 %.not79, i64 %31, i64 %30
%32 = icmp sgt i64 %29, %value_phi16
%arraysize.pre_replacementA = phi i64
%arraysize32.pre_replacementA = phi i64
%"arrayptr.pre80'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc26", align 16, !alias.scope !656, !noalias !659, !invariant.group !661
%arrayptr.pre80 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %10, align 16, !alias.scope !659, !noalias !656, !invariant.group !662
%_replacementA25 = phi i64
%_replacementA24 = phi i64
%arraysize48_replacementA = phi i64
%_replacementA23 = phi i64
%arraysize51_replacementA = phi i64
%"arrayptr5482'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc", align 16, !alias.scope !663, !noalias !666, !invariant.group !668
%arrayptr5482_replacementA = phi float addrspace(13)*
%_replacementA22 = phi i64
%reass.add85_replacementA = phi i64
%33 = load i64*, i64** %mdyncache_fromtape_cache93, align 8, !dbg !669, !dereferenceable !236, !invariant.group !670
%34 = getelementptr inbounds i64, i64* %33, i64 %iv, !dbg !669
%reass.mul86 = load i64, i64* %34, align 8, !dbg !669, !invariant.group !671
br label %L66, !dbg !669
L66: ; preds = %L178, %L47.lr.ph
%iv1 = phi i64 [ %iv.next2, %L178 ], [ 0, %L47.lr.ph ]
%iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !672
%35 = shl nuw i64 %iv1, 1, !dbg !672
%36 = add i64 %35, 2, !dbg !672
%37 = shl nuw i64 %iv1, 1, !dbg !675
%38 = add nuw i64 %37, 2, !dbg !683
%39 = add nuw i64 %11, %37, !dbg !683
%.not78 = icmp sgt i64 %38, %39, !dbg !686
%40 = or i64 %37, 1, !dbg !688
%value_phi15 = select i1 %.not78, i64 %40, i64 %39, !dbg !688
%41 = icmp sgt i64 %38, %value_phi15, !dbg !694
%not. = or i1 %32, %41, !dbg !697
br i1 %not., label %L178, label %L130.outer.preheader, !dbg !680
L130.outer.preheader: ; preds = %L66
%42 = mul nuw nsw i64 %17, %16, !dbg !698
%43 = mul nuw nsw i64 %iv, %17, !dbg !698
%44 = add nuw nsw i64 %iv1, %43, !dbg !698
%45 = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !698, !invariant.group !701
%46 = getelementptr inbounds i64*, i64** %45, i64 %44, !dbg !698
store i64* null, i64** %46, align 8, !dbg !698
%47 = mul nuw nsw i64 %17, %16, !dbg !698
%48 = mul nuw nsw i64 %iv, %17, !dbg !698
%49 = add nuw nsw i64 %iv1, %48, !dbg !698
%50 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3, !dbg !698
%51 = getelementptr inbounds i64*, i64** %50, i64 %49, !dbg !698
%52 = mul nuw nsw i64 %17, %16, !dbg !698
%53 = mul nuw nsw i64 %iv, %17, !dbg !698
%54 = add nuw nsw i64 %iv1, %53, !dbg !698
%55 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !invariant.group !702
%56 = getelementptr inbounds i64*, i64** %55, i64 %54, !dbg !698
br label %L130.outer, !dbg !698
L130.outer: ; preds = %L148, %L130.outer.preheader
%iv3 = phi i64 [ 0, %L130.outer.preheader ], [ %iv.next4, %L148 ]
%value_phi30.ph_replacementA = phi float
%iv.next4 = add nuw nsw i64 %iv3, 1
%57 = load i64*, i64** %51, align 8
%58 = load i64*, i64** %46, align 8
%59 = bitcast i64* %58 to i8*
%loopLimit_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %59, i64 %iv.next4, i64 8)
%60 = bitcast i8* %loopLimit_realloccache to i64*
store i64* %60, i64** %46, align 8
%61 = add i64 %26, %iv3
%reass.mul_replacementA = phi i64
%62 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !dereferenceable !236, !invariant.group !703
%63 = mul nuw nsw i64 %17, %16, !dbg !698
%64 = mul nuw nsw i64 %iv, %17, !dbg !698
%65 = add nuw nsw i64 %iv1, %64, !dbg !698
%66 = getelementptr inbounds i64*, i64** %62, i64 %65, !dbg !698
%67 = load i64*, i64** %66, align 8, !dbg !698, !dereferenceable !236, !invariant.group !704
%68 = getelementptr inbounds i64, i64* %67, i64 %iv3, !dbg !698
%69 = load i64, i64* %68, align 8, !dbg !698, !invariant.group !705
%70 = mul nuw nsw i64 %17, %16, !dbg !698
%71 = mul nuw nsw i64 %iv, %17, !dbg !698
%72 = add nuw nsw i64 %iv1, %71, !dbg !698
br label %L130, !dbg !698
L130: ; preds = %L130, %L130.outer
%iv5 = phi i64 [ %iv.next6, %L130 ], [ 0, %L130.outer ]
%value_phi30_replacementA = phi float
%iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !706
%73 = add i64 %36, %iv5, !dbg !706
%74 = add i64 %69, %73, !dbg !706
%"'ipg" = getelementptr inbounds float, float addrspace(13)* %"arrayptr.pre80'ipl", i64 %74, !dbg !706
%75 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80, i64 %74, !dbg !706
%arrayref_replacementA = phi float , !dbg !706
%_replacementA27_replacementA = phi float , !dbg !709
%76 = add i64 %73, 1, !dbg !710
%77 = icmp sgt i64 %38, %76, !dbg !712
%78 = icmp sgt i64 %76, %value_phi15, !dbg !712
%79 = or i1 %77, %78, !dbg !698
%80 = icmp eq i64 %73, %value_phi15
%or.cond = or i1 %80, %79, !dbg !698
br i1 %or.cond, label %L148, label %L130, !dbg !698
L148: ; preds = %L130
%81 = phi i64 [ %iv5, %L130 ], !dbg !715
%82 = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !715, !dereferenceable !236, !invariant.group !655
%83 = mul nuw nsw i64 %17, %16, !dbg !715
%84 = mul nuw nsw i64 %iv, %17, !dbg !715
%85 = add nuw nsw i64 %iv1, %84, !dbg !715
%86 = getelementptr inbounds i64*, i64** %82, i64 %85, !dbg !715
%87 = load i64*, i64** %86, align 8, !dbg !715, !dereferenceable !236, !invariant.group !718
%88 = getelementptr inbounds i64, i64* %87, i64 %iv3, !dbg !715
store i64 %81, i64* %88, align 8, !dbg !715, !invariant.group !719
%89 = add i64 %61, 1, !dbg !715
%90 = icmp sle i64 %29, %89, !dbg !720
%91 = icmp sle i64 %89, %value_phi16, !dbg !720
%92 = and i1 %90, %91, !dbg !724
%93 = icmp ne i64 %61, %value_phi16, !dbg !723
%extract.t = and i1 %93, %92, !dbg !725
br i1 %extract.t, label %L130.outer, label %L178.loopexit, !dbg !700
L178.loopexit: ; preds = %L148
%94 = phi i64 [ %iv3, %L148 ], !dbg !726
%95 = load i64*, i64** %loopLimit_cache, align 8, !dbg !726, !dereferenceable !236, !invariant.group !650
%96 = mul nuw nsw i64 %17, %16, !dbg !726
%97 = mul nuw nsw i64 %iv, %17, !dbg !726
%98 = add nuw nsw i64 %iv1, %97, !dbg !726
%99 = getelementptr inbounds i64, i64* %95, i64 %98, !dbg !726
store i64 %94, i64* %99, align 8, !dbg !726, !invariant.group !730
br label %L178, !dbg !726
L178: ; preds = %L178.loopexit, %L66
%value_phi46_replacementA = phi float
%_replacementA67_replacementA = phi float , !dbg !726
%100 = add i64 %iv.next2, %reass.mul86, !dbg !728
%"'ipg58" = getelementptr inbounds float, float addrspace(13)* %"arrayptr5482'ipl", i64 %100, !dbg !728
%_replacementA66 = phi float addrspace(13)* , !dbg !728
%exitcond.not = icmp eq i64 %iv1, %12, !dbg !731
br i1 %exitcond.not, label %L187.loopexit, label %L66, !dbg !669, !llvm.loop !732
L187.loopexit: ; preds = %L178
br label %L187, !dbg !733
L187: ; preds = %L187.loopexit, %L36
%101 = add nuw i64 %27, 1, !dbg !733
%102 = icmp slt i64 %101, 2, !dbg !737
%103 = icmp sgt i64 %101, %5, !dbg !737
%104 = icmp eq i64 %27, %5, !dbg !740
%not.not.84 = or i1 %102, %103, !dbg !740
%narrow83 = or i1 %104, %not.not.84, !dbg !740
br i1 %narrow83, label %L208.loopexit, label %L36, !dbg !736
L208.loopexit: ; preds = %L187
br label %L208, !dbg !635
L208: ; preds = %L208.loopexit, %top
br label %invertL208, !dbg !635
allocsForInversion: ; No predecessors!
%"iv'ac" = alloca i64, align 8
%"iv1'ac" = alloca i64, align 8
%"iv3'ac" = alloca i64, align 8
%loopLimit_cache = alloca i64*, align 8
%"iv5'ac" = alloca i64, align 8
%loopLimit_cache2 = alloca i64**, align 8
%unbox_cache = alloca i64, align 8
%"value_phi30.ph'de" = alloca float, align 4
%105 = getelementptr float, float* %"value_phi30.ph'de", i64 0
store float 0.000000e+00, float* %105, align 4
%"'de" = alloca float, align 4
%106 = getelementptr float, float* %"'de", i64 0
store float 0.000000e+00, float* %106, align 4
%"arrayref'de" = alloca float, align 4
%107 = getelementptr float, float* %"arrayref'de", i64 0
store float 0.000000e+00, float* %107, align 4
%"value_phi30'de" = alloca float, align 4
%108 = getelementptr float, float* %"value_phi30'de", i64 0
store float 0.000000e+00, float* %108, align 4
%_cache = alloca i64**, align 8
%reass.mul86_cache = alloca i64*, align 8
%"'de65" = alloca float, align 4
%109 = getelementptr float, float* %"'de65", i64 0
store float 0.000000e+00, float* %109, align 4
%"value_phi46'de" = alloca float, align 4
%110 = getelementptr float, float* %"value_phi46'de", i64 0
store float 0.000000e+00, float* %110, align 4
%_cache81 = alloca i64, align 8
%111 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3
%mdyncache_fromtape_cache = alloca i64**, align 8
store i64** %111, i64*** %mdyncache_fromtape_cache, align 8
%112 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 2
%mdyncache_fromtape_cache93 = alloca i64*, align 8
store i64* %112, i64** %mdyncache_fromtape_cache93, align 8
inverttop: ; preds = %invertL208, %invertL36.preheader
fence syncscope("singlethread") seq_cst
fence syncscope("singlethread") seq_cst
ret void
invertL36.preheader: ; preds = %invertL36
%113 = load i64, i64* %"iv'ac", align 8
%114 = load i64, i64* %"iv1'ac", align 8
%forfree = load i64*, i64** %loopLimit_cache, align 8, !dereferenceable !236, !invariant.group !650
%115 = bitcast i64* %forfree to i8*
call void @free(i8* nonnull %115), !dbg !741, !enzyme_cache_free !648
%116 = load i64, i64* %"iv'ac", align 8
%117 = load i64, i64* %"iv1'ac", align 8
%forfree4 = load i64**, i64*** %loopLimit_cache2, align 8, !dereferenceable !236, !invariant.group !655
%118 = bitcast i64** %forfree4 to i8*
call void @free(i8* nonnull %118), !dbg !741, !enzyme_cache_free !653
%119 = load i64, i64* %"iv'ac", align 8
%120 = load i64, i64* %"iv1'ac", align 8
%121 = load i64, i64* %"iv'ac", align 8
%122 = load i64, i64* %"iv'ac", align 8
%123 = load i64, i64* %"iv1'ac", align 8
%forfree87 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dereferenceable !236, !invariant.group !703
%124 = bitcast i64** %forfree87 to i8*
call void @free(i8* nonnull %124), !dbg !741
%125 = load i64, i64* %"iv'ac", align 8
%forfree94 = load i64*, i64** %mdyncache_fromtape_cache93, align 8, !dereferenceable !236, !invariant.group !670
%126 = bitcast i64* %forfree94 to i8*
call void @free(i8* nonnull %126), !dbg !741
br label %inverttop
invertL36: ; preds = %invertL187, %invertL47.lr.ph
%127 = load i64, i64* %"iv'ac", align 8
%128 = icmp eq i64 %127, 0
%129 = xor i1 %128, true
br i1 %128, label %invertL36.preheader, label %incinvertL36
incinvertL36: ; preds = %invertL36
%130 = load i64, i64* %"iv'ac", align 8
%131 = add nsw i64 %130, -1
store i64 %131, i64* %"iv'ac", align 8
br label %invertL187
invertL47.lr.ph: ; preds = %invertL66
br label %invertL36
invertL66: ; preds = %invertL178, %invertL130.outer.preheader
%132 = load i64, i64* %"iv1'ac", align 8
%133 = icmp eq i64 %132, 0
%134 = xor i1 %133, true
br i1 %133, label %invertL47.lr.ph, label %incinvertL66
incinvertL66: ; preds = %invertL66
%135 = load i64, i64* %"iv1'ac", align 8
%136 = add nsw i64 %135, -1
store i64 %136, i64* %"iv1'ac", align 8
br label %invertL178
invertL130.outer.preheader: ; preds = %invertL130.outer
%137 = load i64, i64* %"iv'ac", align 8
%138 = load i64, i64* %"iv1'ac", align 8
%139 = load i64, i64* %"iv3'ac", align 8
%_unwrap = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !698, !invariant.group !701
%140 = load i64, i64* %unbox_cache, align 8, !dbg !636, !tbaa !16, !alias.scope !64, !noalias !65, !invariant.group !652
%_unwrap5 = add i64 %140, -1
%_unwrap97 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !646
%_unwrap6 = add i64 %_unwrap97, -2
%_unwrap7 = add nuw i64 %_unwrap6, 1
%_unwrap8 = mul nuw nsw i64 %137, %_unwrap7
%_unwrap9 = add nuw nsw i64 %138, %_unwrap8
%_unwrap10 = getelementptr inbounds i64*, i64** %_unwrap, i64 %_unwrap9
%forfree11 = load i64*, i64** %_unwrap10, align 8, !dereferenceable !236, !invariant.group !718
%141 = bitcast i64* %forfree11 to i8*
call void @free(i8* nonnull %141), !dbg !741
%142 = load i64, i64* %"iv3'ac", align 8
%143 = load i64, i64* %"iv'ac", align 8
%144 = load i64, i64* %"iv1'ac", align 8
%145 = load i64, i64* %"iv3'ac", align 8
%_unwrap88 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !invariant.group !702
%_unwrap89 = mul nuw nsw i64 %143, %_unwrap7
%_unwrap90 = add nuw nsw i64 %144, %_unwrap89
%_unwrap91 = getelementptr inbounds i64*, i64** %_unwrap88, i64 %_unwrap90
%forfree92 = load i64*, i64** %_unwrap91, align 8, !dereferenceable !236, !invariant.group !704
%146 = bitcast i64* %forfree92 to i8*
call void @free(i8* nonnull %146), !dbg !741
br label %invertL66
invertL130.outer: ; preds = %invertL130_amerge
%147 = load float, float* %"value_phi30.ph'de", align 4
store float 0.000000e+00, float* %"value_phi30.ph'de", align 4
%148 = load i64, i64* %"iv3'ac", align 8
%149 = icmp eq i64 %148, 0
%150 = xor i1 %149, true
%151 = select fast i1 %150, float %147, float 0.000000e+00
%152 = load float, float* %"'de", align 4
%153 = fadd fast float %152, %147
%154 = select fast i1 %149, float %152, float %153
store float %154, float* %"'de", align 4
br i1 %149, label %invertL130.outer.preheader, label %incinvertL130.outer
incinvertL130.outer: ; preds = %invertL130.outer
%155 = load i64, i64* %"iv3'ac", align 8
%156 = add nsw i64 %155, -1
store i64 %156, i64* %"iv3'ac", align 8
br label %invertL148
invertL130: ; preds = %mergeinvertL130_L148, %incinvertL130
%157 = load float, float* %"'de", align 4, !dbg !709
store float 0.000000e+00, float* %"'de", align 4, !dbg !709
%158 = load float, float* %"arrayref'de", align 4, !dbg !709
%159 = fadd fast float %158, %157, !dbg !709
store float %159, float* %"arrayref'de", align 4, !dbg !709
%160 = load float, float* %"value_phi30'de", align 4, !dbg !709
%161 = fadd fast float %160, %157, !dbg !709
store float %161, float* %"value_phi30'de", align 4, !dbg !709
%162 = load float, float* %"arrayref'de", align 4, !dbg !706
store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !706
%163 = load i64, i64* %"iv5'ac", align 8, !dbg !706
%164 = load i64, i64* %"iv3'ac", align 8, !dbg !706
%165 = load i64, i64* %"iv1'ac", align 8, !dbg !706
%166 = load i64, i64* %"iv'ac", align 8, !dbg !706
%_unwrap28 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*, !dbg !706
%arrayptr.pre80_unwrap = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %_unwrap28, align 16, !alias.scope !659, !noalias !656, !invariant.group !662
%_unwrap101 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !706
%_unwrap35 = add nsw i64 %_unwrap101, -2, !dbg !706
%_unwrap36 = add nuw nsw i64 %_unwrap101, 1, !dbg !706
%167 = call i64 @llvm.smax.i64(i64 %_unwrap36, i64 3), !dbg !647
%_unwrap37 = add nsw i64 %167, -3, !dbg !706
%168 = call i64 @llvm.umin.i64(i64 %_unwrap35, i64 %_unwrap37), !dbg !647
%169 = add nuw i64 %168, 1, !dbg !706
%_unwrap96 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !706
%_unwrap39 = add i64 %_unwrap96, -2, !dbg !706
%170 = add nuw i64 %_unwrap39, 1, !dbg !706
%171 = mul nuw nsw i64 %170, %169, !dbg !706
%172 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3, !dbg !706
%173 = mul nuw nsw i64 %170, %169, !dbg !706
%174 = mul nuw nsw i64 %166, %170, !dbg !706
%175 = add nuw nsw i64 %165, %174, !dbg !706
%176 = getelementptr inbounds i64*, i64** %172, i64 %175, !dbg !706
%177 = load i64*, i64** %176, align 8, !dbg !706, !dereferenceable !236, !invariant.group !742
%178 = getelementptr inbounds i64, i64* %177, i64 %164, !dbg !706
%179 = load i64, i64* %178, align 8, !dbg !706, !invariant.group !743
%_unwrap40 = shl nuw i64 %165, 1, !dbg !706
%_unwrap41 = add i64 %_unwrap40, 2, !dbg !706
%_unwrap42 = add i64 %_unwrap41, %163, !dbg !706
%_unwrap43 = add i64 %179, %_unwrap42, !dbg !706
%_unwrap44 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80_unwrap, i64 %_unwrap43, !dbg !706
%180 = load i64, i64* %"iv5'ac", align 8, !dbg !706
%181 = load i64, i64* %"iv3'ac", align 8, !dbg !706
%182 = load i64, i64* %"iv1'ac", align 8, !dbg !706
%183 = load i64, i64* %"iv'ac", align 8, !dbg !706
%"'ipc26_unwrap" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*, !dbg !706
%"arrayptr.pre80'ipl_unwrap" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc26_unwrap", align 16, !alias.scope !656, !noalias !659, !invariant.group !661
%"'ipg_unwrap" = getelementptr inbounds float, float addrspace(13)* %"arrayptr.pre80'ipl_unwrap", i64 %_unwrap43, !dbg !706
%184 = icmp ne float addrspace(13)* %_unwrap44, %"'ipg_unwrap", !dbg !706
br i1 %184, label %invertL130_active, label %invertL130_amerge, !dbg !706
invertL130_active: ; preds = %invertL130
%185 = load float, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !706, !tbaa !134, !alias.scope !744, !noalias !747
%186 = fadd fast float %185, %162, !dbg !706
store float %186, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !706, !tbaa !134, !alias.scope !744, !noalias !747
br label %invertL130_amerge, !dbg !706
invertL130_amerge: ; preds = %invertL130_active, %invertL130
%187 = load float, float* %"value_phi30'de", align 4
store float 0.000000e+00, float* %"value_phi30'de", align 4
%188 = load i64, i64* %"iv5'ac", align 8
%189 = icmp eq i64 %188, 0
%190 = xor i1 %189, true
%191 = select fast i1 %190, float %187, float 0.000000e+00
%192 = load float, float* %"'de", align 4
%193 = fadd fast float %192, %187
%194 = select fast i1 %189, float %192, float %193
store float %194, float* %"'de", align 4
%195 = select fast i1 %189, float %187, float 0.000000e+00
%196 = load float, float* %"value_phi30.ph'de", align 4
%197 = fadd fast float %196, %187
%198 = select fast i1 %189, float %197, float %196
store float %198, float* %"value_phi30.ph'de", align 4
br i1 %189, label %invertL130.outer, label %incinvertL130
incinvertL130: ; preds = %invertL130_amerge
%199 = load i64, i64* %"iv5'ac", align 8
%200 = add nsw i64 %199, -1
store i64 %200, i64* %"iv5'ac", align 8
br label %invertL130
invertL148: ; preds = %mergeinvertL130.outer_L178.loopexit, %incinvertL130.outer
%_unwrap102 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0
%_unwrap47 = add nsw i64 %_unwrap102, -2
%_unwrap48 = add nuw nsw i64 %_unwrap102, 1
%201 = call i64 @llvm.smax.i64(i64 %_unwrap48, i64 3), !dbg !647
%_unwrap49 = add nsw i64 %201, -3
%202 = call i64 @llvm.umin.i64(i64 %_unwrap47, i64 %_unwrap49), !dbg !647
%203 = add nuw i64 %202, 1
%_unwrap98 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1
%_unwrap51 = add i64 %_unwrap98, -2
%204 = add nuw i64 %_unwrap51, 1
%205 = mul nuw nsw i64 %204, %203
%206 = load i64**, i64*** %loopLimit_cache2, align 8, !dereferenceable !236, !invariant.group !655
%207 = load i64, i64* %"iv1'ac", align 8
%208 = load i64, i64* %"iv'ac", align 8
%209 = mul nuw nsw i64 %204, %203
%210 = mul nuw nsw i64 %208, %204
%211 = add nuw nsw i64 %207, %210
%212 = getelementptr inbounds i64*, i64** %206, i64 %211
%213 = load i64*, i64** %212, align 8, !dereferenceable !236, !invariant.group !718
%214 = load i64, i64* %"iv3'ac", align 8
%215 = getelementptr inbounds i64, i64* %213, i64 %214
%216 = load i64, i64* %215, align 8, !invariant.group !719
br label %mergeinvertL130_L148
mergeinvertL130_L148: ; preds = %invertL148
store i64 %216, i64* %"iv5'ac", align 8
br label %invertL130
invertL178.loopexit: ; preds = %invertL178
%_unwrap99 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0
%_unwrap53 = add nsw i64 %_unwrap99, -2
%_unwrap54 = add nuw nsw i64 %_unwrap99, 1
%217 = call i64 @llvm.smax.i64(i64 %_unwrap54, i64 3), !dbg !647
%_unwrap55 = add nsw i64 %217, -3
%218 = call i64 @llvm.umin.i64(i64 %_unwrap53, i64 %_unwrap55), !dbg !647
%219 = add nuw i64 %218, 1
%_unwrap95 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1
%_unwrap57 = add i64 %_unwrap95, -2
%220 = add nuw i64 %_unwrap57, 1
%221 = mul nuw nsw i64 %220, %219
%222 = load i64*, i64** %loopLimit_cache, align 8, !dereferenceable !236, !invariant.group !650
%223 = load i64, i64* %"iv1'ac", align 8
%224 = load i64, i64* %"iv'ac", align 8
%225 = mul nuw nsw i64 %220, %219
%226 = mul nuw nsw i64 %224, %220
%227 = add nuw nsw i64 %223, %226
%228 = getelementptr inbounds i64, i64* %222, i64 %227
%229 = load i64, i64* %228, align 8, !invariant.group !730
br label %mergeinvertL130.outer_L178.loopexit
mergeinvertL130.outer_L178.loopexit: ; preds = %invertL178.loopexit
store i64 %229, i64* %"iv3'ac", align 8
br label %invertL148
invertL178: ; preds = %mergeinvertL66_L187.loopexit, %incinvertL66
%230 = load i64, i64* %"iv1'ac", align 8, !dbg !728
%231 = load i64, i64* %"iv'ac", align 8, !dbg !728
%"'ipc_unwrap" = addrspacecast {} addrspace(10)* %"'" to float addrspace(13)* addrspace(11)*, !dbg !728
%"arrayptr5482'ipl_unwrap" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc_unwrap", align 16, !alias.scope !663, !noalias !666, !invariant.group !668
%iv.next2_unwrap = add nuw nsw i64 %230, 1, !dbg !728
%_unwrap100 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !728
%_unwrap61 = add nsw i64 %_unwrap100, -2, !dbg !728
%_unwrap62 = add nuw nsw i64 %_unwrap100, 1, !dbg !728
%232 = call i64 @llvm.smax.i64(i64 %_unwrap62, i64 3), !dbg !647
%_unwrap63 = add nsw i64 %232, -3, !dbg !728
%233 = call i64 @llvm.umin.i64(i64 %_unwrap61, i64 %_unwrap63), !dbg !647
%234 = add nuw i64 %233, 1, !dbg !728
%235 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 2, !dbg !728
%236 = getelementptr inbounds i64, i64* %235, i64 %231, !dbg !728
%237 = load i64, i64* %236, align 8, !dbg !728, !invariant.group !749
%_unwrap64 = add i64 %iv.next2_unwrap, %237, !dbg !728
%"'ipg58_unwrap" = getelementptr inbounds float, float addrspace(13)* %"arrayptr5482'ipl_unwrap", i64 %_unwrap64, !dbg !728
%238 = load float, float addrspace(13)* %"'ipg58_unwrap", align 4, !dbg !728, !tbaa !134, !alias.scope !750, !noalias !753
store float 0.000000e+00, float addrspace(13)* %"'ipg58_unwrap", align 4, !dbg !728, !tbaa !134, !alias.scope !750, !noalias !753
%239 = load float, float* %"'de65", align 4, !dbg !728
%240 = fadd fast float %239, %238, !dbg !728
store float %240, float* %"'de65", align 4, !dbg !728
%241 = load float, float* %"'de65", align 4, !dbg !726
store float 0.000000e+00, float* %"'de65", align 4, !dbg !726
%242 = fmul fast float %241, 5.000000e-01, !dbg !726
%243 = load float, float* %"value_phi46'de", align 4, !dbg !726
%244 = fadd fast float %243, %242, !dbg !726
store float %244, float* %"value_phi46'de", align 4, !dbg !726
%245 = load float, float* %"value_phi46'de", align 4
store float 0.000000e+00, float* %"value_phi46'de", align 4
%246 = load i64, i64* %"iv1'ac", align 8
%247 = load i64, i64* %"iv'ac", align 8
%_unwrap68 = add nuw i64 %247, 2
%_unwrap69 = shl nuw i64 %_unwrap68, 1
%_unwrap70 = add i64 %_unwrap69, -2
%.not77_unwrap = icmp eq i64 %2, 2
%_unwrap71 = select i1 %.not77_unwrap, i64 -2, i64 -1
%_unwrap72 = add i64 %_unwrap69, %_unwrap71
%.not79_unwrap = icmp sgt i64 %_unwrap70, %_unwrap72
%_unwrap73 = add i64 %_unwrap69, -3
%value_phi16_unwrap = select i1 %.not79_unwrap, i64 %_unwrap73, i64 %_unwrap72
%_unwrap74 = icmp sgt i64 %_unwrap70, %value_phi16_unwrap
%_unwrap75 = shl nuw i64 %246, 1
%_unwrap76 = add nuw i64 %_unwrap75, 2
%.not76_unwrap = icmp eq i64 %2, 1
%_unwrap77 = select i1 %.not76_unwrap, i64 2, i64 3
%_unwrap78 = add nuw i64 %_unwrap77, %_unwrap75
%.not78_unwrap = icmp sgt i64 %_unwrap76, %_unwrap78
%_unwrap79 = or i64 %_unwrap75, 1
%value_phi15_unwrap = select i1 %.not78_unwrap, i64 %_unwrap79, i64 %_unwrap78
%_unwrap80 = icmp sgt i64 %_unwrap76, %value_phi15_unwrap
%not._unwrap = or i1 %_unwrap74, %_unwrap80
%248 = xor i1 %not._unwrap, true
%249 = select fast i1 %248, float %245, float 0.000000e+00
%250 = load float, float* %"'de", align 4
%251 = fadd fast float %250, %245
%252 = select fast i1 %not._unwrap, float %250, float %251
store float %252, float* %"'de", align 4
br i1 %not._unwrap, label %invertL66, label %invertL178.loopexit
invertL187.loopexit: ; preds = %invertL187
%253 = load i64, i64* %"iv'ac", align 8
%254 = load i64, i64* %_cache81, align 8, !invariant.group !651
%_unwrap82 = add i64 %254, -2
br label %mergeinvertL66_L187.loopexit
mergeinvertL66_L187.loopexit: ; preds = %invertL187.loopexit
store i64 %_unwrap82, i64* %"iv1'ac", align 8
br label %invertL178
invertL187: ; preds = %mergeinvertL36_L208.loopexit, %incinvertL36
%255 = load i64, i64* %"iv'ac", align 8
%256 = load i64, i64* %_cache81, align 8, !invariant.group !651
%_unwrap83 = icmp ult i64 %256, 2
br i1 %_unwrap83, label %invertL36, label %invertL187.loopexit
invertL208.loopexit: ; preds = %invertL208
%_unwrap84 = add nsw i64 %5, -2
%_unwrap85 = add nuw nsw i64 %5, 1
%257 = call i64 @llvm.smax.i64(i64 %_unwrap85, i64 3), !dbg !647
%_unwrap86 = add nsw i64 %257, -3
%258 = call i64 @llvm.umin.i64(i64 %_unwrap84, i64 %_unwrap86), !dbg !647
br label %mergeinvertL36_L208.loopexit
mergeinvertL36_L208.loopexit: ; preds = %invertL208.loopexit
store i64 %258, i64* %"iv'ac", align 8
br label %invertL187
invertL208: ; preds = %L208
br i1 %6, label %inverttop, label %invertL208.loopexit
}
LLVM.LoadInst(%unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !28, !tbaa !16, !alias.scope !34, !noalias !37)
LLVM.PHIInst(%unbox_replacementA = phi i64 , !dbg !21)
Stacktrace:
[1] -
@ ./int.jl:86
[2] #127
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:29
[3] map
@ ./tuple.jl:292
[4] macro expansion
@ ./simdloop.jl:69
[5] ##kern#421#131
@ ~/git/Enzyme.jl/WaterLily.jl/src/util.jl:103
Stacktrace:
[1] julia_error(cstr::Cstring, val::Ptr{LLVM.API.LLVMOpaqueValue}, errtype::Enzyme.API.ErrorType, data::Ptr{Nothing}, data2::Ptr{LLVM.API.LLVMOpaqueValue}, B::Ptr{LLVM.API.LLVMOpaqueBuilder})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:2713
[2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, runtimeActivity::Bool, width::Int64, additionalArg::Ptr{LLVM.API.LLVMOpaqueType}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
@ Enzyme.API ~/git/Enzyme.jl/src/api.jl:253
[3] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::NTuple{5, Bool}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{Int64}, boxedArgs::Set{Int64})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:5058
[4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8191
[5] codegen
@ ~/git/Enzyme.jl/src/compiler.jl:7028 [inlined]
[6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:9299
[7] _thunk
@ ~/git/Enzyme.jl/src/compiler.jl:9299 [inlined]
[8] cached_compilation
@ ~/git/Enzyme.jl/src/compiler.jl:9340 [inlined]
[9] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{0x0000000000007b3e}, ::Type{Const{typeof(Core.kwcall)}}, ::Type{Const{Nothing}}, tt::Type{Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{typeof(WaterLily.restrictL!)}, Duplicated{Array{Float32, 3}}, Duplicated{Array{Float32, 3}}}}, ::Val{Enzyme.API.DEM_ReverseModePrimal}, ::Val{1}, ::Val{(true, true, true, true, true)}, ::Val{true}, ::Val{false}, ::Type{FFIABI}, ::Val{false}, ::Val{true})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:9472
[10] #s2067#19669
@ ~/git/Enzyme.jl/src/compiler.jl:9609 [inlined]
[11] var"#s2067#19669"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ABI::Any, ErrIfFuncWritten::Any, RuntimeActivity::Any, ::Any, ::Type, ::Type, ::Type, tt::Any, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Any)
@ Enzyme.Compiler ./none:0
[12] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[13] runtime_generic_augfwd(activity::Type{Val{(false, false, false, true, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(Core.kwcall), df::Nothing, primal_1::@NamedTuple{perdir::Tuple{}}, shadow_1_1::@NamedTuple{perdir::Tuple{}}, primal_2::typeof(WaterLily.restrictL!), shadow_2_1::Nothing, primal_3::Array{Float32, 3}, shadow_3_1::Array{Float32, 3}, primal_4::Array{Float32, 3}, shadow_4_1::Array{Float32, 3})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:468
[14] restrictML
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:23 [inlined]
[15] restrictML
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0 [inlined]
[16] augmented_julia_restrictML_9751_inner_1wrap
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0
[17] macro expansion
@ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
[18] enzyme_call
@ ~/git/Enzyme.jl/src/compiler.jl:8795 [inlined]
[19] AugmentedForwardThunk
@ ~/git/Enzyme.jl/src/compiler.jl:8632 [inlined]
[20] runtime_generic_augfwd(activity::Type{Val{(false, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(WaterLily.restrictML), df::Nothing, primal_1::Poisson{Float32, Matrix{Float32}, Array{Float32, 3}}, shadow_1_1::Poisson{Float32, Matrix{Float32}, Array{Float32, 3}})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
[21] _
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:54
[22] MultiLevelPoisson
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:51 [inlined]
[23] MultiLevelPoisson
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0 [inlined]
[24] augmented_julia_MultiLevelPoisson_9175_inner_1wrap
@ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0
[25] macro expansion
@ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
[26] enzyme_call(::Val{false}, ::Ptr{Nothing}, ::Type{Enzyme.Compiler.AugmentedForwardThunk}, ::Val{1}, ::Val{true}, ::Type{Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{Type{MultiLevelPoisson}}, Duplicated{Matrix{Float32}}, Duplicated{Array{Float32, 3}}, Duplicated{Matrix{Float32}}}}, ::Type{Duplicated{MultiLevelPoisson{Float32, Matrix{Float32}, Array{Float32, 3}}}}, ::Const{typeof(Core.kwcall)}, ::Type{@NamedTuple{1::@NamedTuple{1, 2, 3::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 4, 5, 6::@NamedTuple{1, 2, 3, 4, 5::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{UInt64, 0}}, 6, 7, 8, 9, 10, 11::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{Float32, 0}}, 12, 13, 14, 15::Bool, 16::Bool}, 7, 8, 9, 10, 11, 12::UInt64, 13::UInt64, 14::UInt64}, 4, 5::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 6, 7, 8::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 3, 4::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 5, 6, 7, 8, 9::Bool, 10, 11}, 4::UInt64, 5::UInt64}, 9, 10, 11, 12, 13, 14, 15, 16::Bool, 17::Core.LLVMPtr{UInt64, 0}, 18, 19::Core.LLVMPtr{Bool, 0}, 20::Core.LLVMPtr{UInt64, 0}, 21, 22::Core.LLVMPtr{Bool, 0}}, 2}}, ::Const{@NamedTuple{perdir::Tuple{}}}, ::Const{Type{MultiLevelPoisson}}, ::Duplicated{Matrix{Float32}}, ::Duplicated{Array{Float32, 3}}, ::Duplicated{Matrix{Float32}})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8795
[27] (::Enzyme.Compiler.AugmentedForwardThunk{Ptr{Nothing}, Const{typeof(Core.kwcall)}, Duplicated{MultiLevelPoisson{Float32, Matrix{Float32}, Array{Float32, 3}}}, Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{Type{MultiLevelPoisson}}, Duplicated{Matrix{Float32}}, Duplicated{Array{Float32, 3}}, Duplicated{Matrix{Float32}}}, 1, true, @NamedTuple{1::@NamedTuple{1, 2, 3::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 4, 5, 6::@NamedTuple{1, 2, 3, 4, 5::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{UInt64, 0}}, 6, 7, 8, 9, 10, 11::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{Float32, 0}}, 12, 13, 14, 15::Bool, 16::Bool}, 7, 8, 9, 10, 11, 12::UInt64, 13::UInt64, 14::UInt64}, 4, 5::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 6, 7, 8::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 3, 4::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 5, 6, 7, 8, 9::Bool, 10, 11}, 4::UInt64, 5::UInt64}, 9, 10, 11, 12, 13, 14, 15, 16::Bool, 17::Core.LLVMPtr{UInt64, 0}, 18, 19::Core.LLVMPtr{Bool, 0}, 20::Core.LLVMPtr{UInt64, 0}, 21, 22::Core.LLVMPtr{Bool, 0}}, 2}})(::Const{typeof(Core.kwcall)}, ::Const{@NamedTuple{perdir::Tuple{}}}, ::Vararg{Any})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8632
[28] runtime_generic_augfwd(activity::Type{Val{(false, false, false, true, true, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(Core.kwcall), df::Nothing, primal_1::@NamedTuple{perdir::Tuple{}}, shadow_1_1::Nothing, primal_2::Type{MultiLevelPoisson}, shadow_2_1::Nothing, primal_3::Matrix{Float32}, shadow_3_1::Matrix{Float32}, primal_4::Array{Float32, 3}, shadow_4_1::Array{Float32, 3}, primal_5::Matrix{Float32}, shadow_5_1::Matrix{Float32})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
[29] _
@ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:76 [inlined]
[30] _
@ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:0 [inlined]
[31] augmented_julia___270_5848_inner_1wrap
@ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:0
[32] macro expansion
@ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
[33] enzyme_call(::Val{false}, ::Ptr{Nothing}, ::Type{Enzyme.Compiler.AugmentedForwardThunk}, ::Val{1}, ::Val{true}, ::Type{Tuple{Const{Float64}, Active{Float64}, Const{Nothing}, Const{Nothing}, Const{Int64}, Const{Tuple{}}, Const{Nothing}, Const{Bool}, Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, Const{Type{Float32}}, Const{Type{Array}}, Const{Type{Simulation}}, Const{Tuple{Int64, Int64}}, Const{Tuple{Int64, Int64}}, Const{Int64}}}, ::Type{Duplicated{Simulation}}, ::Const{WaterLily.var"#_#270#274"}, ::Type{@NamedTuple{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38::Core.LLVMPtr{UInt8, 0}, 39::Core.LLVMPtr{UInt8, 0}, 40::Core.LLVMPtr{UInt8, 0}}}, ::Const{Float64}, ::Active{Float64}, ::Const{Nothing}, ::Const{Nothing}, ::Const{Int64}, ::Const{Tuple{}}, ::Const{Nothing}, ::Const{Bool}, ::Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, ::Const{Type{Float32}}, ::Const{Type{Array}}, ::Const{Type{Simulation}}, ::Const{Tuple{Int64, Int64}}, ::Const{Tuple{Int64, Int64}}, ::Const{Int64})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8795
[34] (::Enzyme.Compiler.AugmentedForwardThunk{Ptr{Nothing}, Const{WaterLily.var"#_#270#274"}, Duplicated{Simulation}, Tuple{Const{Float64}, Active{Float64}, Const{Nothing}, Const{Nothing}, Const{Int64}, Const{Tuple{}}, Const{Nothing}, Const{Bool}, Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, Const{Type{Float32}}, Const{Type{Array}}, Const{Type{Simulation}}, Const{Tuple{Int64, Int64}}, Const{Tuple{Int64, Int64}}, Const{Int64}}, 1, true, @NamedTuple{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38::Core.LLVMPtr{UInt8, 0}, 39::Core.LLVMPtr{UInt8, 0}, 40::Core.LLVMPtr{UInt8, 0}}})(::Const{WaterLily.var"#_#270#274"}, ::Const{Float64}, ::Vararg{Any})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8632
[35] runtime_generic_augfwd(activity::Type{Val{(false, false, true, false, false, false, false, false, false, true, false, false, false, false, false, false)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::WaterLily.var"#_#270#274", df::Nothing, primal_1::Float64, shadow_1_1::Nothing, primal_2::Float64, shadow_2_1::Base.RefValue{Float64}, primal_3::Nothing, shadow_3_1::Nothing, primal_4::Nothing, shadow_4_1::Nothing, primal_5::Int64, shadow_5_1::Nothing, primal_6::Tuple{}, shadow_6_1::Nothing, primal_7::Nothing, shadow_7_1::Nothing, primal_8::Bool, shadow_8_1::Nothing, primal_9::AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, shadow_9_1::Base.RefValue{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, primal_10::Type{Float32}, shadow_10_1::Nothing, primal_11::Type{Array}, shadow_11_1::Nothing, primal_12::Type{Simulation}, shadow_12_1::Nothing, primal_13::Tuple{Int64, Int64}, shadow_13_1::Nothing, primal_14::Tuple{Int64, Int64}, shadow_14_1::Nothing, primal_15::Int64, shadow_15_1::Nothing)
@ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
[36] Simulation
@ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:65 [inlined]
[37] #make_foils#1
@ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:24
[38] make_foils
@ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:3 [inlined]
[39] mean_drag (repeats 5 times)
@ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:39
[40] f
@ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:49 [inlined]
[41] augmented_julia_f_2861wrap
@ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:0
[42] macro expansion
@ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
[43] enzyme_call
@ ~/git/Enzyme.jl/src/compiler.jl:8795 [inlined]
[44] AugmentedForwardThunk
@ ~/git/Enzyme.jl/src/compiler.jl:8632 [inlined]
[45] autodiff
@ ~/git/Enzyme.jl/src/Enzyme.jl:384 [inlined]
[46] autodiff
@ ~/git/Enzyme.jl/src/Enzyme.jl:512 [inlined]
[47] g!
@ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:59 [inlined]
[48] macro expansion
@ ./timing.jl:279 [inlined]
[49] top-level scope
@ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:269
https://fwd.gymni.ch/AkG87D
https://fwd.gymni.ch/Nw55G8
https://fwd.gymni.ch/xFoqVm
With https://github.com/EnzymeAD/Enzyme/pull/2089, it now hits https://github.com/EnzymeAD/Enzyme.jl/issues/1781.
Hey, thanks for working on this! I have followed the thread of fixes; do I understand correctly that you managed to fix them all? Does KA need to be updated before I can try running our WaterLily example with Enzyme in GPU simulations?
You may need https://github.com/JuliaGPU/KernelAbstractions.jl/pull/534 as well. However, when running locally, although the above issue is fixed, there was still a strange memory issue with the full model. @b-fg, if you're able to reduce any remaining issues to an MWE, we can try to get them fixed.