Enzyme.jl

Cannot deduce type of copy `call void @llvm.memcpy.p10i8.p0i8.i64`

Open vchuravy opened this issue 1 year ago • 4 comments

Reproducer:

git clone https://github.com/vchuravy/WaterLily.jl
cd WaterLily.jl/examples
git checkout vc/enzyme
# instantiate local project
julia +1.10 --project=. TandemFoilOptim.jl
ERROR: LoadError: Enzyme execution failed.
Enzyme cannot deduce type
Current scope: 
; Function Attrs: mustprogress willreturn
define internal fastcc void @preprocess_julia__make_foils_1_2261([6 x {} addrspace(10)*]* noalias nocapture nofree noundef nonnull writeonly sret([6 x {} addrspace(10)*]) align 8 dereferenceable(48) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,16]:Pointer, [-1,32]:Pointer}" %0, float "enzyme_type"="{[-1]:Float@float}" "enzymejl_parmtype"="138083780338720" "enzymejl_parmtype_ref"="0" %1) unnamed_addr #42 !dbg !725 {
; ...

Cannot deduce type of copy   call void @llvm.memcpy.p10i8.p0i8.i64(i8 addrspace(10)* noundef align 1 dereferenceable(7) %newstruct31.sroa.3.sroa.2.0.newstruct31.sroa.3.0..sroa_raw_idx.sroa_raw_idx, i8* noundef nonnull align 1 dereferenceable(7) %newstruct31.sroa.3.sroa.2.1.newstruct24.sroa.3.0.sroa_idx.sroa_idx, i64 noundef 7, i1 noundef false) #44, !dbg !85

Caused by:
Stacktrace:
 [1] Simulation
   @ ~/src/WaterLily/src/WaterLily.jl:65
 [2] #make_foils#1
   @ ~/src/WaterLily/examples/TandemFoilOptim.jl:24

Full log: https://gist.github.com/vchuravy/8e70c7ff38fd150f941fef6a7af6cc92

vchuravy, Jun 20 '24 00:06

The problem is that the typetree of this type has no info for the bytes between offsets 16 and 24.


  %box34 = call noalias nonnull dereferenceable(240) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Float@float, [-1,40]:Float@double, [-1,48]:Integer, [-1,49]:Integer, [-1,50]:Integer, [-1,51]:Integer, [-1,52]:Integer, [-1,53]:Integer, [-1,54]:Integer, [-1,55]:Integer, [-1,56]:Float@double, [-1,64]:Integer, [-1,65]:Integer, [-1,66]:Integer, [-1,67]:Integer, [-1,68]:Integer, [-1,69]:Integer, [-1,70]:Integer, [-1,71]:Integer, [-1,72]:Integer, [-1,73]:Integer, [-1,74]:Integer, [-1,75]:Integer, [-1,76]:Integer, [-1,77]:Integer, [-1,78]:Integer, [-1,79]:Integer, [-1,80]:Integer, [-1,81]:Integer, [-1,82]:Integer, [-1,83]:Integer, [-1,84]:Integer, [-1,85]:Integer, [-1,86]:Integer, [-1,87]:Integer, [-1,88]:Integer, [-1,89]:Integer, [-1,90]:Integer, [-1,91]:Integer, [-1,92]:Integer, [-1,93]:Integer, [-1,94]:Integer, [-1,95]:Integer, [-1,96]:Integer, [-1,97]:Integer, [-1,98]:Integer, [-1,99]:Integer, [-1,100]:Integer, [-1,101]:Integer, [-1,102]:Integer, [-1,103]:Integer, [-1,104]:Integer, [-1,105]:Integer, [-1,106]:Integer, [-1,107]:Integer, [-1,108]:Integer, [-1,109]:Integer, [-1,110]:Integer, [-1,111]:Integer, [-1,112]:Integer, [-1,113]:Integer, [-1,114]:Integer, [-1,115]:Integer, [-1,116]:Integer, [-1,117]:Integer, [-1,118]:Integer, [-1,119]:Integer, [-1,120]:Integer, [-1,121]:Integer, [-1,122]:Integer, [-1,123]:Integer, [-1,124]:Integer, [-1,125]:Integer, [-1,126]:Integer, [-1,127]:Integer, [-1,128]:Integer, [-1,136]:Integer, [-1,137]:Integer, [-1,138]:Integer, [-1,139]:Integer, [-1,140]:Integer, [-1,141]:Integer, [-1,142]:Integer, [-1,143]:Integer, [-1,144]:Float@float, [-1,152]:Float@double, [-1,160]:Integer, [-1,161]:Integer, [-1,162]:Integer, [-1,163]:Integer, [-1,164]:Integer, 
[-1,165]:Integer, [-1,166]:Integer, [-1,167]:Integer, [-1,168]:Float@double, [-1,176]:Integer, [-1,177]:Integer, [-1,178]:Integer, [-1,179]:Integer, [-1,180]:Integer, [-1,181]:Integer, [-1,182]:Integer, [-1,183]:Integer, [-1,184]:Integer, [-1,185]:Integer, [-1,186]:Integer, [-1,187]:Integer, [-1,188]:Integer, [-1,189]:Integer, [-1,190]:Integer, [-1,191]:Integer, [-1,192]:Integer, [-1,193]:Integer, [-1,194]:Integer, [-1,195]:Integer, [-1,196]:Integer, [-1,197]:Integer, [-1,198]:Integer, [-1,199]:Integer, [-1,200]:Integer, [-1,201]:Integer, [-1,202]:Integer, [-1,203]:Integer, [-1,204]:Integer, [-1,205]:Integer, [-1,206]:Integer, [-1,207]:Integer, [-1,208]:Integer, [-1,209]:Integer, [-1,210]:Integer, [-1,211]:Integer, [-1,212]:Integer, [-1,213]:Integer, [-1,214]:Integer, [-1,215]:Integer, [-1,216]:Integer, [-1,217]:Integer, [-1,218]:Integer, [-1,219]:Integer, [-1,220]:Integer, [-1,221]:Integer, [-1,222]:Integer, [-1,223]:Integer, [-1,224]:Integer, [-1,225]:Integer, [-1,226]:Integer, [-1,227]:Integer, [-1,228]:Integer, [-1,229]:Integer, [-1,230]:Integer, [-1,231]:Integer, [-1,232]:Integer, [-1,233]:Integer, [-1,234]:Integer, [-1,235]:Integer, [-1,236]:Integer, [-1,237]:Integer, [-1,238]:Integer, [-1,239]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 240, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137776915697616 to {}*) to {} addrspace(10)*)) #46, !dbg !757
  %35 = bitcast {} addrspace(10)* %box34 to i8 addrspace(10)*, !dbg !757


julia> obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}

wsmoses, Jun 21 '24 02:06

Okay, I'm deeply confused by this memcpy of 7 bytes. Why is this happening? Where does it come from?

wsmoses, Jun 21 '24 02:06

Relevant logs, so we don't need to rerun:

julia> obj(x) = Base.unsafe_pointer_to_objref(Base.reinterpret(Ptr{Cvoid}, x))
obj (generic function with 1 method)

julia> obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}

julia> T =obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}

julia> fieldtypes(T)
(WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}})

julia> fieldoffsets(T)
ERROR: UndefVarError: `fieldoffsets` not defined
Stacktrace:
 [1] top-level scope
   @ REPL[10]:1

julia> fieldoffset(T, 1)
0x0000000000000000

julia> fieldoffset(T, 2)
0x0000000000000080

julia> T = fieldtypes(T)[1]
WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}

julia> fieldtypes(T)
(Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}})

julia> fieldtypes(T, 1)
ERROR: MethodError: no method matching fieldtypes(::Type{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, ::Int64)

Closest candidates are:
  fieldtypes(::Type)
   @ Base reflection.jl:919

Stacktrace:
 [1] top-level scope
   @ REPL[15]:1

julia> fieldoffset(T, 1)
0x0000000000000000

julia> fieldoffset(T, 2)
0x0000000000000008

julia> fieldoffset(T, 3)
0x0000000000000010

julia> fieldoffset(T, 4)
ERROR: BoundsError: attempt to access DataType at index [4]
Stacktrace:
 [1] fieldoffset(x::DataType, idx::Int64)
   @ Base ./reflection.jl:779
 [2] top-level scope
   @ REPL[19]:1

julia> S = fieldtype(T, 3)
var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}

julia> size(S)
ERROR: MethodError: no method matching size(::Type{var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}})

Closest candidates are:
  size(::LLVM.FunctionBlockSet)
   @ LLVM ~/.julia/packages/LLVM/6cDbl/src/core/function.jl:129
  size(::BitVector)
   @ Base bitarray.jl:104
  size(::BitVector, ::Integer)
   @ Base bitarray.jl:107
  ...

Stacktrace:
 [1] top-level scope
   @ REPL[21]:1

julia> sizeof(S)
112

julia> using LLVM
 │ Package LLVM not found, but a package named LLVM is available from a registry. 
 │ Install package?
 │   (examples) pkg> add LLVM 
 └ (y/n/o) [y]: y
   Resolving package versions...
    Updating `~/git/Enzyme.jl/WaterLily.jl/examples/Project.toml`
  [929cbde3] + LLVM v7.2.1
  No Changes to `~/git/Enzyme.jl/WaterLily.jl/examples/Manifest.toml`
Precompiling project...
  ✗ GLMakie
  74 dependencies successfully precompiled in 59 seconds. 296 already precompiled.
  3 dependencies had output during precompilation:
┌ WaterLily → WaterLilyWriteVTKExt
│  ┌ Warning: 
│  │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│  │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│  └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└  
┌ WaterLily → WaterLilyCUDAExt
│  ┌ Warning: 
│  │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│  │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│  └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└  
┌ WaterLily → WaterLilyReadVTKExt
│  ┌ Warning: 
│  │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│  │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│  └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└  
  1 dependency errored.
  For a report of the errors see `julia> err`. To retry use `pkg> precompile`

julia> ctx = LLVM.Context()
LLVM.Context(0x0000000005bc7470, typed ptrs)

julia> tt(T) = string(Enzyme.typetree(T, ctx, ""))
tt (generic function with 1 method)

julia> tt(S)
"{[0]:Integer, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Float@float, [24]:Float@double, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer, [40]:Float@double, [48]:Integer, [49]:Integer, [50]:Integer, [51]:Integer, [52]:Integer, [53]:Integer, [54]:Integer, [55]:Integer, [56]:Integer, [57]:Integer, [58]:Integer, [59]:Integer, [60]:Integer, [61]:Integer, [62]:Integer, [63]:Integer, [64]:Integer, [65]:Integer, [66]:Integer, [67]:Integer, [68]:Integer, [69]:Integer, [70]:Integer, [71]:Integer, [72]:Integer, [73]:Integer, [74]:Integer, [75]:Integer, [76]:Integer, [77]:Integer, [78]:Integer, [79]:Integer, [80]:Integer, [81]:Integer, [82]:Integer, [83]:Integer, [84]:Integer, [85]:Integer, [86]:Integer, [87]:Integer, [88]:Integer, [89]:Integer, [90]:Integer, [91]:Integer, [92]:Integer, [93]:Integer, [94]:Integer, [95]:Integer, [96]:Integer, [97]:Integer, [98]:Integer, [99]:Integer, [100]:Integer, [101]:Integer, [102]:Integer, [103]:Integer, [104]:Integer, [105]:Integer, [106]:Integer, [107]:Integer, [108]:Integer, [109]:Integer, [110]:Integer, [111]:Integer}"

julia> fieldtypes(S)
(Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}})

julia> fieldoffset(S, 1)
0x0000000000000000

julia> fieldoffset(S, 2)
0x0000000000000008

julia> fieldoffset(S, 3)
0x0000000000000010

julia> fieldoffset(S, 4)
0x0000000000000018

julia> Int(fieldoffset(S, 4))
24

julia> Int(fieldoffset(S, 3))
16

julia> fieldtypes(S)[3]
Float32
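
The offsets printed above can also be turned into explicit padding ranges. The following is a minimal sketch, not something from Enzyme.jl or Base: `padding_ranges` and the `Leading` struct are hypothetical names, and `Leading` only mirrors the first four field types of `S` (Bool, Int64, Float32, Float64). It assumes every field is stored inline (isbits) and a 64-bit platform.

```julia
# Hypothetical helper (not part of Enzyme.jl or Base): list the byte ranges of
# a struct that are not covered by any field, i.e. alignment padding.
# Assumes all fields are stored inline, so sizeof(fieldtype) is the field size.
function padding_ranges(::Type{T}) where {T}
    gaps = UnitRange{Int}[]
    n = fieldcount(T)
    for i in 1:n
        # First byte after field i.
        stop = Int(fieldoffset(T, i)) + sizeof(fieldtype(T, i))
        # Start of the next field, or the end of the struct for the last field.
        next = i < n ? Int(fieldoffset(T, i + 1)) : sizeof(T)
        next > stop && push!(gaps, stop:(next - 1))
    end
    return gaps
end

# Stand-in with the same leading field types as the closure type `S` above.
struct Leading
    a::Bool
    b::Int64
    c::Float32
    d::Float64
end

padding_ranges(Leading)  # on a 64-bit platform: [1:7, 20:23]
```

The `1:7` gap after the `Bool` has the same length as the 7-byte memcpy in the error, and `S` sits at offset 16 inside the boxed object, which lines up with the untyped bytes between 16 and 24. Whether that copy really is this padding is not confirmed anywhere in this thread, so treat it as a guess.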

wsmoses, Jun 21 '24 02:06

> Why is this happening? Where does it come from?

This is likely LLVM optimizing a copy loop, but I do not know why it copies 7 bytes and not 9.

vchuravy, Jul 17 '24 12:07

Okay, I've fixed the actual problems reported in this issue.

However, now it... segfaults.

wsmoses, Sep 28 '24 19:09

This is now resolved on main, both the original error and the segfault. However, the full example still doesn't run, because Enzyme's cache algorithm gets confused:

(base) wmoses-macbookpro2:examples wmoses$ julia --project=. TandemFoilOptim.jl 
┌ Warning: 
│ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
└ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
ERROR: LoadError: Enzyme compilation failed.
Current scope: 
; Function Attrs: mustprogress nofree willreturn
define internal fastcc void @preprocess_julia___kern_421_131_9980({ [2 x i64], {} addrspace(10)* } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Pointer, [-1,16,-1]:Pointer}" "enzymejl_parmtype"="5475090832" "enzymejl_parmtype_ref"="1" %0, {} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %1, i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="4855344304" "enzymejl_parmtype_ref"="0" %2, {} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, 
[-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %3) unnamed_addr #22 !dbg !265 {
top:
  %4 = call {}*** @julia.get_pgcstack() #26
  %ptls_field72 = getelementptr inbounds {}**, {}*** %4, i64 2
  %5 = bitcast {}*** %ptls_field72 to i64***
  %ptls_load7374 = load i64**, i64*** %5, align 8, !tbaa !12
  %6 = getelementptr inbounds i64*, i64** %ptls_load7374, i64 2
  %safepoint = load i64*, i64** %6, align 8, !tbaa !16
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #26, !dbg !266
  fence syncscope("singlethread") seq_cst
  %7 = getelementptr inbounds { [2 x i64], {} addrspace(10)* }, { [2 x i64], {} addrspace(10)* } addrspace(11)* %0, i64 0, i32 0, i64 1, !dbg !267
  %unbox2 = load i64, i64 addrspace(11)* %7, align 8, !dbg !271, !tbaa !16, !alias.scope !64, !noalias !65, !enzyme_inactive !0
  %8 = add i64 %unbox2, -1, !dbg !271
  %9 = call i64 @llvm.smax.i64(i64 %8, i64 noundef 1) #26, !dbg !273
  %10 = icmp ult i64 %9, 2, !dbg !276
  br i1 %10, label %L208, label %L36.preheader, !dbg !280

L36.preheader:                                    ; preds = %top
  %11 = getelementptr inbounds { [2 x i64], {} addrspace(10)* }, { [2 x i64], {} addrspace(10)* } addrspace(11)* %0, i64 0, i32 0, i64 0, !dbg !267
  %unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !271, !tbaa !16, !alias.scope !64, !noalias !65
  %12 = add i64 %unbox, -1, !dbg !271
  %13 = call i64 @llvm.smax.i64(i64 %12, i64 noundef 1) #26, !dbg !281
  %14 = icmp ult i64 %13, 2
  %.not76 = icmp eq i64 %2, 1
  %.not77 = icmp eq i64 %2, 2
  %15 = select i1 %.not77, i64 -2, i64 -1
  %.phi.trans.insert64 = addrspacecast {} addrspace(10)* %3 to {} addrspace(10)* addrspace(11)*
  %arraysize_ptr.phi.trans.insert = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %.phi.trans.insert64, i64 3
  %.phi.trans.insert65 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr.phi.trans.insert to i64 addrspace(11)*
  %arraysize_ptr31.phi.trans.insert = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %.phi.trans.insert64, i64 4
  %.phi.trans.insert68 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr31.phi.trans.insert to i64 addrspace(11)*
  %16 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*
  %17 = add i64 %2, -1
  %18 = addrspacecast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(11)*
  %arraysize_ptr47 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 3
  %19 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr47 to i64 addrspace(11)*
  %arraysize_ptr50 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 4
  %20 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr50 to i64 addrspace(11)*
  %21 = addrspacecast {} addrspace(10)* %1 to float addrspace(13)* addrspace(11)*
  %22 = select i1 %.not76, i64 2, i64 3
  %23 = add nsw i64 %13, -2
  br label %L36, !dbg !282

L36:                                              ; preds = %L187, %L36.preheader
  %iv = phi i64 [ %iv.next, %L187 ], [ 0, %L36.preheader ]
  %24 = shl nuw i64 %iv, 1, !dbg !282
  %25 = add i64 %24, 2, !dbg !282
  %26 = add nuw i64 %iv, 2, !dbg !282
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !282
  br i1 %14, label %L187, label %L47.lr.ph, !dbg !282

L47.lr.ph:                                        ; preds = %L36
  %27 = shl nuw i64 %26, 1
  %28 = add i64 %27, -2
  %29 = add i64 %27, %15
  %.not79 = icmp sgt i64 %28, %29
  %30 = add i64 %27, -3
  %value_phi16 = select i1 %.not79, i64 %30, i64 %29
  %31 = icmp sgt i64 %28, %value_phi16
  %arraysize.pre = load i64, i64 addrspace(11)* %.phi.trans.insert65, align 8, !enzyme_inactive !0
  %arraysize32.pre = load i64, i64 addrspace(11)* %.phi.trans.insert68, align 16, !enzyme_inactive !0
  %arrayptr.pre80 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %16, align 16
  %32 = mul i64 %arraysize32.pre, %17
  %33 = add i64 %32, -1
  %arraysize48 = load i64, i64 addrspace(11)* %19, align 8, !enzyme_inactive !0
  %34 = add nsw i64 %26, -1
  %arraysize51 = load i64, i64 addrspace(11)* %20, align 16, !enzyme_inactive !0
  %arrayptr5482 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %21, align 16
  %35 = mul i64 %arraysize51, %17
  %reass.add85 = add i64 %34, %35
  %reass.mul86 = mul i64 %reass.add85, %arraysize48
  br label %L66, !dbg !283

L66:                                              ; preds = %L178, %L47.lr.ph
  %iv1 = phi i64 [ %iv.next2, %L178 ], [ 0, %L47.lr.ph ]
  %36 = shl nuw i64 %iv1, 1, !dbg !284
  %37 = add i64 %36, 2, !dbg !284
  %iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !284
  %38 = shl nuw i64 %iv1, 1, !dbg !287
  %39 = add nuw i64 %38, 2, !dbg !295
  %40 = add nuw i64 %22, %38, !dbg !295
  %.not78 = icmp sgt i64 %39, %40, !dbg !298
  %41 = or i64 %38, 1, !dbg !300
  %value_phi15 = select i1 %.not78, i64 %41, i64 %40, !dbg !300
  %42 = icmp sgt i64 %39, %value_phi15, !dbg !306
  %not. = or i1 %31, %42, !dbg !309
  br i1 %not., label %L178, label %L130.outer.preheader, !dbg !292

L130.outer.preheader:                             ; preds = %L66
  br label %L130.outer, !dbg !310

L130.outer:                                       ; preds = %L130.outer.preheader, %L148
  %iv3 = phi i64 [ 0, %L130.outer.preheader ], [ %iv.next4, %L148 ]
  %value_phi30.ph = phi float [ %48, %L148 ], [ 0.000000e+00, %L130.outer.preheader ]
  %43 = add i64 %25, %iv3
  %iv.next4 = add nuw nsw i64 %iv3, 1
  %reass.add = add i64 %33, %43
  %reass.mul = mul i64 %reass.add, %arraysize.pre
  %44 = add i64 %reass.mul, -1
  br label %L130, !dbg !310

L130:                                             ; preds = %L130, %L130.outer
  %iv5 = phi i64 [ %iv.next6, %L130 ], [ 0, %L130.outer ]
  %value_phi30 = phi float [ %48, %L130 ], [ %value_phi30.ph, %L130.outer ]
  %45 = add i64 %37, %iv5, !dbg !313
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !313
  %46 = add i64 %44, %45, !dbg !313
  %47 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80, i64 %46, !dbg !313
  %arrayref = load float, float addrspace(13)* %47, align 4, !dbg !313, !tbaa !134, !alias.scope !31, !noalias !34
  %48 = fadd fast float %arrayref, %value_phi30, !dbg !316
  %49 = add i64 %45, 1, !dbg !317
  %50 = icmp sgt i64 %39, %49, !dbg !319
  %51 = icmp sgt i64 %49, %value_phi15, !dbg !319
  %52 = or i1 %50, %51, !dbg !310
  %53 = icmp eq i64 %45, %value_phi15
  %or.cond = or i1 %53, %52, !dbg !310
  br i1 %or.cond, label %L148, label %L130, !dbg !310

L148:                                             ; preds = %L130
  %54 = add i64 %43, 1, !dbg !322
  %55 = icmp sle i64 %28, %54, !dbg !325
  %56 = icmp sle i64 %54, %value_phi16, !dbg !325
  %57 = and i1 %55, %56, !dbg !329
  %58 = icmp ne i64 %43, %value_phi16, !dbg !328
  %extract.t = and i1 %58, %57, !dbg !330
  br i1 %extract.t, label %L130.outer, label %L178.loopexit, !dbg !312

L178.loopexit:                                    ; preds = %L148
  br label %L178, !dbg !331

L178:                                             ; preds = %L178.loopexit, %L66
  %value_phi46 = phi float [ 0.000000e+00, %L66 ], [ %48, %L178.loopexit ]
  %59 = fmul fast float %value_phi46, 5.000000e-01, !dbg !331
  %60 = add i64 %iv.next2, %reass.mul86, !dbg !333
  %61 = getelementptr inbounds float, float addrspace(13)* %arrayptr5482, i64 %60, !dbg !333
  store float %59, float addrspace(13)* %61, align 4, !dbg !333, !tbaa !134, !alias.scope !31, !noalias !335
  %exitcond.not = icmp eq i64 %iv1, %23, !dbg !338
  br i1 %exitcond.not, label %L187.loopexit, label %L66, !dbg !283, !llvm.loop !339

L187.loopexit:                                    ; preds = %L178
  br label %L187, !dbg !340

L187:                                             ; preds = %L187.loopexit, %L36
  %62 = add nuw i64 %26, 1, !dbg !340
  %63 = icmp slt i64 %62, 2, !dbg !344
  %64 = icmp sgt i64 %62, %9, !dbg !344
  %65 = icmp eq i64 %26, %9, !dbg !347
  %not.not.84 = or i1 %63, %64, !dbg !347
  %narrow83 = or i1 %65, %not.not.84, !dbg !347
  br i1 %narrow83, label %L208.loopexit, label %L36, !dbg !343

L208.loopexit:                                    ; preds = %L187
  br label %L208, !dbg !270

L208:                                             ; preds = %L208.loopexit, %top
  ret void, !dbg !270
}

Illegal replace ficticious phi for:   %unbox_replacementA = phi i64 , !dbg !21 of   %unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !28, !tbaa !16, !alias.scope !34, !noalias !37
; Function Attrs: mustprogress nofree willreturn
define internal fastcc void @diffejulia___kern_421_131_9980({ [2 x i64], {} addrspace(10)* } addrspace(11)* nocapture nofree readonly align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Pointer, [-1,16,-1]:Pointer}" "enzymejl_parmtype"="5475090832" "enzymejl_parmtype_ref"="1" %0, {} addrspace(10)* nocapture nofree readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %1, {} addrspace(10)* nocapture nofree align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" 
"enzymejl_parmtype_ref"="2" %"'", i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="4855344304" "enzymejl_parmtype_ref"="0" %2, {} addrspace(10)* nocapture nofree readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %3, {} addrspace(10)* nocapture nofree align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %"'1", { i64, i64, i64*, i64** } %tapeArg) unnamed_addr #22 !dbg !631 {
top:
  %4 = call {}*** @julia.get_pgcstack() #26
  %ptls_field72_replacementA = phi {}*** 
  %_replacementA14 = phi i64*** 
  %ptls_load7374_replacementA = phi i64** 
  %_replacementA13 = phi i64** 
  %safepoint_replacementA = phi i64* 
  %_replacementA12 = phi i64 addrspace(11)* , !dbg !632
  %unbox2_replacementA = phi i64 , !dbg !636
  %_replacementA = phi i64 , !dbg !636
  %5 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !638
  %6 = icmp ult i64 %5, 2, !dbg !641
  br i1 %6, label %L208, label %L36.preheader, !dbg !645

L36.preheader:                                    ; preds = %top
  %_replacementA21 = phi i64 addrspace(11)* , !dbg !632
  %unbox_replacementA = phi i64 , !dbg !636
  %_replacementA20 = phi i64 , !dbg !636
  %7 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !646
  %8 = icmp ult i64 %7, 2
  %.not76 = icmp eq i64 %2, 1
  %.not77 = icmp eq i64 %2, 2
  %9 = select i1 %.not77, i64 -2, i64 -1
  %.phi.trans.insert64_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %arraysize_ptr.phi.trans.insert_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %arraysize_ptr31.phi.trans.insert_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %.phi.trans.insert68_replacementA = phi i64 addrspace(11)* 
  %"'ipc26" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*
  %10 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*
  %_replacementA19 = phi i64 
  %_replacementA18 = phi {} addrspace(10)* addrspace(11)* 
  %arraysize_ptr47_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %_replacementA17 = phi i64 addrspace(11)* 
  %arraysize_ptr50_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %_replacementA16 = phi i64 addrspace(11)* 
  %"'ipc" = addrspacecast {} addrspace(10)* %"'" to float addrspace(13)* addrspace(11)*
  %_replacementA15 = phi float addrspace(13)* addrspace(11)* 
  %11 = select i1 %.not76, i64 2, i64 3
  %12 = add i64 %7, -2
  %13 = add nsw i64 %5, -2, !dbg !647
  %14 = add nuw nsw i64 %5, 1, !dbg !647
  %smax = call i64 @llvm.smax.i64(i64 %14, i64 3), !dbg !647
  %15 = add nsw i64 %smax, -3, !dbg !647
  %umin = call i64 @llvm.umin.i64(i64 %13, i64 %15), !dbg !647
  %16 = add nuw i64 %umin, 1, !dbg !647
  %17 = add nuw i64 %12, 1, !dbg !647
  %18 = mul nuw nsw i64 %17, %16, !dbg !647
  %19 = mul nuw i64 %18, 8, !dbg !647
  %20 = call noalias nonnull i8* @malloc(i64 %19), !dbg !647, !enzyme_cache_alloc !648
  %loopLimit_malloccache = bitcast i8* %20 to i64*, !dbg !647
  store i64* %loopLimit_malloccache, i64** %loopLimit_cache, align 8, !dbg !647, !invariant.group !650
  store i64 %7, i64* %_cache81, align 8, !dbg !647, !invariant.group !651
  store i64 %unbox_replacementA, i64* %unbox_cache, align 8, !dbg !647, !tbaa !16, !invariant.group !652
  %21 = mul nuw i64 %18, 8, !dbg !647
  %22 = call noalias nonnull i8* @malloc(i64 %21), !dbg !647, !enzyme_cache_alloc !653
  %loopLimit_malloccache3 = bitcast i8* %22 to i64**, !dbg !647
  store i64** %loopLimit_malloccache3, i64*** %loopLimit_cache2, align 8, !dbg !647, !invariant.group !655
  %23 = mul nuw i64 %18, 8, !dbg !647
  %24 = mul nuw i64 %16, 8, !dbg !647
  br label %L36, !dbg !647

L36:                                              ; preds = %L187, %L36.preheader
  %iv = phi i64 [ %iv.next, %L187 ], [ 0, %L36.preheader ]
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !647
  %25 = shl nuw i64 %iv, 1, !dbg !647
  %26 = add i64 %25, 2, !dbg !647
  %27 = add nuw i64 %iv, 2, !dbg !647
  br i1 %8, label %L187, label %L47.lr.ph, !dbg !647

L47.lr.ph:                                        ; preds = %L36
  %28 = shl nuw i64 %27, 1
  %29 = add i64 %28, -2
  %30 = add i64 %28, %9
  %.not79 = icmp sgt i64 %29, %30
  %31 = add i64 %28, -3
  %value_phi16 = select i1 %.not79, i64 %31, i64 %30
  %32 = icmp sgt i64 %29, %value_phi16
  %arraysize.pre_replacementA = phi i64 
  %arraysize32.pre_replacementA = phi i64 
  %"arrayptr.pre80'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc26", align 16, !alias.scope !656, !noalias !659, !invariant.group !661
  %arrayptr.pre80 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %10, align 16, !alias.scope !659, !noalias !656, !invariant.group !662
  %_replacementA25 = phi i64 
  %_replacementA24 = phi i64 
  %arraysize48_replacementA = phi i64 
  %_replacementA23 = phi i64 
  %arraysize51_replacementA = phi i64 
  %"arrayptr5482'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc", align 16, !alias.scope !663, !noalias !666, !invariant.group !668
  %arrayptr5482_replacementA = phi float addrspace(13)* 
  %_replacementA22 = phi i64 
  %reass.add85_replacementA = phi i64 
  %33 = load i64*, i64** %mdyncache_fromtape_cache93, align 8, !dbg !669, !dereferenceable !236, !invariant.group !670
  %34 = getelementptr inbounds i64, i64* %33, i64 %iv, !dbg !669
  %reass.mul86 = load i64, i64* %34, align 8, !dbg !669, !invariant.group !671
  br label %L66, !dbg !669

L66:                                              ; preds = %L178, %L47.lr.ph
  %iv1 = phi i64 [ %iv.next2, %L178 ], [ 0, %L47.lr.ph ]
  %iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !672
  %35 = shl nuw i64 %iv1, 1, !dbg !672
  %36 = add i64 %35, 2, !dbg !672
  %37 = shl nuw i64 %iv1, 1, !dbg !675
  %38 = add nuw i64 %37, 2, !dbg !683
  %39 = add nuw i64 %11, %37, !dbg !683
  %.not78 = icmp sgt i64 %38, %39, !dbg !686
  %40 = or i64 %37, 1, !dbg !688
  %value_phi15 = select i1 %.not78, i64 %40, i64 %39, !dbg !688
  %41 = icmp sgt i64 %38, %value_phi15, !dbg !694
  %not. = or i1 %32, %41, !dbg !697
  br i1 %not., label %L178, label %L130.outer.preheader, !dbg !680

L130.outer.preheader:                             ; preds = %L66
  %42 = mul nuw nsw i64 %17, %16, !dbg !698
  %43 = mul nuw nsw i64 %iv, %17, !dbg !698
  %44 = add nuw nsw i64 %iv1, %43, !dbg !698
  %45 = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !698, !invariant.group !701
  %46 = getelementptr inbounds i64*, i64** %45, i64 %44, !dbg !698
  store i64* null, i64** %46, align 8, !dbg !698
  %47 = mul nuw nsw i64 %17, %16, !dbg !698
  %48 = mul nuw nsw i64 %iv, %17, !dbg !698
  %49 = add nuw nsw i64 %iv1, %48, !dbg !698
  %50 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3, !dbg !698
  %51 = getelementptr inbounds i64*, i64** %50, i64 %49, !dbg !698
  %52 = mul nuw nsw i64 %17, %16, !dbg !698
  %53 = mul nuw nsw i64 %iv, %17, !dbg !698
  %54 = add nuw nsw i64 %iv1, %53, !dbg !698
  %55 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !invariant.group !702
  %56 = getelementptr inbounds i64*, i64** %55, i64 %54, !dbg !698
  br label %L130.outer, !dbg !698

L130.outer:                                       ; preds = %L148, %L130.outer.preheader
  %iv3 = phi i64 [ 0, %L130.outer.preheader ], [ %iv.next4, %L148 ]
  %value_phi30.ph_replacementA = phi float 
  %iv.next4 = add nuw nsw i64 %iv3, 1
  %57 = load i64*, i64** %51, align 8
  %58 = load i64*, i64** %46, align 8
  %59 = bitcast i64* %58 to i8*
  %loopLimit_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %59, i64 %iv.next4, i64 8)
  %60 = bitcast i8* %loopLimit_realloccache to i64*
  store i64* %60, i64** %46, align 8
  %61 = add i64 %26, %iv3
  %reass.mul_replacementA = phi i64 
  %62 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !dereferenceable !236, !invariant.group !703
  %63 = mul nuw nsw i64 %17, %16, !dbg !698
  %64 = mul nuw nsw i64 %iv, %17, !dbg !698
  %65 = add nuw nsw i64 %iv1, %64, !dbg !698
  %66 = getelementptr inbounds i64*, i64** %62, i64 %65, !dbg !698
  %67 = load i64*, i64** %66, align 8, !dbg !698, !dereferenceable !236, !invariant.group !704
  %68 = getelementptr inbounds i64, i64* %67, i64 %iv3, !dbg !698
  %69 = load i64, i64* %68, align 8, !dbg !698, !invariant.group !705
  %70 = mul nuw nsw i64 %17, %16, !dbg !698
  %71 = mul nuw nsw i64 %iv, %17, !dbg !698
  %72 = add nuw nsw i64 %iv1, %71, !dbg !698
  br label %L130, !dbg !698

L130:                                             ; preds = %L130, %L130.outer
  %iv5 = phi i64 [ %iv.next6, %L130 ], [ 0, %L130.outer ]
  %value_phi30_replacementA = phi float 
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !706
  %73 = add i64 %36, %iv5, !dbg !706
  %74 = add i64 %69, %73, !dbg !706
  %"'ipg" = getelementptr inbounds float, float addrspace(13)* %"arrayptr.pre80'ipl", i64 %74, !dbg !706
  %75 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80, i64 %74, !dbg !706
  %arrayref_replacementA = phi float , !dbg !706
  %_replacementA27_replacementA = phi float , !dbg !709
  %76 = add i64 %73, 1, !dbg !710
  %77 = icmp sgt i64 %38, %76, !dbg !712
  %78 = icmp sgt i64 %76, %value_phi15, !dbg !712
  %79 = or i1 %77, %78, !dbg !698
  %80 = icmp eq i64 %73, %value_phi15
  %or.cond = or i1 %80, %79, !dbg !698
  br i1 %or.cond, label %L148, label %L130, !dbg !698

L148:                                             ; preds = %L130
  %81 = phi i64 [ %iv5, %L130 ], !dbg !715
  %82 = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !715, !dereferenceable !236, !invariant.group !655
  %83 = mul nuw nsw i64 %17, %16, !dbg !715
  %84 = mul nuw nsw i64 %iv, %17, !dbg !715
  %85 = add nuw nsw i64 %iv1, %84, !dbg !715
  %86 = getelementptr inbounds i64*, i64** %82, i64 %85, !dbg !715
  %87 = load i64*, i64** %86, align 8, !dbg !715, !dereferenceable !236, !invariant.group !718
  %88 = getelementptr inbounds i64, i64* %87, i64 %iv3, !dbg !715
  store i64 %81, i64* %88, align 8, !dbg !715, !invariant.group !719
  %89 = add i64 %61, 1, !dbg !715
  %90 = icmp sle i64 %29, %89, !dbg !720
  %91 = icmp sle i64 %89, %value_phi16, !dbg !720
  %92 = and i1 %90, %91, !dbg !724
  %93 = icmp ne i64 %61, %value_phi16, !dbg !723
  %extract.t = and i1 %93, %92, !dbg !725
  br i1 %extract.t, label %L130.outer, label %L178.loopexit, !dbg !700

L178.loopexit:                                    ; preds = %L148
  %94 = phi i64 [ %iv3, %L148 ], !dbg !726
  %95 = load i64*, i64** %loopLimit_cache, align 8, !dbg !726, !dereferenceable !236, !invariant.group !650
  %96 = mul nuw nsw i64 %17, %16, !dbg !726
  %97 = mul nuw nsw i64 %iv, %17, !dbg !726
  %98 = add nuw nsw i64 %iv1, %97, !dbg !726
  %99 = getelementptr inbounds i64, i64* %95, i64 %98, !dbg !726
  store i64 %94, i64* %99, align 8, !dbg !726, !invariant.group !730
  br label %L178, !dbg !726

L178:                                             ; preds = %L178.loopexit, %L66
  %value_phi46_replacementA = phi float 
  %_replacementA67_replacementA = phi float , !dbg !726
  %100 = add i64 %iv.next2, %reass.mul86, !dbg !728
  %"'ipg58" = getelementptr inbounds float, float addrspace(13)* %"arrayptr5482'ipl", i64 %100, !dbg !728
  %_replacementA66 = phi float addrspace(13)* , !dbg !728
  %exitcond.not = icmp eq i64 %iv1, %12, !dbg !731
  br i1 %exitcond.not, label %L187.loopexit, label %L66, !dbg !669, !llvm.loop !732

L187.loopexit:                                    ; preds = %L178
  br label %L187, !dbg !733

L187:                                             ; preds = %L187.loopexit, %L36
  %101 = add nuw i64 %27, 1, !dbg !733
  %102 = icmp slt i64 %101, 2, !dbg !737
  %103 = icmp sgt i64 %101, %5, !dbg !737
  %104 = icmp eq i64 %27, %5, !dbg !740
  %not.not.84 = or i1 %102, %103, !dbg !740
  %narrow83 = or i1 %104, %not.not.84, !dbg !740
  br i1 %narrow83, label %L208.loopexit, label %L36, !dbg !736

L208.loopexit:                                    ; preds = %L187
  br label %L208, !dbg !635

L208:                                             ; preds = %L208.loopexit, %top
  br label %invertL208, !dbg !635

allocsForInversion:                               ; No predecessors!
  %"iv'ac" = alloca i64, align 8
  %"iv1'ac" = alloca i64, align 8
  %"iv3'ac" = alloca i64, align 8
  %loopLimit_cache = alloca i64*, align 8
  %"iv5'ac" = alloca i64, align 8
  %loopLimit_cache2 = alloca i64**, align 8
  %unbox_cache = alloca i64, align 8
  %"value_phi30.ph'de" = alloca float, align 4
  %105 = getelementptr float, float* %"value_phi30.ph'de", i64 0
  store float 0.000000e+00, float* %105, align 4
  %"'de" = alloca float, align 4
  %106 = getelementptr float, float* %"'de", i64 0
  store float 0.000000e+00, float* %106, align 4
  %"arrayref'de" = alloca float, align 4
  %107 = getelementptr float, float* %"arrayref'de", i64 0
  store float 0.000000e+00, float* %107, align 4
  %"value_phi30'de" = alloca float, align 4
  %108 = getelementptr float, float* %"value_phi30'de", i64 0
  store float 0.000000e+00, float* %108, align 4
  %_cache = alloca i64**, align 8
  %reass.mul86_cache = alloca i64*, align 8
  %"'de65" = alloca float, align 4
  %109 = getelementptr float, float* %"'de65", i64 0
  store float 0.000000e+00, float* %109, align 4
  %"value_phi46'de" = alloca float, align 4
  %110 = getelementptr float, float* %"value_phi46'de", i64 0
  store float 0.000000e+00, float* %110, align 4
  %_cache81 = alloca i64, align 8
  %111 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3
  %mdyncache_fromtape_cache = alloca i64**, align 8
  store i64** %111, i64*** %mdyncache_fromtape_cache, align 8
  %112 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 2
  %mdyncache_fromtape_cache93 = alloca i64*, align 8
  store i64* %112, i64** %mdyncache_fromtape_cache93, align 8

inverttop:                                        ; preds = %invertL208, %invertL36.preheader
  fence syncscope("singlethread") seq_cst
  fence syncscope("singlethread") seq_cst
  ret void

invertL36.preheader:                              ; preds = %invertL36
  %113 = load i64, i64* %"iv'ac", align 8
  %114 = load i64, i64* %"iv1'ac", align 8
  %forfree = load i64*, i64** %loopLimit_cache, align 8, !dereferenceable !236, !invariant.group !650
  %115 = bitcast i64* %forfree to i8*
  call void @free(i8* nonnull %115), !dbg !741, !enzyme_cache_free !648
  %116 = load i64, i64* %"iv'ac", align 8
  %117 = load i64, i64* %"iv1'ac", align 8
  %forfree4 = load i64**, i64*** %loopLimit_cache2, align 8, !dereferenceable !236, !invariant.group !655
  %118 = bitcast i64** %forfree4 to i8*
  call void @free(i8* nonnull %118), !dbg !741, !enzyme_cache_free !653
  %119 = load i64, i64* %"iv'ac", align 8
  %120 = load i64, i64* %"iv1'ac", align 8
  %121 = load i64, i64* %"iv'ac", align 8
  %122 = load i64, i64* %"iv'ac", align 8
  %123 = load i64, i64* %"iv1'ac", align 8
  %forfree87 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dereferenceable !236, !invariant.group !703
  %124 = bitcast i64** %forfree87 to i8*
  call void @free(i8* nonnull %124), !dbg !741
  %125 = load i64, i64* %"iv'ac", align 8
  %forfree94 = load i64*, i64** %mdyncache_fromtape_cache93, align 8, !dereferenceable !236, !invariant.group !670
  %126 = bitcast i64* %forfree94 to i8*
  call void @free(i8* nonnull %126), !dbg !741
  br label %inverttop

invertL36:                                        ; preds = %invertL187, %invertL47.lr.ph
  %127 = load i64, i64* %"iv'ac", align 8
  %128 = icmp eq i64 %127, 0
  %129 = xor i1 %128, true
  br i1 %128, label %invertL36.preheader, label %incinvertL36

incinvertL36:                                     ; preds = %invertL36
  %130 = load i64, i64* %"iv'ac", align 8
  %131 = add nsw i64 %130, -1
  store i64 %131, i64* %"iv'ac", align 8
  br label %invertL187

invertL47.lr.ph:                                  ; preds = %invertL66
  br label %invertL36

invertL66:                                        ; preds = %invertL178, %invertL130.outer.preheader
  %132 = load i64, i64* %"iv1'ac", align 8
  %133 = icmp eq i64 %132, 0
  %134 = xor i1 %133, true
  br i1 %133, label %invertL47.lr.ph, label %incinvertL66

incinvertL66:                                     ; preds = %invertL66
  %135 = load i64, i64* %"iv1'ac", align 8
  %136 = add nsw i64 %135, -1
  store i64 %136, i64* %"iv1'ac", align 8
  br label %invertL178

invertL130.outer.preheader:                       ; preds = %invertL130.outer
  %137 = load i64, i64* %"iv'ac", align 8
  %138 = load i64, i64* %"iv1'ac", align 8
  %139 = load i64, i64* %"iv3'ac", align 8
  %_unwrap = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !698, !invariant.group !701
  %140 = load i64, i64* %unbox_cache, align 8, !dbg !636, !tbaa !16, !alias.scope !64, !noalias !65, !invariant.group !652
  %_unwrap5 = add i64 %140, -1
  %_unwrap97 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !646
  %_unwrap6 = add i64 %_unwrap97, -2
  %_unwrap7 = add nuw i64 %_unwrap6, 1
  %_unwrap8 = mul nuw nsw i64 %137, %_unwrap7
  %_unwrap9 = add nuw nsw i64 %138, %_unwrap8
  %_unwrap10 = getelementptr inbounds i64*, i64** %_unwrap, i64 %_unwrap9
  %forfree11 = load i64*, i64** %_unwrap10, align 8, !dereferenceable !236, !invariant.group !718
  %141 = bitcast i64* %forfree11 to i8*
  call void @free(i8* nonnull %141), !dbg !741
  %142 = load i64, i64* %"iv3'ac", align 8
  %143 = load i64, i64* %"iv'ac", align 8
  %144 = load i64, i64* %"iv1'ac", align 8
  %145 = load i64, i64* %"iv3'ac", align 8
  %_unwrap88 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !invariant.group !702
  %_unwrap89 = mul nuw nsw i64 %143, %_unwrap7
  %_unwrap90 = add nuw nsw i64 %144, %_unwrap89
  %_unwrap91 = getelementptr inbounds i64*, i64** %_unwrap88, i64 %_unwrap90
  %forfree92 = load i64*, i64** %_unwrap91, align 8, !dereferenceable !236, !invariant.group !704
  %146 = bitcast i64* %forfree92 to i8*
  call void @free(i8* nonnull %146), !dbg !741
  br label %invertL66

invertL130.outer:                                 ; preds = %invertL130_amerge
  %147 = load float, float* %"value_phi30.ph'de", align 4
  store float 0.000000e+00, float* %"value_phi30.ph'de", align 4
  %148 = load i64, i64* %"iv3'ac", align 8
  %149 = icmp eq i64 %148, 0
  %150 = xor i1 %149, true
  %151 = select fast i1 %150, float %147, float 0.000000e+00
  %152 = load float, float* %"'de", align 4
  %153 = fadd fast float %152, %147
  %154 = select fast i1 %149, float %152, float %153
  store float %154, float* %"'de", align 4
  br i1 %149, label %invertL130.outer.preheader, label %incinvertL130.outer

incinvertL130.outer:                              ; preds = %invertL130.outer
  %155 = load i64, i64* %"iv3'ac", align 8
  %156 = add nsw i64 %155, -1
  store i64 %156, i64* %"iv3'ac", align 8
  br label %invertL148

invertL130:                                       ; preds = %mergeinvertL130_L148, %incinvertL130
  %157 = load float, float* %"'de", align 4, !dbg !709
  store float 0.000000e+00, float* %"'de", align 4, !dbg !709
  %158 = load float, float* %"arrayref'de", align 4, !dbg !709
  %159 = fadd fast float %158, %157, !dbg !709
  store float %159, float* %"arrayref'de", align 4, !dbg !709
  %160 = load float, float* %"value_phi30'de", align 4, !dbg !709
  %161 = fadd fast float %160, %157, !dbg !709
  store float %161, float* %"value_phi30'de", align 4, !dbg !709
  %162 = load float, float* %"arrayref'de", align 4, !dbg !706
  store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !706
  %163 = load i64, i64* %"iv5'ac", align 8, !dbg !706
  %164 = load i64, i64* %"iv3'ac", align 8, !dbg !706
  %165 = load i64, i64* %"iv1'ac", align 8, !dbg !706
  %166 = load i64, i64* %"iv'ac", align 8, !dbg !706
  %_unwrap28 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*, !dbg !706
  %arrayptr.pre80_unwrap = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %_unwrap28, align 16, !alias.scope !659, !noalias !656, !invariant.group !662
  %_unwrap101 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !706
  %_unwrap35 = add nsw i64 %_unwrap101, -2, !dbg !706
  %_unwrap36 = add nuw nsw i64 %_unwrap101, 1, !dbg !706
  %167 = call i64 @llvm.smax.i64(i64 %_unwrap36, i64 3), !dbg !647
  %_unwrap37 = add nsw i64 %167, -3, !dbg !706
  %168 = call i64 @llvm.umin.i64(i64 %_unwrap35, i64 %_unwrap37), !dbg !647
  %169 = add nuw i64 %168, 1, !dbg !706
  %_unwrap96 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !706
  %_unwrap39 = add i64 %_unwrap96, -2, !dbg !706
  %170 = add nuw i64 %_unwrap39, 1, !dbg !706
  %171 = mul nuw nsw i64 %170, %169, !dbg !706
  %172 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3, !dbg !706
  %173 = mul nuw nsw i64 %170, %169, !dbg !706
  %174 = mul nuw nsw i64 %166, %170, !dbg !706
  %175 = add nuw nsw i64 %165, %174, !dbg !706
  %176 = getelementptr inbounds i64*, i64** %172, i64 %175, !dbg !706
  %177 = load i64*, i64** %176, align 8, !dbg !706, !dereferenceable !236, !invariant.group !742
  %178 = getelementptr inbounds i64, i64* %177, i64 %164, !dbg !706
  %179 = load i64, i64* %178, align 8, !dbg !706, !invariant.group !743
  %_unwrap40 = shl nuw i64 %165, 1, !dbg !706
  %_unwrap41 = add i64 %_unwrap40, 2, !dbg !706
  %_unwrap42 = add i64 %_unwrap41, %163, !dbg !706
  %_unwrap43 = add i64 %179, %_unwrap42, !dbg !706
  %_unwrap44 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80_unwrap, i64 %_unwrap43, !dbg !706
  %180 = load i64, i64* %"iv5'ac", align 8, !dbg !706
  %181 = load i64, i64* %"iv3'ac", align 8, !dbg !706
  %182 = load i64, i64* %"iv1'ac", align 8, !dbg !706
  %183 = load i64, i64* %"iv'ac", align 8, !dbg !706
  %"'ipc26_unwrap" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*, !dbg !706
  %"arrayptr.pre80'ipl_unwrap" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc26_unwrap", align 16, !alias.scope !656, !noalias !659, !invariant.group !661
  %"'ipg_unwrap" = getelementptr inbounds float, float addrspace(13)* %"arrayptr.pre80'ipl_unwrap", i64 %_unwrap43, !dbg !706
  %184 = icmp ne float addrspace(13)* %_unwrap44, %"'ipg_unwrap", !dbg !706
  br i1 %184, label %invertL130_active, label %invertL130_amerge, !dbg !706

invertL130_active:                                ; preds = %invertL130
  %185 = load float, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !706, !tbaa !134, !alias.scope !744, !noalias !747
  %186 = fadd fast float %185, %162, !dbg !706
  store float %186, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !706, !tbaa !134, !alias.scope !744, !noalias !747
  br label %invertL130_amerge, !dbg !706

invertL130_amerge:                                ; preds = %invertL130_active, %invertL130
  %187 = load float, float* %"value_phi30'de", align 4
  store float 0.000000e+00, float* %"value_phi30'de", align 4
  %188 = load i64, i64* %"iv5'ac", align 8
  %189 = icmp eq i64 %188, 0
  %190 = xor i1 %189, true
  %191 = select fast i1 %190, float %187, float 0.000000e+00
  %192 = load float, float* %"'de", align 4
  %193 = fadd fast float %192, %187
  %194 = select fast i1 %189, float %192, float %193
  store float %194, float* %"'de", align 4
  %195 = select fast i1 %189, float %187, float 0.000000e+00
  %196 = load float, float* %"value_phi30.ph'de", align 4
  %197 = fadd fast float %196, %187
  %198 = select fast i1 %189, float %197, float %196
  store float %198, float* %"value_phi30.ph'de", align 4
  br i1 %189, label %invertL130.outer, label %incinvertL130

incinvertL130:                                    ; preds = %invertL130_amerge
  %199 = load i64, i64* %"iv5'ac", align 8
  %200 = add nsw i64 %199, -1
  store i64 %200, i64* %"iv5'ac", align 8
  br label %invertL130

invertL148:                                       ; preds = %mergeinvertL130.outer_L178.loopexit, %incinvertL130.outer
  %_unwrap102 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0
  %_unwrap47 = add nsw i64 %_unwrap102, -2
  %_unwrap48 = add nuw nsw i64 %_unwrap102, 1
  %201 = call i64 @llvm.smax.i64(i64 %_unwrap48, i64 3), !dbg !647
  %_unwrap49 = add nsw i64 %201, -3
  %202 = call i64 @llvm.umin.i64(i64 %_unwrap47, i64 %_unwrap49), !dbg !647
  %203 = add nuw i64 %202, 1
  %_unwrap98 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1
  %_unwrap51 = add i64 %_unwrap98, -2
  %204 = add nuw i64 %_unwrap51, 1
  %205 = mul nuw nsw i64 %204, %203
  %206 = load i64**, i64*** %loopLimit_cache2, align 8, !dereferenceable !236, !invariant.group !655
  %207 = load i64, i64* %"iv1'ac", align 8
  %208 = load i64, i64* %"iv'ac", align 8
  %209 = mul nuw nsw i64 %204, %203
  %210 = mul nuw nsw i64 %208, %204
  %211 = add nuw nsw i64 %207, %210
  %212 = getelementptr inbounds i64*, i64** %206, i64 %211
  %213 = load i64*, i64** %212, align 8, !dereferenceable !236, !invariant.group !718
  %214 = load i64, i64* %"iv3'ac", align 8
  %215 = getelementptr inbounds i64, i64* %213, i64 %214
  %216 = load i64, i64* %215, align 8, !invariant.group !719
  br label %mergeinvertL130_L148

mergeinvertL130_L148:                             ; preds = %invertL148
  store i64 %216, i64* %"iv5'ac", align 8
  br label %invertL130

invertL178.loopexit:                              ; preds = %invertL178
  %_unwrap99 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0
  %_unwrap53 = add nsw i64 %_unwrap99, -2
  %_unwrap54 = add nuw nsw i64 %_unwrap99, 1
  %217 = call i64 @llvm.smax.i64(i64 %_unwrap54, i64 3), !dbg !647
  %_unwrap55 = add nsw i64 %217, -3
  %218 = call i64 @llvm.umin.i64(i64 %_unwrap53, i64 %_unwrap55), !dbg !647
  %219 = add nuw i64 %218, 1
  %_unwrap95 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1
  %_unwrap57 = add i64 %_unwrap95, -2
  %220 = add nuw i64 %_unwrap57, 1
  %221 = mul nuw nsw i64 %220, %219
  %222 = load i64*, i64** %loopLimit_cache, align 8, !dereferenceable !236, !invariant.group !650
  %223 = load i64, i64* %"iv1'ac", align 8
  %224 = load i64, i64* %"iv'ac", align 8
  %225 = mul nuw nsw i64 %220, %219
  %226 = mul nuw nsw i64 %224, %220
  %227 = add nuw nsw i64 %223, %226
  %228 = getelementptr inbounds i64, i64* %222, i64 %227
  %229 = load i64, i64* %228, align 8, !invariant.group !730
  br label %mergeinvertL130.outer_L178.loopexit

mergeinvertL130.outer_L178.loopexit:              ; preds = %invertL178.loopexit
  store i64 %229, i64* %"iv3'ac", align 8
  br label %invertL148

invertL178:                                       ; preds = %mergeinvertL66_L187.loopexit, %incinvertL66
  %230 = load i64, i64* %"iv1'ac", align 8, !dbg !728
  %231 = load i64, i64* %"iv'ac", align 8, !dbg !728
  %"'ipc_unwrap" = addrspacecast {} addrspace(10)* %"'" to float addrspace(13)* addrspace(11)*, !dbg !728
  %"arrayptr5482'ipl_unwrap" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc_unwrap", align 16, !alias.scope !663, !noalias !666, !invariant.group !668
  %iv.next2_unwrap = add nuw nsw i64 %230, 1, !dbg !728
  %_unwrap100 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !728
  %_unwrap61 = add nsw i64 %_unwrap100, -2, !dbg !728
  %_unwrap62 = add nuw nsw i64 %_unwrap100, 1, !dbg !728
  %232 = call i64 @llvm.smax.i64(i64 %_unwrap62, i64 3), !dbg !647
  %_unwrap63 = add nsw i64 %232, -3, !dbg !728
  %233 = call i64 @llvm.umin.i64(i64 %_unwrap61, i64 %_unwrap63), !dbg !647
  %234 = add nuw i64 %233, 1, !dbg !728
  %235 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 2, !dbg !728
  %236 = getelementptr inbounds i64, i64* %235, i64 %231, !dbg !728
  %237 = load i64, i64* %236, align 8, !dbg !728, !invariant.group !749
  %_unwrap64 = add i64 %iv.next2_unwrap, %237, !dbg !728
  %"'ipg58_unwrap" = getelementptr inbounds float, float addrspace(13)* %"arrayptr5482'ipl_unwrap", i64 %_unwrap64, !dbg !728
  %238 = load float, float addrspace(13)* %"'ipg58_unwrap", align 4, !dbg !728, !tbaa !134, !alias.scope !750, !noalias !753
  store float 0.000000e+00, float addrspace(13)* %"'ipg58_unwrap", align 4, !dbg !728, !tbaa !134, !alias.scope !750, !noalias !753
  %239 = load float, float* %"'de65", align 4, !dbg !728
  %240 = fadd fast float %239, %238, !dbg !728
  store float %240, float* %"'de65", align 4, !dbg !728
  %241 = load float, float* %"'de65", align 4, !dbg !726
  store float 0.000000e+00, float* %"'de65", align 4, !dbg !726
  %242 = fmul fast float %241, 5.000000e-01, !dbg !726
  %243 = load float, float* %"value_phi46'de", align 4, !dbg !726
  %244 = fadd fast float %243, %242, !dbg !726
  store float %244, float* %"value_phi46'de", align 4, !dbg !726
  %245 = load float, float* %"value_phi46'de", align 4
  store float 0.000000e+00, float* %"value_phi46'de", align 4
  %246 = load i64, i64* %"iv1'ac", align 8
  %247 = load i64, i64* %"iv'ac", align 8
  %_unwrap68 = add nuw i64 %247, 2
  %_unwrap69 = shl nuw i64 %_unwrap68, 1
  %_unwrap70 = add i64 %_unwrap69, -2
  %.not77_unwrap = icmp eq i64 %2, 2
  %_unwrap71 = select i1 %.not77_unwrap, i64 -2, i64 -1
  %_unwrap72 = add i64 %_unwrap69, %_unwrap71
  %.not79_unwrap = icmp sgt i64 %_unwrap70, %_unwrap72
  %_unwrap73 = add i64 %_unwrap69, -3
  %value_phi16_unwrap = select i1 %.not79_unwrap, i64 %_unwrap73, i64 %_unwrap72
  %_unwrap74 = icmp sgt i64 %_unwrap70, %value_phi16_unwrap
  %_unwrap75 = shl nuw i64 %246, 1
  %_unwrap76 = add nuw i64 %_unwrap75, 2
  %.not76_unwrap = icmp eq i64 %2, 1
  %_unwrap77 = select i1 %.not76_unwrap, i64 2, i64 3
  %_unwrap78 = add nuw i64 %_unwrap77, %_unwrap75
  %.not78_unwrap = icmp sgt i64 %_unwrap76, %_unwrap78
  %_unwrap79 = or i64 %_unwrap75, 1
  %value_phi15_unwrap = select i1 %.not78_unwrap, i64 %_unwrap79, i64 %_unwrap78
  %_unwrap80 = icmp sgt i64 %_unwrap76, %value_phi15_unwrap
  %not._unwrap = or i1 %_unwrap74, %_unwrap80
  %248 = xor i1 %not._unwrap, true
  %249 = select fast i1 %248, float %245, float 0.000000e+00
  %250 = load float, float* %"'de", align 4
  %251 = fadd fast float %250, %245
  %252 = select fast i1 %not._unwrap, float %250, float %251
  store float %252, float* %"'de", align 4
  br i1 %not._unwrap, label %invertL66, label %invertL178.loopexit

invertL187.loopexit:                              ; preds = %invertL187
  %253 = load i64, i64* %"iv'ac", align 8
  %254 = load i64, i64* %_cache81, align 8, !invariant.group !651
  %_unwrap82 = add i64 %254, -2
  br label %mergeinvertL66_L187.loopexit

mergeinvertL66_L187.loopexit:                     ; preds = %invertL187.loopexit
  store i64 %_unwrap82, i64* %"iv1'ac", align 8
  br label %invertL178

invertL187:                                       ; preds = %mergeinvertL36_L208.loopexit, %incinvertL36
  %255 = load i64, i64* %"iv'ac", align 8
  %256 = load i64, i64* %_cache81, align 8, !invariant.group !651
  %_unwrap83 = icmp ult i64 %256, 2
  br i1 %_unwrap83, label %invertL36, label %invertL187.loopexit

invertL208.loopexit:                              ; preds = %invertL208
  %_unwrap84 = add nsw i64 %5, -2
  %_unwrap85 = add nuw nsw i64 %5, 1
  %257 = call i64 @llvm.smax.i64(i64 %_unwrap85, i64 3), !dbg !647
  %_unwrap86 = add nsw i64 %257, -3
  %258 = call i64 @llvm.umin.i64(i64 %_unwrap84, i64 %_unwrap86), !dbg !647
  br label %mergeinvertL36_L208.loopexit

mergeinvertL36_L208.loopexit:                     ; preds = %invertL208.loopexit
  store i64 %258, i64* %"iv'ac", align 8
  br label %invertL187

invertL208:                                       ; preds = %L208
  br i1 %6, label %inverttop, label %invertL208.loopexit
}

LLVM.LoadInst(%unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !28, !tbaa !16, !alias.scope !34, !noalias !37)
LLVM.PHIInst(%unbox_replacementA = phi i64 , !dbg !21)


Stacktrace:
 [1] -
   @ ./int.jl:86
 [2] #127
   @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:29
 [3] map
   @ ./tuple.jl:292
 [4] macro expansion
   @ ./simdloop.jl:69
 [5] ##kern#421#131
   @ ~/git/Enzyme.jl/WaterLily.jl/src/util.jl:103

Stacktrace:
  [1] julia_error(cstr::Cstring, val::Ptr{LLVM.API.LLVMOpaqueValue}, errtype::Enzyme.API.ErrorType, data::Ptr{Nothing}, data2::Ptr{LLVM.API.LLVMOpaqueValue}, B::Ptr{LLVM.API.LLVMOpaqueBuilder})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:2713
  [2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, runtimeActivity::Bool, width::Int64, additionalArg::Ptr{LLVM.API.LLVMOpaqueType}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
    @ Enzyme.API ~/git/Enzyme.jl/src/api.jl:253
  [3] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::NTuple{5, Bool}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{Int64}, boxedArgs::Set{Int64})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:5058
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8191
  [5] codegen
    @ ~/git/Enzyme.jl/src/compiler.jl:7028 [inlined]
  [6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:9299
  [7] _thunk
    @ ~/git/Enzyme.jl/src/compiler.jl:9299 [inlined]
  [8] cached_compilation
    @ ~/git/Enzyme.jl/src/compiler.jl:9340 [inlined]
  [9] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{0x0000000000007b3e}, ::Type{Const{typeof(Core.kwcall)}}, ::Type{Const{Nothing}}, tt::Type{Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{typeof(WaterLily.restrictL!)}, Duplicated{Array{Float32, 3}}, Duplicated{Array{Float32, 3}}}}, ::Val{Enzyme.API.DEM_ReverseModePrimal}, ::Val{1}, ::Val{(true, true, true, true, true)}, ::Val{true}, ::Val{false}, ::Type{FFIABI}, ::Val{false}, ::Val{true})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:9472
 [10] #s2067#19669
    @ ~/git/Enzyme.jl/src/compiler.jl:9609 [inlined]
 [11] var"#s2067#19669"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ABI::Any, ErrIfFuncWritten::Any, RuntimeActivity::Any, ::Any, ::Type, ::Type, ::Type, tt::Any, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Any)
    @ Enzyme.Compiler ./none:0
 [12] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [13] runtime_generic_augfwd(activity::Type{Val{(false, false, false, true, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(Core.kwcall), df::Nothing, primal_1::@NamedTuple{perdir::Tuple{}}, shadow_1_1::@NamedTuple{perdir::Tuple{}}, primal_2::typeof(WaterLily.restrictL!), shadow_2_1::Nothing, primal_3::Array{Float32, 3}, shadow_3_1::Array{Float32, 3}, primal_4::Array{Float32, 3}, shadow_4_1::Array{Float32, 3})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:468
 [14] restrictML
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:23 [inlined]
 [15] restrictML
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0 [inlined]
 [16] augmented_julia_restrictML_9751_inner_1wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0
 [17] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [18] enzyme_call
    @ ~/git/Enzyme.jl/src/compiler.jl:8795 [inlined]
 [19] AugmentedForwardThunk
    @ ~/git/Enzyme.jl/src/compiler.jl:8632 [inlined]
 [20] runtime_generic_augfwd(activity::Type{Val{(false, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(WaterLily.restrictML), df::Nothing, primal_1::Poisson{Float32, Matrix{Float32}, Array{Float32, 3}}, shadow_1_1::Poisson{Float32, Matrix{Float32}, Array{Float32, 3}})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
 [21] _
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:54
 [22] MultiLevelPoisson
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:51 [inlined]
 [23] MultiLevelPoisson
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0 [inlined]
 [24] augmented_julia_MultiLevelPoisson_9175_inner_1wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0
 [25] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [26] enzyme_call(::Val{false}, ::Ptr{Nothing}, ::Type{Enzyme.Compiler.AugmentedForwardThunk}, ::Val{1}, ::Val{true}, ::Type{Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{Type{MultiLevelPoisson}}, Duplicated{Matrix{Float32}}, Duplicated{Array{Float32, 3}}, Duplicated{Matrix{Float32}}}}, ::Type{Duplicated{MultiLevelPoisson{Float32, Matrix{Float32}, Array{Float32, 3}}}}, ::Const{typeof(Core.kwcall)}, ::Type{@NamedTuple{1::@NamedTuple{1, 2, 3::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 4, 5, 6::@NamedTuple{1, 2, 3, 4, 5::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{UInt64, 0}}, 6, 7, 8, 9, 10, 11::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{Float32, 0}}, 12, 13, 14, 15::Bool, 16::Bool}, 7, 8, 9, 10, 11, 12::UInt64, 13::UInt64, 14::UInt64}, 4, 5::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 6, 7, 8::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 3, 4::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 5, 6, 7, 8, 9::Bool, 10, 11}, 4::UInt64, 5::UInt64}, 9, 10, 11, 12, 13, 14, 15, 16::Bool, 17::Core.LLVMPtr{UInt64, 0}, 18, 19::Core.LLVMPtr{Bool, 0}, 20::Core.LLVMPtr{UInt64, 0}, 21, 22::Core.LLVMPtr{Bool, 0}}, 2}}, ::Const{@NamedTuple{perdir::Tuple{}}}, ::Const{Type{MultiLevelPoisson}}, ::Duplicated{Matrix{Float32}}, ::Duplicated{Array{Float32, 3}}, ::Duplicated{Matrix{Float32}})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8795
 [27] (::Enzyme.Compiler.AugmentedForwardThunk{Ptr{Nothing}, Const{typeof(Core.kwcall)}, Duplicated{MultiLevelPoisson{Float32, Matrix{Float32}, Array{Float32, 3}}}, Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{Type{MultiLevelPoisson}}, Duplicated{Matrix{Float32}}, Duplicated{Array{Float32, 3}}, Duplicated{Matrix{Float32}}}, 1, true, @NamedTuple{1::@NamedTuple{1, 2, 3::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 4, 5, 6::@NamedTuple{1, 2, 3, 4, 5::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{UInt64, 0}}, 6, 7, 8, 9, 10, 11::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{Float32, 0}}, 12, 13, 14, 15::Bool, 16::Bool}, 7, 8, 9, 10, 11, 12::UInt64, 13::UInt64, 14::UInt64}, 4, 5::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 6, 7, 8::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 3, 4::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 5, 6, 7, 8, 9::Bool, 10, 11}, 4::UInt64, 5::UInt64}, 9, 10, 11, 12, 13, 14, 15, 16::Bool, 17::Core.LLVMPtr{UInt64, 0}, 18, 19::Core.LLVMPtr{Bool, 0}, 20::Core.LLVMPtr{UInt64, 0}, 21, 22::Core.LLVMPtr{Bool, 0}}, 2}})(::Const{typeof(Core.kwcall)}, ::Const{@NamedTuple{perdir::Tuple{}}}, ::Vararg{Any})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8632
 [28] runtime_generic_augfwd(activity::Type{Val{(false, false, false, true, true, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(Core.kwcall), df::Nothing, primal_1::@NamedTuple{perdir::Tuple{}}, shadow_1_1::Nothing, primal_2::Type{MultiLevelPoisson}, shadow_2_1::Nothing, primal_3::Matrix{Float32}, shadow_3_1::Matrix{Float32}, primal_4::Array{Float32, 3}, shadow_4_1::Array{Float32, 3}, primal_5::Matrix{Float32}, shadow_5_1::Matrix{Float32})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
 [29] _
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:76 [inlined]
 [30] _
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:0 [inlined]
 [31] augmented_julia___270_5848_inner_1wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:0
 [32] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [33] enzyme_call(::Val{false}, ::Ptr{Nothing}, ::Type{Enzyme.Compiler.AugmentedForwardThunk}, ::Val{1}, ::Val{true}, ::Type{Tuple{Const{Float64}, Active{Float64}, Const{Nothing}, Const{Nothing}, Const{Int64}, Const{Tuple{}}, Const{Nothing}, Const{Bool}, Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, Const{Type{Float32}}, Const{Type{Array}}, Const{Type{Simulation}}, Const{Tuple{Int64, Int64}}, Const{Tuple{Int64, Int64}}, Const{Int64}}}, ::Type{Duplicated{Simulation}}, ::Const{WaterLily.var"#_#270#274"}, ::Type{@NamedTuple{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38::Core.LLVMPtr{UInt8, 0}, 39::Core.LLVMPtr{UInt8, 0}, 40::Core.LLVMPtr{UInt8, 0}}}, ::Const{Float64}, ::Active{Float64}, ::Const{Nothing}, ::Const{Nothing}, ::Const{Int64}, ::Const{Tuple{}}, ::Const{Nothing}, ::Const{Bool}, ::Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, ::Const{Type{Float32}}, ::Const{Type{Array}}, ::Const{Type{Simulation}}, ::Const{Tuple{Int64, Int64}}, ::Const{Tuple{Int64, Int64}}, ::Const{Int64})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8795
 [34] (::Enzyme.Compiler.AugmentedForwardThunk{Ptr{Nothing}, Const{WaterLily.var"#_#270#274"}, Duplicated{Simulation}, Tuple{Const{Float64}, Active{Float64}, Const{Nothing}, Const{Nothing}, Const{Int64}, Const{Tuple{}}, Const{Nothing}, Const{Bool}, Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, Const{Type{Float32}}, Const{Type{Array}}, Const{Type{Simulation}}, Const{Tuple{Int64, Int64}}, Const{Tuple{Int64, Int64}}, Const{Int64}}, 1, true, @NamedTuple{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38::Core.LLVMPtr{UInt8, 0}, 39::Core.LLVMPtr{UInt8, 0}, 40::Core.LLVMPtr{UInt8, 0}}})(::Const{WaterLily.var"#_#270#274"}, ::Const{Float64}, ::Vararg{Any})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8632
 [35] runtime_generic_augfwd(activity::Type{Val{(false, false, true, false, false, false, false, false, false, true, false, false, false, false, false, false)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::WaterLily.var"#_#270#274", df::Nothing, primal_1::Float64, shadow_1_1::Nothing, primal_2::Float64, shadow_2_1::Base.RefValue{Float64}, primal_3::Nothing, shadow_3_1::Nothing, primal_4::Nothing, shadow_4_1::Nothing, primal_5::Int64, shadow_5_1::Nothing, primal_6::Tuple{}, shadow_6_1::Nothing, primal_7::Nothing, shadow_7_1::Nothing, primal_8::Bool, shadow_8_1::Nothing, primal_9::AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, shadow_9_1::Base.RefValue{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, primal_10::Type{Float32}, shadow_10_1::Nothing, primal_11::Type{Array}, shadow_11_1::Nothing, primal_12::Type{Simulation}, shadow_12_1::Nothing, primal_13::Tuple{Int64, Int64}, shadow_13_1::Nothing, primal_14::Tuple{Int64, Int64}, shadow_14_1::Nothing, primal_15::Int64, shadow_15_1::Nothing)
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
 [36] Simulation
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:65 [inlined]
 [37] #make_foils#1
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:24
 [38] make_foils
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:3 [inlined]
 [39] mean_drag (repeats 5 times)
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:39
 [40] f
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:49 [inlined]
 [41] augmented_julia_f_2861wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:0
 [42] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [43] enzyme_call
    @ ~/git/Enzyme.jl/src/compiler.jl:8795 [inlined]
 [44] AugmentedForwardThunk
    @ ~/git/Enzyme.jl/src/compiler.jl:8632 [inlined]
 [45] autodiff
    @ ~/git/Enzyme.jl/src/Enzyme.jl:384 [inlined]
 [46] autodiff
    @ ~/git/Enzyme.jl/src/Enzyme.jl:512 [inlined]
 [47] g!
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:59 [inlined]
 [48] macro expansion
    @ ./timing.jl:279 [inlined]
 [49] top-level scope
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:269

wsmoses avatar Sep 28 '24 21:09 wsmoses

https://fwd.gymni.ch/AkG87D

wsmoses avatar Sep 28 '24 21:09 wsmoses

https://fwd.gymni.ch/Nw55G8

wsmoses avatar Sep 28 '24 22:09 wsmoses

https://fwd.gymni.ch/xFoqVm

wsmoses avatar Sep 28 '24 22:09 wsmoses

With https://github.com/EnzymeAD/Enzyme/pull/2089 it now hits https://github.com/EnzymeAD/Enzyme.jl/issues/1781

wsmoses avatar Sep 29 '24 01:09 wsmoses

Hey, thanks for working on this! I have followed the thread of fixes, and I understand you managed to fix them all? Does KA need to be updated before I can try running our WaterLily example with Enzyme in GPU simulations?

b-fg avatar Sep 30 '24 08:09 b-fg

You may need https://github.com/JuliaGPU/KernelAbstractions.jl/pull/534 as well. However, when running locally, although the above issue is fixed, there was still a strange memory issue with the full model. @b-fg, if you're able to reduce any remaining issues to an MWE, we can try to get them fixed.

wsmoses avatar Oct 01 '24 14:10 wsmoses