Enzyme.jl
                                
                                
                                
                        Problems with callable `struct`
I have encountered some problems with callable structs, with fields being modified or wrong gradients being returned:
using AbstractGPs, TemporalGPs, Enzyme, Zygote
struct Loss
    x::Vector{Float64}
    y::Vector{Float64}
end
function (l::Loss)(θ)
    f = to_sde(GP(θ.v * Matern52Kernel() ∘ ScaleTransform(θ.l)), SArrayStorage(Float64))
    return logpdf(f(l.x, θ.σ + 1e-6), l.y)
end
θ = (v = 1., l = 1., σ = 0.1)
# First example: even though `loss` is marked `Const` (IIUC this is optional), its fields get modified:
loss = Loss(1:10, randn(10))
ref = first(loss.y)
autodiff(Reverse, Const(loss), Active(θ))
ref == first(loss.y) # false
# Second example: we introduce a shadow `struct`
# This gets rid of the problem in example 1, but the gradient is wrong
loss = Loss(1:10, randn(10))
dloss = Loss(zeros(10), zeros(10))
ref = first(loss.y)
grad_e = only(autodiff(Reverse, Duplicated(loss, dloss), Active(θ))) # needs another `|> only` for [email protected]
grad_z = only(Zygote.gradient(loss, θ))
ref == first(loss.y) # true
mapreduce(≈, &, grad_e, grad_z) # false
                                    
                                    
                                    
                                
So if I understand correctly, there are two separate issues.
The first is that with `Const(loss)` the reverse pass mutates the input.
The second is that even if you duplicate `loss`, the gradient is wrong? It would be great to peel back some of the layers of abstraction here and remove the package dependencies from the MWE.
That's the hard part. I don't know where to start. Is there a way to find out where the mutation happens?
Not easily.
For us to be able to debug this, we'll need you to try to inline all the methods etc. used by the packages, so that ideally few or no external packages are needed to reproduce this (at the cost of perhaps a bigger test.jl file).
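A cheap check that does not locate the mutation, but at least narrows down which field is affected, is to snapshot the fields before and after the suspicious call. The following is only a minimal sketch in plain Julia; `changed_fields` and `Demo` are hypothetical names introduced for illustration and are not part of Enzyme:

```julia
# Hypothetical helper (not part of Enzyme): run `f()` and report which
# fields of `obj` hold different contents afterwards.
function changed_fields(f, obj)
    names = fieldnames(typeof(obj))
    # Deep-copy every field so later in-place writes cannot alter the snapshot.
    before = map(n -> deepcopy(getfield(obj, n)), names)
    f()
    # `isequal` treats NaN as equal to NaN, so NaN entries do not raise false alarms.
    return [n for (n, b) in zip(names, before) if !isequal(getfield(obj, n), b)]
end

# Stand-in type with the same layout as `Loss` from the report.
struct Demo
    x::Vector{Float64}
    y::Vector{Float64}
end

demo = Demo([1.0, 2.0], [3.0, 4.0])
changed = changed_fields(demo) do
    demo.y[1] = 0.0   # stand-in for the call under suspicion
end
@show changed
```

With the code from the report, the `do` block would wrap the `autodiff(Reverse, Const(loss), Active(θ))` call and `loss` would take the place of `demo`.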
The second issue (gradients wrong with duplicated struct) is now solved in v0.11.0. The first issue (struct fields being mutated) still persists. Is this considered a bug?
Yes. Please still reduce it to an MWE.
This is as far as I was able to minimize it:
using Enzyme
struct Foo
    x::Vector{Float64}
    y::Vector{Float64}
end
struct Bar{T1, T2}
    x::T1
    y::T2
end
Base.getindex(m::Bar, i::Int) = (m.x[i], m.y[i])
function (l::Foo)(θ)
    y = l.y
    Σys = y * y'
    bar1 = Bar(y, Σys)
    bar2 = Bar(bar1, y)
    itr = eachindex(y)
    return sum(baz(bar2, itr))
end
function baz(bar::Bar, idx)
    ys = zeros(length(idx))
    ys[idx[1]] = 0.
    _, _y = bar[idx[1]]
    ys[idx[1]] = _y
    return ys
end
θ = 1.
foo = Foo(1:10, randn(10))
ref = copy(foo.y)
autodiff(Reverse, foo, Active(θ))
ref == foo.y # false, should be true
                                    
                                    
                                    
                                
@wsmoses, is this MWE minimal enough?
using Enzyme
Enzyme.API.printall!(true)
Enzyme.API.printactivity!(true)
@noinline function bcast(y)
    y[1] *= 2
    # return copy(y)
    return y * y' #copy(y)
end
function func(y, θ, idx)
    # bar1 = Bar(y, bcast(y))
    return @inbounds baz(bcast(y))
end
@noinline function baz(bary)
    r = bary[1] # (barx[1], bary[1])
    bary[1] = 0
    return r[1]
end
θ = 1.
y = randn(10)
ref = copy(y)
autodiff(Reverse, func, Const(y), Active(θ), Const(1))
@show ref, y
                                    
                                    
                                    
                                
using Enzyme
Enzyme.API.printall!(true)
Enzyme.API.printactivity!(true)
@noinline function bcast(y, cond)
    return cond ? copy(y) : y
end
function func(y, cond)
    res = bcast(y, cond)
    # cp = copy(y)
    # res = cond ? cp : y
    return @inbounds baz(res)
end
@noinline function baz(bary)
    r = @inbounds bary[1] # (barx[1], bary[1])
    @inbounds bary[1] = 0
    return r[1]
end
θ = 1.
y = randn(10)
ref = copy(y)
autodiff(Reverse, func, Const(y), Const(true))
@show ref, y
                                    
                                    
                                    
                                
Solving this requires a nontrivial improvement to activity analysis. `baz` is assumed to be active by default (especially since its return value is active).
Although all arguments of `bcast` are inactive, it may allocate new (active) memory, much like `malloc`. When activity analysis checks whether all users of the result are inactive, so that the result could be marked constant, it unfortunately finds the active `baz`.
The issue here is that the return value is active only because it may contain new memory, not because it contains active data. In that sense it behaves like `malloc` and similar allocators. It may be beneficial to add a new notion of active memory without active data.
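To make the aliasing concrete, here is a plain-Julia illustration that does not involve Enzyme at all. It only shows the primal behaviour that makes the return value of `bcast` hard to classify: depending on `cond`, it is either freshly allocated memory or the caller's own array. The name `baz!` is a hypothetical stand-in for `baz` from the MWE:

```julia
# Depending on `cond`, the result is new memory or an alias of `y`.
bcast(y, cond) = cond ? copy(y) : y

# Hypothetical stand-in for `baz` from the MWE: read the first entry,
# then overwrite it in place.
function baz!(v)
    r = v[1]
    v[1] = 0
    return r
end

y = [1.0, 2.0, 3.0]

fresh = bcast(y, true)    # new allocation
baz!(fresh)
@show fresh === y         # false
@show y[1]                # 1.0, the caller's array is untouched

alias = bcast(y, false)   # same array as `y`
baz!(alias)
@show alias === y         # true
@show y[1]                # 0.0, the write went through to `y`
```

In the MWE `cond` is `true`, so in the primal program the write in `baz` lands in the copy and `y` should stay unchanged; the observed change to `y` therefore comes from the differentiated code, not from the original function.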
I just checked this example again and it now fails with `ERROR: Enzyme execution failed.`
Can you post the full error? (That part alone isn't really helpful for figuring out what's happening =/)
Yeah I was aware of that. The full error is very long so I figured you'd want to print it yourself. I can post it tomorrow though.
Yeah, posting the full error is always desirable. Sometimes one can provide an immediate fix based on it alone.
It's also helpful if the error cannot be reproduced elsewhere (or the reproducer has a bunch of dependencies).
More importantly, in my opinion, it leaves a historical record of what the error was at that time. As the codebase changes and potentially fixes the issue (or introduces new ones), one can see which issue was being hit here at this point.
Here is the full thing:
julia> autodiff(Reverse, func, Const(y), Const(true))
after simplification :
; Function Attrs: mustprogress willreturn
define double @preprocess_julia_func_780_inner.1({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) local_unnamed_addr #5 !dbg !50 {
entry:
  %2 = call {}*** @julia.get_pgcstack() #6
  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !51
  %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !53
  ret double %4, !dbg !54
}
in new function diffejulia_func_780_inner.1 constant arg {} addrspace(10)* %0
in new function diffejulia_func_780_inner.1 constant arg i8 %1
forced inactive   %2 = call {}*** @julia.get_pgcstack() #6
forced inactive val   %2 = call {}*** @julia.get_pgcstack() #6
  %2 = call {}*** @julia.get_pgcstack() #6 cv=1 ci=1
checking if is constant[3]   %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 < UPSEARCH1>  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
constant(1)  up-call:  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 constant instruction from origin instruction   %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 VALUE potentially used as pointer   %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11 by   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13
 < MEMSEARCH3>  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 < UPSEARCH1>  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
constant(1)  up-call:  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 constant instruction hypothesis:   %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
potential active load:   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13
checking if is constant[3]   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13
 < UPSEARCH1>  %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13
nonconstant(1)  up-call   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 op   %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
couldnt decide fallback as nonconstant instruction(3):  %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13
potential active store via pointer in unknown inst:   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 of   %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
potential active store:   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 Val=  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 -- unknown store potential activity: 1 -   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 of  Val=  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 </MEMSEARCH3>  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11 potentiallyActiveLoad=0x40cd3b0 potentiallyActiveStore=0x40cd3b0 potentialStore=1
  %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11 cv=0 ci=1
 < UPSEARCH1>  %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13
nonconstant(1)  up-call   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 op   %3 = call fastcc nonnull {} addrspace(10)* @julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) #7, !dbg !11
 <Value USESEARCH2>  %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 UA=None
      considering use of   %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 -   ret double %4, !dbg !14
 Value nonconstant (couldn't disprove)[3]  %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13
  %4 = call fastcc double @julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %3) #7, !dbg !13 cv=0 ci=0
  ret double %4, !dbg !14 cv=1 ci=1
after simplification :
; Function Attrs: mustprogress nofree noinline nosync willreturn
define internal fastcc double @preprocess_julia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %0) unnamed_addr #6 !dbg !60 {
top:
  %1 = call {}*** @julia.get_pgcstack() #7
  %2 = bitcast {} addrspace(10)* %0 to double addrspace(13)* addrspace(10)*, !dbg !61
  %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !61
  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !61, !tbaa !20, !alias.scope !63, !noalias !30, !nonnull !10
  %5 = load double, double addrspace(13)* %4, align 8, !dbg !61, !tbaa !35, !alias.scope !38, !noalias !39
  store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !66, !tbaa !35, !alias.scope !38, !noalias !68
  ret double %5, !dbg !69
}
in new function diffejulia_baz_784 nonconstant arg {} addrspace(10)* %0
forced inactive   %1 = call {}*** @julia.get_pgcstack() #7
forced inactive val   %1 = call {}*** @julia.get_pgcstack() #7
  %1 = call {}*** @julia.get_pgcstack() #7 cv=1 ci=1
checking if is constant[3]   %2 = bitcast {} addrspace(10)* %0 to double addrspace(13)* addrspace(10)*, !dbg !11
 constant instruction from known non-float non-writing instruction   %2 = bitcast {} addrspace(10)* %0 to double addrspace(13)* addrspace(10)*, !dbg !11
 VALUE potentially used as pointer   %2 = bitcast {} addrspace(10)* %0 to double addrspace(13)* addrspace(10)*, !dbg !11 by   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 arg active from orig val=  %2 = bitcast {} addrspace(10)* %0 to double addrspace(13)* addrspace(10)*, !dbg !11 orig={} addrspace(10)* %0
  %2 = bitcast {} addrspace(10)* %0 to double addrspace(13)* addrspace(10)*, !dbg !11 cv=0 ci=1
checking if is constant[3]   %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !11
 constant instruction from known non-float non-writing instruction   %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !11
 VALUE potentially used as pointer   %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !11 by   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 arg active from orig val=  %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !11 orig={} addrspace(10)* %0
  %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !11 cv=0 ci=1
checking if is constant[3]   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 constant instruction from known non-float non-writing instruction   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 VALUE potentially used as pointer   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10 by   store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39
 < MEMSEARCH3>  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 < UPSEARCH1>  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
nonconstant(1)  up-inst   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10 op   %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !11
 cannot show constant instruction hypothesis:   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
potential active load:   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
checking if is constant[3]   store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39
 < UPSEARCH1>  store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39
 constant instruction as store operand is inactive   store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39
 constant instruction from origin instruction   store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39
 < UPSEARCH1>  %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34
nonconstant(1)  up-inst   %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34 op   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 <Value USESEARCH2>  %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34 UA=None
      considering use of   %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34 -   ret double %5, !dbg !40
 Value nonconstant (couldn't disprove)[3]  %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34
potential active load:   %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34
potential active store:   store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39 Val=  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 -- store potential activity: 0 -   store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39 of  Val=  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 </MEMSEARCH3>  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10 potentiallyActiveLoad=0x424f2f0 potentiallyActiveStore=0x0 potentialStore=1
 < UPSEARCH1>  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
nonconstant(1)  up-inst   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10 op   %3 = addrspacecast double addrspace(13)* addrspace(10)* %2 to double addrspace(13)* addrspace(11)*, !dbg !11
 <ASOR2 ignoreStoresinto=1>  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 </ASOR2 ignoreStoresInto=1 inactive>  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
 @@MEMSEARCH3>  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10 potentiallyActiveLoad=0x424f2f0 potentialStore=1 ActiveUp=1 ActiveDown=0 ActiveMemory=1
  %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10 cv=0 ci=1
checking if is constant[3]   %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34
 <Value USESEARCH2>  %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34 UA=None
      considering use of   %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34 -   ret double %5, !dbg !40
 < UPSEARCH1>  %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34
nonconstant(1)  up-inst   %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34 op   %4 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %3, align 8, !dbg !11, !tbaa !15, !alias.scope !20, !noalias !25, !nonnull !10
couldnt decide fallback as nonconstant instruction(3):  %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34
  %5 = load double, double addrspace(13)* %4, align 8, !dbg !11, !tbaa !30, !alias.scope !33, !noalias !34 cv=0 ci=0
  store double 0.000000e+00, double addrspace(13)* %4, align 8, !dbg !35, !tbaa !30, !alias.scope !33, !noalias !39 cv=1 ci=1
  ret double %5, !dbg !40 cv=1 ci=1
; Function Attrs: mustprogress nofree noinline nosync willreturn
define internal fastcc void @diffejulia_baz_784({} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) %0, {} addrspace(10)* nocapture nofree align 16 %"'", double %differeturn) unnamed_addr #6 !dbg !70 {
top:
  %"'de" = alloca double, align 8
  %1 = getelementptr double, double* %"'de", i64 0
  store double 0.000000e+00, double* %1, align 8
  %2 = call {}*** @julia.get_pgcstack() #7
  %"'ipc" = bitcast {} addrspace(10)* %"'" to double addrspace(13)* addrspace(10)*, !dbg !71
  %3 = bitcast {} addrspace(10)* %0 to double addrspace(13)* addrspace(10)*, !dbg !71
  %"'ipc1" = addrspacecast double addrspace(13)* addrspace(10)* %"'ipc" to double addrspace(13)* addrspace(11)*, !dbg !71
  %4 = addrspacecast double addrspace(13)* addrspace(10)* %3 to double addrspace(13)* addrspace(11)*, !dbg !71
  %"'ipl" = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %"'ipc1", align 8, !dbg !71, !tbaa !20, !alias.scope !73, !noalias !76, !nonnull !10
  %5 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %4, align 8, !dbg !71, !tbaa !20, !alias.scope !78, !noalias !79, !nonnull !10
  store double 0.000000e+00, double addrspace(13)* %5, align 8, !dbg !80, !tbaa !35, !alias.scope !82, !noalias !85
  br label %inverttop, !dbg !87
inverttop:                                        ; preds = %top
  store double %differeturn, double* %"'de", align 8
  store double 0.000000e+00, double addrspace(13)* %"'ipl", align 8, !dbg !80, !tbaa !35, !alias.scope !88, !noalias !89
  %6 = load double, double* %"'de", align 8, !dbg !71
  store double 0.000000e+00, double* %"'de", align 8, !dbg !71
  %7 = load double, double addrspace(13)* %"'ipl", align 8, !dbg !71, !tbaa !35, !alias.scope !88, !noalias !90
  %8 = fadd fast double %7, %6, !dbg !71
  store double %8, double addrspace(13)* %"'ipl", align 8, !dbg !71, !tbaa !35, !alias.scope !88, !noalias !90
  ret void
}
after simplification :
; Function Attrs: inaccessiblememonly mustprogress noinline willreturn
define internal fastcc nonnull {} addrspace(10)* @preprocess_julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) unnamed_addr #7 !dbg !91 {
top:
  %2 = call {}*** @julia.get_pgcstack() #8
  %3 = and i8 %1, 1, !dbg !92
  %.not = icmp eq i8 %3, 0, !dbg !92
  br i1 %.not, label %common.ret, label %L2, !dbg !92
common.ret:                                       ; preds = %L2, %top
  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
  ret {} addrspace(10)* %common.ret.op, !dbg !92
L2:                                               ; preds = %top
  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !93
  br label %common.ret
}
in new function fakeaugmented_julia_bcast_786 constant arg {} addrspace(10)* %0
in new function fakeaugmented_julia_bcast_786 constant arg i8 %1
forced inactive   %2 = call {}*** @julia.get_pgcstack() #8
forced inactive val   %2 = call {}*** @julia.get_pgcstack() #8
  %2 = call {}*** @julia.get_pgcstack() #8 cv=1 ci=1
checking if is constant[3]   %3 = and i8 %1, 1, !dbg !11
 constant instruction from known non-float non-writing instruction   %3 = and i8 %1, 1, !dbg !11
 Value const as integral 3   %3 = and i8 %1, 1, !dbg !11 Integer
  %3 = and i8 %1, 1, !dbg !11 cv=1 ci=1
checking if is constant[3]   %.not = icmp eq i8 %3, 0, !dbg !11
 constant instruction from known non-float non-writing instruction   %.not = icmp eq i8 %3, 0, !dbg !11
 Value const as integral 3   %.not = icmp eq i8 %3, 0, !dbg !11 Integer
  %.not = icmp eq i8 %3, 0, !dbg !11 cv=1 ci=1
  br i1 %.not, label %common.ret, label %L2, !dbg !11 cv=1 ci=1
checking if is constant[3]   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 constant instruction from known non-float non-writing instruction   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 <Potential Pointer assumed active at 1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 < UPSEARCH1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
nonconstant(1)  up-inst   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] op   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 < MEMSEARCH3>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 < UPSEARCH1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
nonconstant(1)  up-inst   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] op   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 cannot show constant instruction hypothesis:   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </MEMSEARCH3>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] potentiallyActiveLoad=0x0 potentiallyActiveStore=0x0 potentialStore=0
 < UPSEARCH1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
nonconstant(1)  up-inst   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] op   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 <ASOR2 ignoreStoresinto=1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </ASOR2 ignoreStoresInto=1> active from-ret>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 @@MEMSEARCH3>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] potentiallyActiveLoad=0x0 potentialStore=0 ActiveUp=1 ActiveDown=1 ActiveMemory=1
  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] cv=0 ci=1
  ret {} addrspace(10)* %common.ret.op, !dbg !11 cv=1 ci=1
checking if is constant[3]   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 < UPSEARCH1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
constant(1)  up-call:  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 constant instruction from origin instruction   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 UA=OnlyLoads
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 UA=OnlyNonPointerStores
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 UA=AllStores
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 UA=None
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 < MEMSEARCH3>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 < UPSEARCH1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
constant(1)  up-call:  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 constant instruction hypothesis:   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 </MEMSEARCH3>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 potentiallyActiveLoad=0x0 potentiallyActiveStore=0x0 potentialStore=0
 < UPSEARCH1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
constant(1)  up-call:  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 <ASOR2 ignoreStoresinto=1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12
 <ASOR2 ignoreStoresinto=1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </ASOR2 ignoreStoresInto=1> active from-ret>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </ASOR2 ignoreStoresInto=1 active from-unknown>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 - use=  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 @@MEMSEARCH3>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 potentiallyActiveLoad=0x0 potentialStore=0 ActiveUp=0 ActiveDown=1 ActiveMemory=1
  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #9, !dbg !12 cv=0 ci=1
  br label %common.ret cv=1 ci=1
┌ Warning: TODO forward zero-set of arraycopy used memset rather than runtime type
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/cy24l/src/utils.jl:56
; Function Attrs: inaccessiblememonly mustprogress noinline willreturn
define internal fastcc { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* } @augmented_julia_bcast_786({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1) unnamed_addr #7 !dbg !94 {
top:
  %2 = alloca { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }, align 8
  %3 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }* %2, i32 0, i32 0
  %4 = getelementptr { {} addrspace(10)*, {} addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)* }* %3, i64 0, i32 0
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140599273005064 to {}*) to {} addrspace(10)*), {} addrspace(10)** %4, align 8
  %5 = getelementptr { {} addrspace(10)*, {} addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)* }* %3, i64 0, i32 1
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140599273005064 to {}*) to {} addrspace(10)*), {} addrspace(10)** %5, align 8
  %6 = call {}*** @julia.get_pgcstack() #9
  %7 = and i8 %1, 1, !dbg !95
  %.not = icmp eq i8 %7, 0, !dbg !95
  call void inttoptr (i64 140598298409856 to void (i8*)*)(i8* getelementptr inbounds ([1078 x i8], [1078 x i8]* @0, i32 0, i32 0)), !dbg !95
  br i1 %.not, label %common.ret, label %L2, !dbg !95
common.ret:                                       ; preds = %L2, %top
  %8 = phi {} addrspace(10)* [ %13, %L2 ], [ %0, %top ]
  %common.ret.op = phi {} addrspace(10)* [ %25, %L2 ], [ %0, %top ]
  %9 = insertvalue { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* } undef, {} addrspace(10)* %common.ret.op, 1, !dbg !95
  %10 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }* %2, i32 0, i32 1, !dbg !95
  store {} addrspace(10)* %common.ret.op, {} addrspace(10)** %10, align 8, !dbg !95
  %11 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }* %2, i32 0, i32 2, !dbg !95
  store {} addrspace(10)* %8, {} addrspace(10)** %11, align 8, !dbg !95
  %12 = load { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* }* %2, align 8, !dbg !95
  ret { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* } %12, !dbg !95
L2:                                               ; preds = %top
  %13 = call {} addrspace(10)* @ijl_array_copy({} addrspace(10)* %0), !dbg !96
  %14 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)* }* %3, i32 0, i32 0, !dbg !96
  store {} addrspace(10)* %13, {} addrspace(10)** %14, align 8, !dbg !96
  %15 = bitcast {} addrspace(10)* %0 to <{ i8 addrspace(13)*, i64, i16, i16, i32, i64 }> addrspace(10)*, !dbg !96
  %16 = getelementptr inbounds <{ i8 addrspace(13)*, i64, i16, i16, i32, i64 }>, <{ i8 addrspace(13)*, i64, i16, i16, i32, i64 }> addrspace(10)* %15, i32 0, i32 3, !dbg !96
  %17 = load i16, i16 addrspace(10)* %16, align 2, !dbg !96
  %18 = zext i16 %17 to i64, !dbg !96
  %19 = bitcast {} addrspace(10)* %0 to <{ i8 addrspace(13)*, i64, i16, i16, i32, i64 }> addrspace(10)*, !dbg !96
  %20 = getelementptr inbounds <{ i8 addrspace(13)*, i64, i16, i16, i32, i64 }>, <{ i8 addrspace(13)*, i64, i16, i16, i32, i64 }> addrspace(10)* %19, i32 0, i32 1, !dbg !96
  %21 = load i64, i64 addrspace(10)* %20, align 8, !dbg !96
  %22 = mul i64 %21, %18, !dbg !96
  %23 = bitcast {} addrspace(10)* %13 to i8 addrspace(13)* addrspace(10)*, !dbg !96
  %24 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %23, align 8, !dbg !96
  call void @llvm.memset.p13i8.i64(i8 addrspace(13)* %24, i8 0, i64 %22, i1 false), !dbg !96
  %25 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !96
  %26 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)* }* %3, i32 0, i32 1
  store {} addrspace(10)* %25, {} addrspace(10)** %26, align 8
  br label %common.ret
}
in new function diffejulia_bcast_786 constant arg {} addrspace(10)* %0
in new function diffejulia_bcast_786 constant arg i8 %1
forced inactive   %2 = call {}*** @julia.get_pgcstack() #9
forced inactive val   %2 = call {}*** @julia.get_pgcstack() #9
  %2 = call {}*** @julia.get_pgcstack() #9 cv=1 ci=1
checking if is constant[3]   %3 = and i8 %1, 1, !dbg !11
 constant instruction from known non-float non-writing instruction   %3 = and i8 %1, 1, !dbg !11
 Value const as integral 3   %3 = and i8 %1, 1, !dbg !11 Integer
  %3 = and i8 %1, 1, !dbg !11 cv=1 ci=1
checking if is constant[3]   %.not = icmp eq i8 %3, 0, !dbg !11
 constant instruction from known non-float non-writing instruction   %.not = icmp eq i8 %3, 0, !dbg !11
 Value const as integral 3   %.not = icmp eq i8 %3, 0, !dbg !11 Integer
  %.not = icmp eq i8 %3, 0, !dbg !11 cv=1 ci=1
  br i1 %.not, label %common.ret, label %L2, !dbg !11 cv=1 ci=1
checking if is constant[3]   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 constant instruction from known non-float non-writing instruction   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 <Potential Pointer assumed active at 1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 < UPSEARCH1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
nonconstant(1)  up-inst   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] op   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 < MEMSEARCH3>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 < UPSEARCH1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
nonconstant(1)  up-inst   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] op   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 cannot show constant instruction hypothesis:   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </MEMSEARCH3>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] potentiallyActiveLoad=0x0 potentiallyActiveStore=0x0 potentialStore=0
 < UPSEARCH1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
nonconstant(1)  up-inst   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] op   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 <ASOR2 ignoreStoresinto=1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </ASOR2 ignoreStoresInto=1> active from-ret>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 @@MEMSEARCH3>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] potentiallyActiveLoad=0x0 potentialStore=0 ActiveUp=1 ActiveDown=1 ActiveMemory=1
  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] cv=0 ci=1
  ret {} addrspace(10)* %common.ret.op, !dbg !11 cv=1 ci=1
checking if is constant[3]   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 < UPSEARCH1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
constant(1)  up-call:  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 constant instruction from origin instruction   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 UA=OnlyLoads
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 UA=OnlyNonPointerStores
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 UA=AllStores
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 <Value USESEARCH2>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 UA=None
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
      considering use of   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 -   ret {} addrspace(10)* %common.ret.op, !dbg !11
 < MEMSEARCH3>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 < UPSEARCH1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
constant(1)  up-call:  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 constant instruction hypothesis:   %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 </MEMSEARCH3>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 potentiallyActiveLoad=0x0 potentiallyActiveStore=0x0 potentialStore=0
 < UPSEARCH1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
constant(1)  up-call:  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 <ASOR2 ignoreStoresinto=1>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12
 <ASOR2 ignoreStoresinto=1>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </ASOR2 ignoreStoresInto=1> active from-ret>  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 </ASOR2 ignoreStoresInto=1 active from-unknown>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 - use=  %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ]
 @@MEMSEARCH3>  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 potentiallyActiveLoad=0x0 potentialStore=0 ActiveUp=0 ActiveDown=1 ActiveMemory=1
  %4 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %0) #10, !dbg !12 cv=0 ci=1
  br label %common.ret cv=1 ci=1
; Function Attrs: inaccessiblememonly mustprogress noinline willreturn
define internal fastcc void @diffejulia_bcast_786({} addrspace(10)* align 16 dereferenceable(40) %0, i8 zeroext %1, { {} addrspace(10)*, {} addrspace(10)* } %tapeArg) unnamed_addr #7 !dbg !97 {
top:
  %2 = call {}*** @julia.get_pgcstack() #9
  %3 = and i8 %1, 1, !dbg !98
  %.not = icmp eq i8 %3, 0, !dbg !98
  br i1 %.not, label %common.ret, label %L2, !dbg !98
common.ret:                                       ; preds = %L2, %top
  br label %invertcommon.ret, !dbg !98
L2:                                               ; preds = %top
  %"'ip_phi" = extractvalue { {} addrspace(10)*, {} addrspace(10)* } %tapeArg, 0, !dbg !99
  %4 = extractvalue { {} addrspace(10)*, {} addrspace(10)* } %tapeArg, 1, !dbg !99
  br label %common.ret
inverttop:                                        ; preds = %invertL2, %invertcommon.ret
  ret void
invertcommon.ret:                                 ; preds = %common.ret
  br i1 %.not, label %inverttop, label %invertL2
invertL2:                                         ; preds = %invertcommon.ret
  br label %inverttop
}
; Function Attrs: mustprogress willreturn
define internal void @diffejulia_func_780_inner.1({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, i8 zeroext %1, double %differeturn) local_unnamed_addr #5 !dbg !55 {
entry:
  %"'de" = alloca double, align 8
  %2 = getelementptr double, double* %"'de", i64 0
  store double 0.000000e+00, double* %2, align 8
  %3 = call {}*** @julia.get_pgcstack() #9
  %_augmented = call fastcc { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* } @augmented_julia_bcast_786({} addrspace(10)* align 16 %0, i8 zeroext %1), !dbg !56
  %subcache = extractvalue { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* } %_augmented, 0, !dbg !56
  %4 = extractvalue { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* } %_augmented, 1, !dbg !56
  %"'ac" = extractvalue { { {} addrspace(10)*, {} addrspace(10)* }, {} addrspace(10)*, {} addrspace(10)* } %_augmented, 2, !dbg !56
  br label %invertentry, !dbg !58
invertentry:                                      ; preds = %entry
  store double %differeturn, double* %"'de", align 8
  %5 = load double, double* %"'de", align 8, !dbg !59
  call fastcc void @diffejulia_baz_784({} addrspace(10)* nocapture nofree readonly align 16 %4, {} addrspace(10)* nocapture nofree align 16 %"'ac", double %5), !dbg !59
  store double 0.000000e+00, double* %"'de", align 8, !dbg !59
  call fastcc void @diffejulia_bcast_786({} addrspace(10)* align 16 %0, i8 zeroext %1, { {} addrspace(10)*, {} addrspace(10)* } %subcache), !dbg !56
  ret void
}
ERROR: Enzyme execution failed.
Mismatched activity for:   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] const val: {} addrspace(10)* %0
Type tree: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@double, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}
You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now
Stacktrace:
 [1] bcast
   @ ./REPL[5]:2
Stacktrace:
 [1] throwerr(cstr::Cstring)
   @ Enzyme.Compiler ~/.julia/packages/Enzyme/gS4lp/src/compiler.jl:2924
 [2] macro expansion
   @ ~/.julia/packages/Enzyme/gS4lp/src/compiler.jl:9559 [inlined]
 [3] enzyme_call
   @ ~/.julia/packages/Enzyme/gS4lp/src/compiler.jl:9247 [inlined]
 [4] CombinedAdjointThunk
   @ ~/.julia/packages/Enzyme/gS4lp/src/compiler.jl:9210 [inlined]
 [5] autodiff
   @ ~/.julia/packages/Enzyme/gS4lp/src/Enzyme.jl:205 [inlined]
 [6] autodiff
   @ ~/.julia/packages/Enzyme/gS4lp/src/Enzyme.jl:228 [inlined]
 [7] autodiff(::EnzymeCore.ReverseMode{false}, ::typeof(func), ::Const{Vector{Float64}}, ::Const{Bool})
   @ Enzyme ~/.julia/packages/Enzyme/gS4lp/src/Enzyme.jl:214
 [8] top-level scope
   @ REPL[11]:1
Did you try the suggestion at the end of the error?
Mismatched activity for:   %common.ret.op = phi {} addrspace(10)* [ %4, %L2 ], [ %0, %top ] const val: {} addrspace(10)* %0
Type tree: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@double, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}
You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now
Also, it looks like you set printActivity!(true) at some point, which is why all the output above was printed.
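For reference, a minimal sketch of what applying the suggested workaround could look like. The definitions of `bcast`, `baz`, and `func` below are hypothetical stand-ins inferred from the function names and IR in the log above (a conditional `copy` followed by a scalar reduction); they are not the code from the linked comment. `Enzyme.API.runtimeActivity!(true)` is taken verbatim from the error message and is assumed to be available in the Enzyme version shown in the stack trace.

```julia
using Enzyme

# Hypothetical stand-ins, NOT the original MWE:
# `bcast` returns either a fresh copy or the input itself, so whether the
# result aliases the constant input is only known at run time.
bcast(x, cond) = cond ? copy(x) : x

# Assumed scalar reduction; the real `baz` may differ.
baz(x) = sum(abs2, x)

func(x, cond) = baz(bcast(x, cond))

# Workaround named in the error message. Set it before the first
# `autodiff` call so that it applies when the function is compiled.
Enzyme.API.runtimeActivity!(true)

x = [1.0, 2.0, 3.0]

# Same call shape as frame [7] of the stack trace.
autodiff(Reverse, func, Const(x), Const(true))
```

With runtime activity enabled, Enzyme checks at run time whether a value is really active instead of deciding it statically, which is what the "conditionally active" wording in the error refers to. This is a global setting for the session, so it is more of a stopgap than a fix.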
I just copy-pasted the MWE from your last post.
I.e., I ran the code in https://github.com/EnzymeAD/Enzyme.jl/issues/700#issuecomment-1542847276
IIUC, if I have to change the MWE, then it would no longer be a valid MWE, would it?