Errors differentiating functions that use views, and a function that takes arrays of arrays
Hi, thanks for all your work on this package, and apologies in advance: it's entirely likely I've messed up the autodiff call, or am unaware of something that isn't supported yet.
First, here are two examples based on views which error. In an earlier version p was a 4D array also indexed by the loop index, which produced a shorter error (the difference in how the trace is computed caused a different error between the two versions in that case). However, these current versions produce such a long error that the REPL/language server crashes.
using Enzyme, LinearAlgebra

function contraction1(x, p, cache1, cache2)
    cache1 .= @view(p[x[1]+1, :, :])
    for i = 2:length(x)
        mul!(cache2, @view(p[x[i]+1, :, :]), cache1)
        cache1 .= cache2
    end
    trace = 0
    for i in 1:size(cache1, 1)
        trace += cache1[i, i]
    end
    return trace
end
L = 4
χ = 3
d = 2
x = rand(0:(d-1), L)
p = rand(d, χ, χ)
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
g = autodiff(
    Reverse, contraction1, Const(x), Active(p),
    Duplicated(cache1, similar(cache1)),
    Duplicated(cache2, similar(cache2))
)
and
using Enzyme, LinearAlgebra

function contraction2(x, p, cache1, cache2)
    cache1 .= @view(p[x[1]+1, :, :])
    for i = 2:length(x)
        mul!(cache2, @view(p[x[i]+1, :, :]), cache1)
        cache1 .= cache2
    end
    return tr(cache1)
end
L = 4
χ = 3
d = 2
x = rand(0:(d-1), L)
p = rand(d, χ, χ)
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
g = autodiff(
    Reverse, contraction2, Const(x), Active(p),
    Duplicated(cache1, similar(cache1)),
    Duplicated(cache2, similar(cache2))
)
My original goal was to actually use a vector/matrix of matrices for p and the cache (which will be much more complicated in the case I'm aiming for). These versions (with/without the call to tr) are below, with different errors.
using Enzyme, LinearAlgebra

function contraction3(x, p, cache1, cache2)
    cache1 .= p[x[1]+1]
    for i = 2:length(x)
        mul!(cache2, p[x[i]+1], cache1)
        cache1 .= cache2
    end
    trace = 0
    for i in 1:size(cache1, 1)
        trace += cache1[i, i]
    end
    return trace
end
L = 4
χ = 3
d = 2
x = rand(0:(d-1), L)
p = [rand(χ, χ) for i=1:d]
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
g = autodiff(
    Reverse, contraction3, Const(x), Active(p),
    Duplicated(cache1, similar(cache1)),
    Duplicated(cache2, similar(cache2))
)
Error:
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
ERROR: AssertionError: length(args) == length((collect(parameters(entry_f)))[1 + sret + returnRoots:end])
Stacktrace:
[1] lower_convention(functy::Type, mod::LLVM.Module, entry_f::LLVM.Function, actualRetType::Type)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:3698
[2] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction3), Tuple{Vector{Int64}, Vector{Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}}}}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.Context, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4123
[3] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction3), Tuple{Vector{Int64}, Vector{Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}}}})
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4599
[4] cached_compilation(job::GPUCompiler.CompilerJob, key::UInt64, specid::UInt64)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4637
[5] #s565#115
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4697 [inlined]
[6] var"#s565#115"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ Enzyme.Compiler .\none:0
[7] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core .\boot.jl:580
[8] thunk
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4725 [inlined]
[9] thunk (repeats 2 times)
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4718 [inlined]
[10] autodiff(::Enzyme.ReverseMode, ::typeof(contraction3), ::Type{Const{Union{Float64, Int64}}}, ::Const{Vector{Int64}}, ::Vararg{Any})
@ Enzyme C:\Users\domin\.julia\packages\Enzyme\di3zM\src\Enzyme.jl:285
[11] autodiff(::Enzyme.ReverseMode, ::typeof(contraction3), ::Const{Vector{Int64}}, ::Active{Vector{Matrix{Float64}}}, ::Vararg{Any})
@ Enzyme C:\Users\domin\.julia\packages\Enzyme\di3zM\src\Enzyme.jl:319
[12] top-level scope
@ c:\Users\domin\Dropbox (Personal)\side_projects\AdaptiveTrajectorySampling\src\approximations\tensor_approx.jl:77
using Enzyme, LinearAlgebra

function contraction4(x, p, cache1, cache2)
    cache1 .= p[x[1]+1]
    for i = 2:length(x)
        mul!(cache2, p[x[i]+1], cache1)
        cache1 .= cache2
    end
    return tr(cache1)
end
L = 4
χ = 3
d = 2
x = rand(0:(d-1), L)
p = [rand(χ, χ) for i=1:d]
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
g = autodiff(
    Reverse, contraction4, Const(x), Active(p),
    Duplicated(cache1, similar(cache1)),
    Duplicated(cache2, similar(cache2))
)
Error:
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
ERROR: Conversion of boxed type Vector{Matrix{Float64}} is not allowed
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:33
[2] convert(::Type{LLVM.LLVMType}, typ::Type; ctx::LLVM.Context, allow_boxed::Bool)
@ LLVM.Interop C:\Users\domin\.julia\packages\LLVM\WjSQG\src\interop\base.jl:92
[3] create_abi_wrapper(enzymefn::LLVM.Function, F::Type, argtypes::Vector{DataType}, rettype::Type, actualRetType::Type, Mode::Enzyme.API.CDerivativeMode, augmented::Nothing, dupClosure::Bool, width::Int64, returnPrimal::Bool)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:3345
[4] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction4), Tuple{Vector{Int64}, Vector{Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}}}}, mod::LLVM.Module, primalf::LLVM.Function, adjoint::GPUCompiler.FunctionSpec{typeof(contraction4), Tuple{Const{Vector{Int64}}, Active{Vector{Matrix{Float64}}}, Duplicated{Matrix{Float64}}, Duplicated{Matrix{Float64}}}}, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, dupClosure::Bool, wrap::Bool, modifiedBetween::Bool, returnPrimal::Bool)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:3278
[5] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction4), Tuple{Vector{Int64}, Vector{Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}}}}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.Context, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4158
[6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction4), Tuple{Vector{Int64}, Vector{Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}}}})
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4599
[7] cached_compilation(job::GPUCompiler.CompilerJob, key::UInt64, specid::UInt64)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4637
[8] #s565#115
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4697 [inlined]
[9] var"#s565#115"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ Enzyme.Compiler .\none:0
[10] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core .\boot.jl:580
[11] thunk
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4725 [inlined]
[12] thunk (repeats 2 times)
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4718 [inlined]
[13] autodiff(::Enzyme.ReverseMode, ::typeof(contraction4), ::Type{Active{Float64}}, ::Const{Vector{Int64}}, ::Vararg{Any})
@ Enzyme C:\Users\domin\.julia\packages\Enzyme\di3zM\src\Enzyme.jl:285
[14] autodiff(::Enzyme.ReverseMode, ::typeof(contraction4), ::Const{Vector{Int64}}, ::Active{Vector{Matrix{Float64}}}, ::Vararg{Any})
@ Enzyme C:\Users\domin\.julia\packages\Enzyme\di3zM\src\Enzyme.jl:319
[15] top-level scope
@ c:\Users\domin\Dropbox (Personal)\side_projects\AdaptiveTrajectorySampling\src\approximations\tensor_approx.jl:82
Finally, a version which removes both views and arrays of arrays, which still errors, this time on a boxed matrix.
using Enzyme, LinearAlgebra
function contraction5(x, p1, p2, cache1, cache2)
    if x[1] == 0
        cache1 .= p1
    else
        cache1 .= p2
    end
    for i = 2:length(x)
        if x[i] == 0
            mul!(cache2, p1, cache1)
        else
            mul!(cache2, p2, cache1)
        end
        cache1 .= cache2
    end
    return tr(cache1)
end
L = 4
χ = 3
d = 2
x = rand(0:(d-1), L)
p1 = rand(χ, χ)
p2 = rand(χ, χ)
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
g = autodiff(
    Reverse, contraction5, Const(x), Active(p1), Active(p2),
    Duplicated(cache1, similar(cache1)),
    Duplicated(cache2, similar(cache2))
)
Error:
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
ERROR: Conversion of boxed type Matrix{Float64} is not allowed
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:33
[2] convert(::Type{LLVM.LLVMType}, typ::Type; ctx::LLVM.Context, allow_boxed::Bool)
@ LLVM.Interop C:\Users\domin\.julia\packages\LLVM\WjSQG\src\interop\base.jl:92
[3] create_abi_wrapper(enzymefn::LLVM.Function, F::Type, argtypes::Vector{DataType}, rettype::Type, actualRetType::Type, Mode::Enzyme.API.CDerivativeMode, augmented::Nothing, dupClosure::Bool, width::Int64, returnPrimal::Bool)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:3345
[4] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction5), Tuple{Vector{Int64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}}}, mod::LLVM.Module, primalf::LLVM.Function, adjoint::GPUCompiler.FunctionSpec{typeof(contraction5), Tuple{Const{Vector{Int64}}, Active{Matrix{Float64}}, Active{Matrix{Float64}}, Duplicated{Matrix{Float64}}, Duplicated{Matrix{Float64}}}}, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, dupClosure::Bool, wrap::Bool, modifiedBetween::Bool, returnPrimal::Bool)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:3278
[5] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction5), Tuple{Vector{Int64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}}}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.Context, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4158
[6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction5), Tuple{Vector{Int64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}}})
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4599
[7] cached_compilation(job::GPUCompiler.CompilerJob, key::UInt64, specid::UInt64)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4637
[8] #s565#115
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4697 [inlined]
[9] var"#s565#115"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ Enzyme.Compiler .\none:0
[10] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core .\boot.jl:580
[11] thunk
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4725 [inlined]
[12] thunk (repeats 2 times)
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4718 [inlined]
[13] autodiff(::Enzyme.ReverseMode, ::typeof(contraction5), ::Type{Active{Float64}}, ::Const{Vector{Int64}}, ::Vararg{Any})
@ Enzyme C:\Users\domin\.julia\packages\Enzyme\di3zM\src\Enzyme.jl:285
[14] autodiff(::Enzyme.ReverseMode, ::typeof(contraction5), ::Const{Vector{Int64}}, ::Active{Matrix{Float64}}, ::Vararg{Any})
@ Enzyme C:\Users\domin\.julia\packages\Enzyme\di3zM\src\Enzyme.jl:319
[15] top-level scope
@ c:\Users\domin\Dropbox (Personal)\side_projects\AdaptiveTrajectorySampling\src\approximations\tensor_approx.jl:109
Version info:
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 4
On Enzyme v0.10.4.
Apologies for the long post! Hope it helps.
The problem here isn't the view of the array; rather, since p is an array, it needs to be Duplicated, not Active. Try that and see what happens?
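For context, with a mutable array Enzyme accumulates the gradient into a shadow array passed alongside the primal via Duplicated. A minimal sketch of this pattern (f here is a hypothetical stand-in for any scalar-returning function of an array, not code from this issue):

```julia
using Enzyme

# hypothetical scalar function of an array, written loop-style
function f(p)
    s = 0.0
    for v in p
        s += v^2
    end
    return s
end

p  = rand(3, 3)
dp = zero(p)   # shadow: the gradient is accumulated (+=) into this
autodiff(Reverse, f, Active, Duplicated(p, dp))
# dp now holds the gradient of f at p, i.e. 2 .* p
```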
I had an example failing with a similar error to the first MWE here, but that appears to be fixed on main with the latest JLL. Unfortunately, changing x and p to Duplicated in the MWE here does not appear to help. I also added a return type activity annotation for good measure, but no luck:
g = autodiff(
    Reverse, contraction1, Active, Duplicated(x, similar(x)), Duplicated(p, similar(p)),
    Duplicated(cache1, similar(cache1)),
    Duplicated(cache2, similar(cache2))
)
What is the error you are seeing? If it is the following, this is a distinct (type-union) issue:
Illegal updateAnalysis prev:{[-1]:Integer} new: {[-1]:Float@double}
val: %229 = bitcast i64 %.sroa.095.0267 to double, !dbg !403 origin= %.pn = select i1 %value_phi26.i269, double %230, double %229, !dbg !403
Caused by:
Stacktrace:
[1] contraction1
@ ./REPL[4]:9
[2] contraction1
@ ./REPL[4]:0
Stacktrace:
[1] julia_error(cstr::Cstring, val::Ptr{LLVM.API.LLVMOpaqueValue}, errtype::Enzyme.API.ErrorType, data::Ptr{Nothing})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:3061
[2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{Nothing}, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
@ Enzyme.API ~/git/Enzyme.jl/src/api.jl:118
[3] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction1), Tuple{Vector{Int64}, Array{Float64, 3}, Matrix{Float64}, Matrix{Float64}}}}, mod::LLVM.Module, primalf::LLVM.Function, adjoint::GPUCompiler.FunctionSpec{typeof(contraction1), Tuple{Duplicated{Vector{Int64}}, Duplicated{Array{Float64, 3}}, Duplicated{Matrix{Float64}}, Duplicated{Matrix{Float64}}}}, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, dupClosure::Bool, wrap::Bool, modifiedBetween::Bool, returnPrimal::Bool, jlrules::Vector{String})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:3875
[4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction1), Tuple{Vector{Int64}, Array{Float64, 3}, Matrix{Float64}, Matrix{Float64}}}}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.Context, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:4850
[5] _thunk
@ ~/git/Enzyme.jl/src/compiler.jl:5278 [inlined]
[6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction1), Tuple{Vector{Int64}, Array{Float64, 3}, Matrix{Float64}, Matrix{Float64}}}})
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:5272
[7] cached_compilation(job::GPUCompiler.CompilerJob, key::UInt64, specid::UInt64)
@ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:5316
[8] #s741#132
@ ~/git/Enzyme.jl/src/compiler.jl:5376 [inlined]
[9] var"#s741#132"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ Enzyme.Compiler ./none:0
[10] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core ./boot.jl:582
[11] thunk
@ ~/git/Enzyme.jl/src/compiler.jl:5404 [inlined]
[12] thunk (repeats 2 times)
@ ~/git/Enzyme.jl/src/compiler.jl:5397 [inlined]
[13] autodiff(::Enzyme.ReverseMode, ::typeof(contraction1), ::Type{Active}, ::Duplicated{Vector{Int64}}, ::Vararg{Any})
@ Enzyme ~/git/Enzyme.jl/src/Enzyme.jl:296
[14] top-level scope
@ REPL[12]:1
This error is caused by the fact that trace is a union of Float64/Int64. It should be fixable by changing trace = 0 to trace = 0.0:
using Enzyme, LinearAlgebra

function contraction1(x, p, cache1, cache2)
    cache1 .= @view(p[x[1]+1, :, :])
    for i = 2:length(x)
        mul!(cache2, @view(p[x[i]+1, :, :]), cache1)
        cache1 .= cache2
    end
    trace = 0.0
    for i in 1:size(cache1, 1)
        trace += cache1[i, i]
    end
    return trace
end
L = 4
χ = 3
d = 2
x = rand(0:(d-1), L)
p = rand(d, χ, χ)
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
g = autodiff(
    Reverse, contraction1, Active, Duplicated(x, similar(x)), Duplicated(p, similar(p)),
    Duplicated(cache1, similar(cache1)),
    Duplicated(cache2, similar(cache2))
)
On latest main this does succeed.
Apologies for the slow response, and for missing the need for Duplicated! Good spot on the union also.
Bearing in mind I'm on Julia 1.7 and the current release of Enzyme rather than main, I'll post what I'm currently seeing below anyway, and will have a go with more up-to-date versions later.
So, changing the autodiff calls to use Duplicated p's and caches, I'm seeing:
contraction1
Leaving trace as an Int to start with, the error I get is
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
ERROR: AssertionError: length(args) == length((collect(parameters(entry_f)))[1 + sret + returnRoots:end])
Stacktrace:
[1] lower_convention(functy::Type, mod::LLVM.Module, entry_f::LLVM.Function, actualRetType::Type)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:3698
[2] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction1), Tuple{Vector{Int64}, Array{Float64, 3}, Matrix{Float64}, Matrix{Float64}}}}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.Context, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4123
[3] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(contraction1), Tuple{Vector{Int64}, Array{Float64, 3}, Matrix{Float64}, Matrix{Float64}}}})
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4599
[4] cached_compilation(job::GPUCompiler.CompilerJob, key::UInt64, specid::UInt64)
@ Enzyme.Compiler C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4637
[5] #s565#115
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4697 [inlined]
[6] var"#s565#115"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ Enzyme.Compiler .\none:0
[7] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core .\boot.jl:580
[8] thunk
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4725 [inlined]
[9] thunk (repeats 2 times)
@ C:\Users\domin\.julia\packages\Enzyme\di3zM\src\compiler.jl:4718 [inlined]
[10] autodiff(::Enzyme.ReverseMode, ::typeof(contraction1), ::Type{Active}, ::Const{Vector{Int64}}, ::Vararg{Any})
@ Enzyme C:\Users\domin\.julia\packages\Enzyme\di3zM\src\Enzyme.jl:285
[11] top-level scope
@ c:\Users\domin\Dropbox (Personal)\side_projects\AdaptiveTrajectorySampling\src\approximations\tensor_approx.jl:61
If I remove the union by initializing trace as 0.0, it crashes the REPL, and I'm not sure how to recover anything from that.
I see the same error in the union case if I also duplicate x, and the REPL also crashes in the no-union case if I duplicate x.
contraction2
This also crashes the REPL, I suspect for the same reasons.
contraction3
Works for small matrices with trace=0.0. With the union I see the same error as for the union case of contraction1.
contraction4
Works for small matrices.
contraction5
Works for small matrices.
Increasing / varying χ
However, I have noticed that if I increase χ (from e.g. 3 to 16), the results appear incorrect (or, at least, they disagree with Zygote for a non-mutating function that I believe does the same thing). It's also quite slow compared to Zygote for larger matrices, but I'm not sure if that is a primary concern right now, and I guess it could be related to the BLAS fallback warnings.
For context, the function I'm using with Zygote is
function apply(x, p)
    return tr(mapreduce(xs -> p[xs+1], *, reverse(x)))
end
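As a cheap sanity check independent of both AD backends, one can compare against a central finite difference on a single entry of p (a sketch using the definitions from this issue; ε and the probed index [1, 1, 1] are arbitrary choices):

```julia
ε = 1e-6
p⁺ = copy(p); p⁺[1, 1, 1] += ε
p⁻ = copy(p); p⁻[1, 1, 1] -= ε
fd = (contraction1(x, p⁺, cache1, cache2) -
      contraction1(x, p⁻, cache1, cache2)) / (2ε)
# fd should approximately match entry [1, 1, 1] of the computed gradient
```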
Finally, I've noticed that varying χ within a session seems to break the autodiff, with it sometimes returning NaNs or completely random, extremely large numbers.
I'll give this a go on 1.8 and main later.
Can confirm on Julia 1.8 and latest main with the new JLL that none of the versions error when trace isn't a union. They also appear to produce correct results (assuming Zygote is correct) at larger χ. However, I'm still seeing somewhat inconsistent behaviour, with the derivatives sometimes being full of NaNs or numbers on the order of 10^300. Every version of the function produces the warnings
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-w64-windows-gnu' whereas 'text' is 'x86_64-w64-mingw32'
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler C:\Users\domin\.julia\packages\GPUCompiler\jVY4I\src\utils.jl:35
warning: didn't implement memmove, using memcpy as fallback which can result in errors
This last one in particular: could it be causing the random incorrect results?
Can you post the specific version of the code that creates the incorrect results?
The code I'm running is:
using Enzyme
using LinearAlgebra

function contraction1(x, p, cache1, cache2)
    cache1 .= @view(p[x[1]+1, :, :])
    for i = 2:length(x)
        mul!(cache2, @view(p[x[i]+1, :, :]), cache1)
        cache1 .= cache2
    end
    trace = 0.0
    for i in 1:size(cache1, 1)
        trace += cache1[i, i]
    end
    return trace
end

function contraction2(x, p, cache1, cache2)
    cache1 .= @view(p[x[1]+1, :, :])
    for i = 2:length(x)
        mul!(cache2, @view(p[x[i]+1, :, :]), cache1)
        cache1 .= cache2
    end
    return tr(cache1)
end

function contraction3(x, p, cache1, cache2)
    cache1 .= p[x[1]+1]
    for i = 2:length(x)
        mul!(cache2, p[x[i]+1], cache1)
        cache1 .= cache2
    end
    trace = 0.0
    for i in 1:size(cache1, 1)
        trace += cache1[i, i]
    end
    return trace
end

function contraction4(x, p, cache1, cache2)
    cache1 .= p[x[1]+1]
    for i = 2:length(x)
        mul!(cache2, p[x[i]+1], cache1)
        cache1 .= cache2
    end
    return tr(cache1)
end

function contraction5(x, p1, p2, cache1, cache2)
    if x[1] == 0
        cache1 .= p1
    else
        cache1 .= p2
    end
    for i = 2:length(x)
        if x[i] == 0
            mul!(cache2, p1, cache1)
        else
            mul!(cache2, p2, cache1)
        end
        cache1 .= cache2
    end
    return tr(cache1)
end
L = 4
χ = 16
d = 2
x = rand(0:(d-1), L)
p = randn(d, χ, χ) .* 0.25
p2 = [p[i, :, :] for i = 1:d]
p31 = p2[1]
p32 = p2[2]
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
begin
    dfdp1 = zero(p)
    g = autodiff(
        Reverse, contraction1, Const(x),
        Duplicated(p, dfdp1),
        Duplicated(cache1, similar(cache1)),
        Duplicated(cache2, similar(cache2))
    )
    display(dfdp1)
end

begin
    dfdp1 = zero(p)
    g = autodiff(
        Reverse, contraction2, Const(x),
        Duplicated(p, dfdp1),
        Duplicated(cache1, similar(cache1)),
        Duplicated(cache2, similar(cache2))
    )
    display(dfdp1)
end

begin
    dfdp2 = zero.(p2)
    g = autodiff(
        Reverse, contraction3, Const(x),
        Duplicated(p2, dfdp2),
        Duplicated(cache1, similar(cache1)),
        Duplicated(cache2, similar(cache2))
    )
    display(dfdp2)
end

begin
    dfdp2 = zero.(p2)
    g = autodiff(
        Reverse, contraction4, Const(x),
        Duplicated(p2, dfdp2),
        Duplicated(cache1, similar(cache1)),
        Duplicated(cache2, similar(cache2))
    )
    display(dfdp2)
end

begin
    dfdp31 = zero(p31)
    dfdp32 = zero(p32)
    g = autodiff(
        Reverse, contraction5, Const(x),
        Duplicated(p31, dfdp31),
        Duplicated(p32, dfdp32),
        Duplicated(cache1, similar(cache1)),
        Duplicated(cache2, similar(cache2))
    )
    display(dfdp31)
    display(dfdp32)
end
As I rerun these begin-end blocks in the REPL repeatedly (in the VS Code integrated terminal), while I begin by seeing correct results, I regularly see arrays full of NaNs, or numbers up at the floating-point limit. After running the blocks enough times the results even seemed to become random but finite. Adding Active for the return or duplicating x doesn't seem to help.
Version info for the PC I just ran this on:
Julia Version 1.8.0
Commit 5544a0fab7 (2022-08-17 13:38 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, rocketlake)
Threads: 8 on 16 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 8
JULIA_PKG_SERVER = .
So it just occurred to me that this could originate from using similar instead of zero for the cache duplication, since similar returns uninitialized memory, which can contain NaNs that would presumably propagate. Indeed, replacing all the similars with zero seems to make it deterministic and correct!
Yes, for reverse mode the derivative is +='d into the shadow, so you need to zero-initialize it.
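A minimal illustration of why the zero-initialization matters (a sketch, not code from this issue; the NaN in dv stands in for whatever garbage an uninitialized `similar` buffer might contain):

```julia
using Enzyme

f(v) = v[1]^2

v  = [3.0]
dv = [NaN]   # simulating uninitialized shadow memory from `similar`
autodiff(Reverse, f, Active, Duplicated(v, dv))
# the adjoint 2 * v[1] is +='d into dv, so NaN + 6.0 stays NaN;
# with dv = zero(v), dv[1] would end up as 6.0
```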
Closing for now since it seems like this is resolved, reopen if not.
Since this is still open, I would comment that somewhere around 0.10.6 the versions with views broke again; however, they all appear to be working on 0.10.11.
The results are quite slow with larger matrices compared with e.g. Zygote (about a factor of 8 for 16x16 matrices, and a factor of 100 for 64x64 matrices), but I'm guessing that will be fixed when the BLAS support is improved, so I think it's safe to close this.
Can you post the benchmark code?
Sure, for Enzyme I'm doing:
using BenchmarkTools
using Enzyme
using LinearAlgebra
function contraction1(x, p, cache1, cache2)
    cache1 .= @view(p[:, :, x[1]+1])
    for i = 2:length(x)
        mul!(cache2, @view(p[:, :, x[i]+1]), cache1)
        cache1 .= cache2
    end
    trace = 0.0
    for i in 1:size(cache1, 1)
        trace += cache1[i, i]
    end
    return trace
end

function contraction2(x, p, cache1, cache2)
    cache1 .= @view(p[:, :, x[1]+1])
    for i = 2:length(x)
        mul!(cache2, @view(p[:, :, x[i]+1]), cache1)
        cache1 .= cache2
    end
    return tr(cache1)
end

function contraction3(x, p, cache1, cache2)
    cache1 .= p[x[1]+1]
    for i = 2:length(x)
        mul!(cache2, p[x[i]+1], cache1)
        cache1 .= cache2
    end
    trace = 0.0
    for i in 1:size(cache1, 1)
        trace += cache1[i, i]
    end
    return trace
end

function contraction4(x, p, cache1, cache2)
    cache1 .= p[x[1]+1]
    for i = 2:length(x)
        mul!(cache2, p[x[i]+1], cache1)
        cache1 .= cache2
    end
    return tr(cache1)
end

function contraction5(x, p1, p2, cache1, cache2)
    if x[1] == 0
        cache1 .= p1
    else
        cache1 .= p2
    end
    for i = 2:length(x)
        if x[i] == 0
            mul!(cache2, p1, cache1)
        else
            mul!(cache2, p2, cache1)
        end
        cache1 .= cache2
    end
    return tr(cache1)
end
L = 4
χ = 16
d = 2
x = rand(0:(d-1), L)
p = randn(χ, χ, d) .* 0.25
p2 = [p[:, :, i] for i = 1:d]
p31 = p2[1]
p32 = p2[2]
cache1 = zeros(χ, χ)
cache2 = zeros(χ, χ)
begin
    dfdp1 = zero(p)
    @btime autodiff(
        Reverse, contraction1, Const(x),
        Duplicated(p, dfdp1),
        Duplicated(cache1, zero(cache1)),
        Duplicated(cache2, zero(cache2))
    )
    # χ = 16: 128.400 μs (74 allocations: 13.72 KiB)
    # χ = 64: 8.471 ms (76 allocations: 73.56 KiB)
end

begin
    dfdp1 = zero(p)
    @btime autodiff(
        Reverse, contraction2, Const(x),
        Duplicated(p, dfdp1),
        Duplicated(cache1, zero(cache1)),
        Duplicated(cache2, zero(cache2))
    )
    # χ = 16: 128.000 μs (74 allocations: 13.72 KiB)
    # χ = 64: 8.478 ms (76 allocations: 73.56 KiB)
end

begin
    dfdp2 = zero.(p2)
    @btime autodiff(
        Reverse, contraction3, Const(x),
        Duplicated(p2, dfdp2),
        Duplicated(cache1, zero(cache1)),
        Duplicated(cache2, zero(cache2))
    )
    # χ = 16: 123.100 μs (75 allocations: 8.09 KiB)
    # χ = 64: 8.295 ms (77 allocations: 67.94 KiB)
end

begin
    dfdp2 = zero.(p2)
    @btime autodiff(
        Reverse, contraction4, Const(x),
        Duplicated(p2, dfdp2),
        Duplicated(cache1, zero(cache1)),
        Duplicated(cache2, zero(cache2))
    )
    # χ = 16: 123.000 μs (75 allocations: 8.09 KiB)
    # χ = 64: 8.288 ms (77 allocations: 67.94 KiB)
end

begin
    dfdp31 = zero(p31)
    dfdp32 = zero(p32)
    @btime autodiff(
        Reverse, contraction5, Const(x),
        Duplicated(p31, dfdp31),
        Duplicated(p32, dfdp32),
        Duplicated(cache1, zero(cache1)),
        Duplicated(cache2, zero(cache2))
    )
    # χ = 16: 123.400 μs (85 allocations: 8.52 KiB)
    # χ = 64: 8.289 ms (87 allocations: 68.36 KiB)
end
Wrapping the autodiff calls and cache duplication in a function definition reduces allocations but doesn't seem to change the timings.
While for Zygote I'm doing
function apply(x, p)
    return tr(mapreduce(xs -> p[xs+1], *, reverse(x)))
end

function apply2(x, p)
    tmp = p[x[1]+1]
    for i = 2:length(x)
        tmp = p[x[i]+1] * tmp
    end
    return tr(tmp)
end
using Zygote
@btime Zygote.gradient(apply, $x, $p2)
# χ = 16: 27.800 μs (393 allocations: 38.53 KiB)
# χ = 64: 96.500 μs (383 allocations: 365.78 KiB)
@btime Zygote.gradient(apply2, $x, $p2)
# χ = 16: 15.900 μs (245 allocations: 33.75 KiB)
# χ = 64: 82.100 μs (234 allocations: 360.16 KiB)
with the same p2.
My target use case is actually something more complex that incurs a lot of overhead in Zygote (and seems quite difficult to write without mutation, or the mutation-like syntax in JAX), and is therefore quite slow compared to an equivalent JAX implementation.
(Specifically, I'm aiming for a Julia implementation of the policy function here https://github.com/RL-with-TNs/acten_code/blob/aa00f626fe37397b724108caaa95c08e2f13ce5b/ACTeN/src/approximations/policy_mps_east.py#L7, which then needs to be differentiated with respect to the third argument.)
As an aside @vchuravy, this is why I left this open: I don't think zero-initializing the temporary should be required.