ChainRulesCore.jl
`add!!(:: InplaceableThunk, :: InplaceableThunk)` is inefficient
Adding two `InplaceableThunk`s via `add!!` should need only one copy, but it allocates about five times that:
```julia
julia> th = rrule(sum, rand(100,100))[2](1)[2] # BTW this prints far far too much
InplaceableThunk(Thunk(ChainRules.var"#1429#1432"{Int64, Colon, Matrix{Float64}, ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ProjectTo{Float64, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}}(1, Colon(), [0.9748264488399347 0.6907201530198338 0.009082324160529454 0.6610457838209846 0.20972606525114523 0.8027735544678234 0.7982157008910139 0.44743377610088986 ...
julia> @btime add!!($th, $th);
min 8.541 μs, mean 39.686 μs (12 allocations, 390.89 KiB)
julia> @btime copy($(rand(100, 100)));
min 1.262 μs, mean 4.484 μs (2 allocations, 78.17 KiB)
julia> 390 / 78.17
4.989126263272355
julia> @btime $th + $th;
min 8.458 μs, mean 39.683 μs (12 allocations, 390.89 KiB)
julia> @less add!!(th, th)
julia> @less th + th # doesn't call add!!
```
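To make the "one copy" expectation concrete, here is a minimal sketch of the idea using a hypothetical stand-in type. `ToyInplaceableThunk`, `toy_thunk` and `add_one_copy` are invented for illustration and are not ChainRulesCore's types or methods. The first thunk is materialised once, and the second then accumulates into that buffer in place. It assumes the materialised array is freshly allocated and therefore safe to mutate, which a real thunk does not necessarily guarantee.

```julia
# Hypothetical stand-in for an in-place-capable thunk (NOT ChainRulesCore's type).
struct ToyInplaceableThunk{F,G}
    materialize::F   # () -> freshly allocated array holding the value
    add!::G          # acc -> acc, accumulates the value into `acc` in place
end

# Build a toy thunk whose value is an array filled with `scale`.
function toy_thunk(scale::Float64, dims::Int...)
    materialize = () -> fill(scale, dims...)
    add! = acc -> (acc .+= scale; acc)
    return ToyInplaceableThunk(materialize, add!)
end

# Sum two thunks with a single array allocation: materialise the first,
# then let the second add itself into that buffer.
function add_one_copy(a::ToyInplaceableThunk, b::ToyInplaceableThunk)
    acc = a.materialize()   # the only array allocation (assumed safe to mutate)
    return b.add!(acc)      # in-place accumulation, no second array
end

a = toy_thunk(1.0, 100, 100)
b = toy_thunk(2.0, 100, 100)
total = add_one_copy(a, b)
```

In the real library, the analogous steps would presumably be `unthunk` on the first argument followed by the second thunk's `add!` field applied to the result, but whether that is safe depends on the aliasing assumption above.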
Edit: It's even worse. `unthunk` alone makes 2 copies for the thunk from the `rrule` for `sum`, but only 1 for a simple thunk:
```julia
julia> @btime unthunk($th);
min 2.995 μs, mean 8.829 μs (5 allocations, 156.36 KiB)
julia> const cmat = rand(100,100);
julia> th2 = @thunk pi .* cmat;
julia> @btime unthunk($th2);
min 1.725 μs, mean 3.311 μs (2 allocations, 78.17 KiB)
```