Diffractor.jl
Accumulation
At present Diffractor may return thunks, but doesn't seem to use them (or anything else) to accumulate efficiently:
julia> @btime gradient(x -> sum(x), $(rand(100, 100))) |> first |> typeof
min 1.829 μs, mean 1.869 μs (8 allocations, 368 bytes)
ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk...
julia> @btime gradient(x -> sum(x) + sum(x), $(rand(100, 100))) |> first |> typeof
min 8.917 μs, mean 28.032 μs (22 allocations, 235.03 KiB)
Matrix{Float64} (alias for Array{Float64, 2})
julia> @btime copy($(rand(100, 100)));
min 1.262 μs, mean 4.120 μs (2 allocations, 78.17 KiB)
julia> 235.03 / 78.17
3.00665216835103
Should this change, perhaps to use ChainRulesCore.add!!? If so, it might be easiest to change now, while there is nothing downstream to break.
As an aside, note that add!! is slower than expected here, allocating the equivalent of 3 copies rather than 1:
julia> g1 = gradient(x -> sum(x), (rand(100, 100)))[1]; g2 = deepcopy(g1);
julia> @btime ChainRulesCore.add!!($(g1), $(g2));
min 4.719 μs, mean 21.300 μs (8 allocations, 234.55 KiB)
julia> 234.55 / 78.17
3.0005117052577717
julia> @btime ChainRulesCore.add!!($(randn(100, 100)), $(g2));
min 3.318 μs, mean 3.438 μs (0 allocations)
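To make the proposal concrete, here is a minimal sketch of what accumulating with add!! could look like. It is not Diffractor's code: `make_grad` and `accumulate_inplace` are hypothetical names, and `make_grad` only imitates the kind of InplaceableThunk that the rule for `sum` returns. The first contribution is materialised with `unthunk`, so the accumulator is a plain mutable array, as in the zero-allocation timing above.

```julia
using ChainRulesCore

# Hypothetical stand-in for the gradient of x -> sum(x) on an n×n matrix:
# the in-place branch adds 1 to every entry, the lazy branch builds ones(n, n).
make_grad(n) = InplaceableThunk(dx -> dx .+= 1.0, @thunk(ones(n, n)))

# Hypothetical accumulator: materialise the first gradient once,
# then fold the remaining ones into it with add!!.
function accumulate_inplace(grads)
    acc = unthunk(first(grads))          # one allocation, a fresh array
    for g in Iterators.drop(grads, 1)
        # add!! may mutate acc; always use its return value
        acc = ChainRulesCore.add!!(acc, g)
    end
    return acc
end

total = accumulate_inplace([make_grad(100), make_grad(100)])
```

Since the accumulator is an ordinary `Matrix{Float64}`, add!! can take the thunk's in-place branch rather than materialising a second array and adding it. Whether Diffractor should do this at the point where gradients are summed is exactly the question raised here.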
Or do we count on ImmutableArray and compiler improvements?
Xref https://github.com/FluxML/Zygote.jl/pull/981 which retrofits Zygote to accumulate in-place.
Xref also https://github.com/JuliaDiff/ChainRulesCore.jl/pull/539 which alters + on two thunks to be more efficient.