Diffractor.jl

Accumulation

Open mcabbott opened this issue 3 years ago • 0 comments

At present Diffractor may return thunks, but doesn't seem to use them (or anything else) to accumulate efficiently:

julia> @btime gradient(x -> sum(x), $(rand(100, 100))) |> first |> typeof
  min 1.829 μs, mean 1.869 μs (8 allocations, 368 bytes)
ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk...

julia> @btime gradient(x -> sum(x) + sum(x), $(rand(100, 100))) |> first |> typeof
  min 8.917 μs, mean 28.032 μs (22 allocations, 235.03 KiB)
Matrix{Float64} (alias for Array{Float64, 2})

julia> @btime copy($(rand(100, 100)));
  min 1.262 μs, mean 4.120 μs (2 allocations, 78.17 KiB)

julia> 235.03 / 78.17
3.00665216835103
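
As a sanity check on that ratio, the raw data of a 100×100 Float64 matrix is 80000 bytes, so the 235.03 KiB reported for the two-term gradient is roughly three matrices' worth. A minimal sketch using only Base (the 78.17 KiB measured for copy is slightly larger than the raw data, presumably due to allocation overhead):

```julia
# Size of the raw data of one 100×100 Float64 matrix, in KiB.
one_matrix_kib = sizeof(zeros(100, 100)) / 1024   # 80000 bytes / 1024

# The benchmark above reported 235.03 KiB for x -> sum(x) + sum(x).
copies = 235.03 / one_matrix_kib
println(round(copies; digits = 2))
```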

Should this change, perhaps to use ChainRulesCore.add!!? If so, it might be easiest to change now, while there is nothing downstream to break.
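
To illustrate what accumulating with add!! could look like, here is a minimal sketch. It is not Diffractor's implementation: `make_grad` is a hypothetical stand-in for the thunk a rule for `sum` might return, and the accumulation loop is written by hand. It assumes the ChainRulesCore 1.x constructor `InplaceableThunk(add!, val)`.

```julia
using ChainRulesCore

# Hypothetical stand-in for the gradient of sum(x) w.r.t. a 100×100 input:
# it can either materialise a matrix of ones, or add 1 to an existing buffer.
make_grad() = InplaceableThunk(
    dx -> (dx .+= 1; dx),        # in-place accumulation, returns the buffer
    @thunk(ones(100, 100)),      # out-of-place fallback value
)

# Accumulate two contributions, as for x -> sum(x) + sum(x).
buf = zeros(100, 100)
acc = ChainRulesCore.add!!(buf, make_grad())
acc = ChainRulesCore.add!!(acc, make_grad())

println(acc === buf)     # whether the original buffer was reused
println(all(acc .== 2))  # whether both contributions were added
```

In this sketch the only matrix allocated is the initial buffer; neither thunk is ever materialised.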

As an aside, note that add!! is slower than expected here, allocating 3 copies rather than 1:

julia> g1 = gradient(x -> sum(x), (rand(100, 100)))[1]; g2 = deepcopy(g1);

julia> @btime ChainRulesCore.add!!($(g1), $(g2));
  min 4.719 μs, mean 21.300 μs (8 allocations, 234.55 KiB)

julia> 234.55 / 78.17
3.0005117052577717

julia> @btime ChainRulesCore.add!!($(randn(100, 100)), $(g2));
  min 3.318 μs, mean 3.438 μs (0 allocations)
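
One possible reading of the two timings above, offered as an assumption rather than something measured here: add!! takes the in-place path only when its first argument is a mutable array, while a thunk on the left falls back to ordinary +, which has to materialise its operands. The sketch below probes this with `ChainRulesCore.is_inplaceable_destination`; the thunk `t` is a hypothetical stand-in for the one Diffractor returns.

```julia
using ChainRulesCore

# Hypothetical stand-in for the thunk returned for the gradient of sum(x).
t = InplaceableThunk(dx -> (dx .+= 1; dx), @thunk(ones(100, 100)))

# add!! consults this predicate on its first argument before mutating it.
println(ChainRulesCore.is_inplaceable_destination(t))
println(ChainRulesCore.is_inplaceable_destination(randn(100, 100)))

# With an array on the left, the thunk's add! closure runs on the existing buffer.
buf = randn(100, 100)
out = ChainRulesCore.add!!(buf, t)
println(out === buf)
```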

Or do we count on ImmutableArray and compiler improvements?

Xref https://github.com/FluxML/Zygote.jl/pull/981 which retrofits Zygote to accumulate in-place.

Xref also https://github.com/JuliaDiff/ChainRulesCore.jl/pull/539 which alters + on two thunks to be more efficient.

mcabbott, Jan 11 '22 17:01