ChainRulesCore.jl

`add!!(:: InplaceableThunk, :: InplaceableThunk)` is inefficient

Open · mcabbott opened this issue 3 years ago · 0 comments

Adding two InplaceableThunks via `add!!` should need only one copy of the array (unthunk one, then accumulate the other into it in place), but it allocates about five:

julia> th = rrule(sum, rand(100,100))[2](1)[2]  # BTW this prints far far too much
InplaceableThunk(Thunk(ChainRules.var"#1429#1432"{Int64, Colon, Matrix{Float64}, ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ProjectTo{Float64, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}}(1, Colon(), [0.9748264488399347 0.6907201530198338 0.009082324160529454 0.6610457838209846 0.20972606525114523 0.8027735544678234 0.7982157008910139 0.44743377610088986 ...

julia> @btime add!!($th, $th);
  min 8.541 μs, mean 39.686 μs (12 allocations, 390.89 KiB)

julia> @btime copy($(rand(100, 100)));
  min 1.262 μs, mean 4.484 μs (2 allocations, 78.17 KiB)

julia> 390 / 78.17
4.989126263272355

julia> @btime $th + $th;
  min 8.458 μs, mean 39.683 μs (12 allocations, 390.89 KiB)

julia> @less add!!(th, th)

julia> @less th + th  # doesn't call add!!
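In principle, `add!!` of two InplaceableThunks can get by with a single array: materialize one thunk, then let the other accumulate into that buffer in place. Below is a minimal self-contained sketch of that pattern. `SketchThunk`, `materialize`, `add_one_copy` and `sum_thunk` are hypothetical names. The type only loosely mirrors InplaceableThunk's pairing of a lazy value with an in-place `add!`, and this is not ChainRulesCore's implementation.

```julia
# Hypothetical stand-in for an InplaceableThunk:
#   val  :: () -> fresh array       (allocates one copy)
#   add! :: x -> x, mutated in place (allocates nothing)
struct SketchThunk{V,A}
    val::V
    add!::A
end

materialize(t::SketchThunk) = t.val()

# Sum two thunks with a single copy: materialize the first,
# then let the second accumulate into that same buffer.
function add_one_copy(a::SketchThunk, b::SketchThunk)
    x = materialize(a)
    return b.add!(x)
end

# Hypothetical gradient of sum(A) for seed dy: every entry equals dy.
function sum_thunk(dy, A)
    return SketchThunk(() -> fill(float(dy), size(A)),
                       x -> (x .+= dy; x))
end
```

With `th = sum_thunk(1, rand(100, 100))`, `add_one_copy(th, th)` returns a matrix of `2.0`s and allocates one 100×100 buffer, whereas `materialize(th) + materialize(th)` would allocate three (two materialized arrays plus the result of `+`).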

Edit: It's even worse. `unthunk` alone makes 2 copies for the thunk from the `rrule` for `sum`, but only one for a simple thunk:

julia> @btime unthunk($th);
  min 2.995 μs, mean 8.829 μs (5 allocations, 156.36 KiB)

julia> const cmat = rand(100,100);

julia> th2 = @thunk pi .* cmat;

julia> @btime unthunk($th2);
  min 1.725 μs, mean 3.311 μs (2 allocations, 78.17 KiB)
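The copy counts in this issue come from dividing the bytes allocated by the size of one 100×100 `Float64` matrix (10,000 × 8 bytes = 80,000 bytes, about 78 KiB). A minimal sketch of the same measurement uses Base's `@allocated` instead of BenchmarkTools. `copies_allocated`, `one_copy` and `two_copies` are hypothetical names for illustration, and the ratio also picks up a few bytes of array header, so it is only approximately an integer.

```julia
# Hypothetical helper: estimate how many matrix-sized copies a call makes
# by dividing the bytes it allocates by the data size of one copy of `A`.
function copies_allocated(f, A)
    f()                       # warm up so compilation is not counted
    bytes = @allocated f()
    return bytes / sizeof(A)  # sizeof(A) == 8 * length(A) for Float64 data
end

const M = rand(100, 100)

one_copy() = pi .* M          # a single broadcast: one new array
two_copies() = copy(pi .* M)  # broadcast, then an extra copy

r1 = copies_allocated(one_copy, M)    # close to 1
r2 = copies_allocated(two_copies, M)  # close to 2
```

Passing closures such as `() -> unthunk(th)` and `() -> unthunk(th2)` should reproduce the 2-versus-1 copy difference that the timings above report.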

mcabbott, Jan 11 '22 18:01