AutoPreallocation.jl
Why is this allocating at all?
There is no real reason this code should allocate AFAICT.
I think something is going wrong with Cassette.
There is no real reason this code should allocate AFAICT.
What code? Did you mean to post an example?
Any code using avoid_allocations.
E.g. the example from the README.
julia> using AutoPreallocation, BenchmarkTools
julia> foo() = ones(1, 2096) * ones(2096, 1024) * ones(1024,1)
foo (generic function with 1 method)
julia> const foo_res, foo_record = record_allocations(foo);
julia> @btime avoid_allocations($foo_record, foo)
1.376 ms (29 allocations: 672 bytes)
1×1 Array{Float64,2}:
2.146304e6
So some of the allocations come from creating the Context (thanks @vchuravy). But that accounts for only a small minority of them, AFAICT.
julia> @btime AutoPreallocation.new_replay_ctx($foo_record)
14.621 ns (3 allocations: 64 bytes)
This doesn't happen on Julia 1.2, only on 1.3+. I believe this is https://github.com/jrevels/Cassette.jl/issues/153
In 1.3:
julia> @btime avoid_allocations($record, f_matmul)
2.012 μs (15 allocations: 352 bytes)
In 1.2:
julia> @btime avoid_allocations($record, f_matmul);
1.317 μs (3 allocations: 64 bytes)
The 64 bytes are just the cost of creating the context.
This was not fixed by https://github.com/jrevels/Cassette.jl/pull/166, but I am going to guess it is something similar. Will need to dig deeper.
I think this is related to all the splatting that happens in the definition of overdub. I tried tweaking https://github.com/jrevels/Cassette.jl/blob/master/src/overdub.jl#L524 to
using SpecializeVarargs
@specialize_vararg 5 recurse(ctx::Context, ::typeof(Core._apply), f, args...) = Core._apply(recurse, (ctx, f), args...)
and I find that the allocations in
using AutoPreallocation, BenchmarkTools
foo() = ones(1, 2096) * ones(2096, 1024) * ones(1024,1)
let
foo_res, foo_record = record_allocations(foo)
@btime avoid_allocations($foo_record, $foo)
end
go from 7 allocations: 192 bytes to 6 allocations: 176 bytes, which suggests I might be on the right track. I tried for a while to manually add more methods to overdub so that it relies less heavily on varargs, but I couldn't figure out how to do it correctly.
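For context on the varargs suspicion: the Julia manual's performance tips note that the compiler may skip specializing on Vararg arguments that are only passed through to another call, and that a Vararg{Any,N} signature with where {N} forces specialization. A toy sketch of the two forms follows. The names inner, passthrough and passthrough_forced are hypothetical and are not part of Cassette or AutoPreallocation. Whether this helps with Cassette's overdub is untested here.

```julia
# Hypothetical helper that consumes the splatted arguments.
inner(args...) = length(args)

# Pass-through varargs: Julia's heuristics may avoid specializing this method
# on the argument types, because args is only forwarded to another call.
passthrough(args...) = inner(args...)

# Adding an explicit arity parameter N is the documented way to force
# specialization of a vararg method.
passthrough_forced(args::Vararg{Any,N}) where {N} = inner(args...)

# Both forms behave identically; only the compiler's specialization differs.
passthrough(1, 2.0, "three")
passthrough_forced(1, 2.0, "three")
```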
To clarify what I mean, I think that if overdub were defined with methods
overdub(OVERDUB_CONTEXT_NAME::Context)
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1) where {T1}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2) where {T1, T2}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2, arg3::T3) where {T1, T2, T3}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2, arg3::T3, arg4::T4) where {T1, T2, T3, T4}
instead of just
overdub(OVERDUB_CONTEXT_NAME::Context, args...)
the allocations might be avoided here. The explicit type parameters ::T1, ..., ::T4 are important here for forcing the compiler to specialize on each argument type.
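A minimal sketch of how such a family of fixed-arity methods could be generated with @eval rather than written out by hand. ToyContext and toy_overdub are hypothetical stand-ins. Cassette's real overdub is a generated function, so this only illustrates the method-generation pattern and is not a patch for Cassette.

```julia
# Hypothetical stand-in for a Cassette context type.
struct ToyContext end

# Vararg fallback, analogous to overdub(ctx, args...).
toy_overdub(ctx::ToyContext, args...) = (:vararg, length(args))

# Zero-argument method, written by hand.
toy_overdub(ctx::ToyContext) = (:fixed, 0)

# Generate methods for 1 to 4 arguments, each argument with its own
# type parameter (arg1::T1, arg2::T2, ...).
for n in 1:4
    Ts = [Symbol(:T, i) for i in 1:n]      # T1, T2, ...
    xs = [Symbol(:arg, i) for i in 1:n]    # arg1, arg2, ...
    typed_args = [:($(xs[i])::$(Ts[i])) for i in 1:n]
    @eval toy_overdub(ctx::ToyContext, $(typed_args...)) where {$(Ts...)} = (:fixed, $n)
end

# Calls with up to 4 arguments dispatch to the fixed-arity methods;
# calls with more arguments fall back to the vararg method.
toy_overdub(ToyContext(), 1, "a")
toy_overdub(ToyContext(), 1, 2, 3, 4, 5)
```

The fixed-arity methods are more specific than the vararg fallback, so dispatch prefers them without any ambiguity. This is roughly what @specialize_vararg automates for a single definition.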