AutoPreallocation.jl

Why is this allocating at all?

Open oxinabox opened this issue 5 years ago • 7 comments

There is no real reason this code should allocate, AFAICT.

I think something is going wrong with Cassette.

oxinabox avatar Feb 16 '20 18:02 oxinabox

There is no real reason this code should allocate, AFAICT.

What code? Did you mean to post an example?

MasonProtter avatar Feb 16 '20 18:02 MasonProtter

Any code using avoid_allocations.

E.g. the example from the README:

julia> using AutoPreallocation, BenchmarkTools

julia> foo() = ones(1, 2096) * ones(2096, 1024) * ones(1024,1)
foo (generic function with 1 method)

julia> const foo_res, foo_record = record_allocations(foo);

julia> @btime avoid_allocations($foo_record, foo)
  1.376 ms (29 allocations: 672 bytes)
1×1 Array{Float64,2}:
 2.146304e6

oxinabox avatar Feb 16 '20 18:02 oxinabox

Some of the allocations come from creating the Context (thanks @vchuravy), but AFAICT that accounts for only a small minority of them:

julia> @btime AutoPreallocation.new_replay_ctx($foo_record)
  14.621 ns (3 allocations: 64 bytes)

oxinabox avatar Feb 16 '20 19:02 oxinabox

This doesn't happen on Julia 1.2, only on 1.3+. I believe this is https://github.com/jrevels/Cassette.jl/issues/153

On 1.3:

julia> @btime avoid_allocations($record, f_matmul)
  2.012 μs (15 allocations: 352 bytes)

On 1.2:

julia> @btime avoid_allocations($record, f_matmul);
  1.317 μs (3 allocations: 64 bytes)

Those 64 bytes are just the cost of creating the context.

oxinabox avatar Feb 17 '20 10:02 oxinabox
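
The allocation figures in this thread come from BenchmarkTools' @btime. As a rough cross-check that needs only Base, the sketch below uses @allocated after a warm-up call to compare a function that allocates a fresh array with one that writes into a caller-supplied buffer. The names fill_new, fill_into! and bytes_allocated are hypothetical helpers made up for this illustration; they are not part of AutoPreallocation or Cassette.

```julia
# Hypothetical helpers for illustration only (not AutoPreallocation API).

# Allocates a fresh array on every call.
fill_new(n) = ones(n)

# Reuses a buffer supplied by the caller instead of allocating a new one.
fill_into!(buf) = (fill!(buf, 1.0); buf)

# Measure bytes allocated by `f(arg)`, excluding first-call compilation.
function bytes_allocated(f::F, arg) where {F}
    f(arg)                      # warm-up call so compilation is not counted
    return @allocated f(arg)    # bytes allocated by the second call
end

buf = Vector{Float64}(undef, 1024)

println("fresh array:    ", bytes_allocated(fill_new, 1024), " bytes")
println("reused buffer:  ", bytes_allocated(fill_into!, buf), " bytes")
```

Exact byte counts depend on the Julia version, so the sketch prints them rather than assuming specific values; the reused-buffer variant should report no more than the fresh-array variant.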

This was not fixed by https://github.com/jrevels/Cassette.jl/pull/166, but my guess is that the cause is something similar. I will need to dig deeper.

oxinabox avatar Feb 18 '20 17:02 oxinabox

I think this is related to all the splatting that happens in the definition of overdub. I tried tweaking https://github.com/jrevels/Cassette.jl/blob/master/src/overdub.jl#L524 to

using SpecializeVarargs
@specialize_vararg 5 recurse(ctx::Context, ::typeof(Core._apply), f, args...) = Core._apply(recurse, (ctx, f), args...)

and I find that the allocations in

using AutoPreallocation, BenchmarkTools

foo() = ones(1, 2096) * ones(2096, 1024) * ones(1024,1)

let
    foo_res, foo_record = record_allocations(foo)
    @btime avoid_allocations($foo_record, $foo)
end

go from 7 allocations: 192 bytes to 6 allocations: 176 bytes, which suggests I might be on the right track. I spent a while trying to manually add more methods to overdub so that it relies less heavily on varargs, but I couldn't figure out how to do it correctly.

MasonProtter avatar Feb 18 '20 18:02 MasonProtter

To clarify what I mean, I think that if overdub were defined with methods

overdub(OVERDUB_CONTEXT_NAME::Context)
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1) where {T1}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2) where {T1, T2}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2, arg3::T3) where {T1, T2, T3}
overdub(OVERDUB_CONTEXT_NAME::Context, arg1::T1, arg2::T2, arg3::T3, arg4::T4) where {T1, T2, T3, T4}

instead of just

overdub(OVERDUB_CONTEXT_NAME::Context, args...)

the allocations might be avoided here. The explicit type parameters ::T1, ..., ::T4 are important here: they force the compiler to specialize on, and therefore infer, each argument type.

MasonProtter avatar Feb 18 '20 18:02 MasonProtter
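
The idea in the last comment can be illustrated outside Cassette. Julia's documented heuristics may skip specialization on a Vararg argument that is only passed through to another call, and explicit type parameters are one way to force it. The sketch below is a minimal stand-in under that assumption: apply_vararg, apply_fixed and apply_forced are hypothetical names invented here, not Cassette's overdub, and whether the same change removes the allocations discussed in this issue is untested.

```julia
# Hypothetical stand-ins for a pass-through function such as `overdub`.
# None of these are Cassette definitions.

# Plain varargs: `args` is only passed through, so Julia's heuristics
# may choose not to specialize this method on the argument types.
apply_vararg(f, args...) = f(args...)

# Explicit-arity methods with one type parameter per argument,
# mirroring the ::T1, ..., ::T4 pattern proposed above (arity 0 to 2 shown).
apply_fixed(f::F) where {F} = f()
apply_fixed(f::F, a1::T1) where {F, T1} = f(a1)
apply_fixed(f::F, a1::T1, a2::T2) where {F, T1, T2} = f(a1, a2)

# Alternative: keep a single method but bind the vararg length to a
# type parameter, which also forces specialization.
apply_forced(f::F, args::Vararg{Any, N}) where {F, N} = f(args...)

# All three forms compute the same result; only specialization differs.
println(apply_vararg(+, 1, 2.0))
println(apply_fixed(+, 1, 2.0))
println(apply_forced(+, 1, 2.0))
```

The Vararg{Any, N} form avoids writing one method per arity, at the cost of compiling a specialization for every distinct argument count and type combination that is actually called.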