StaticArrays.jl icon indicating copy to clipboard operation
StaticArrays.jl copied to clipboard

Allocations in broadcast of broadcast

Open pablosanjose opened this issue 2 years ago • 7 comments

AFAIU this should not allocate

julia> using BenchmarkTools, StaticArrays

julia> test(f) = f .* f'
test (generic function with 1 method)

julia> v = SA[1, 0]; vs = SVector(v, v);

julia> @btime test.($vs);
  278.802 ns (8 allocations: 416 bytes)

Replacing vs = SVector(v, v) with vs = (v,v) makes the allocations go away.

Tried downgrading to [email protected], still the same. Also down to Julia 1.7.

Making the MWE only slightly more involved also makes the allocations (weirdly) disappear simply by rerunning the definition of the function containing the broadcast, see discussion in https://discourse.julialang.org/t/broadcasting-heisen-allocations/101572/2

pablosanjose avatar Jul 13 '23 14:07 pablosanjose

Ideally this shouldn't allocate but broadcasting code is very demanding on the compiler, even more so nested broadcasting. I'd suggest just using map instead of the outer broadcast.

mateuszbaran avatar Jul 13 '23 17:07 mateuszbaran

In the snippet above I can replace test.(vs) with a map, but I'm afraid that is only true for simple types of broadcasts. What would I do if I have this pattern (as I do in actual work)?

using BenchmarkTools, StaticArrays
test(i, j, vs) = vs[i] .* vs[j]'
i = SA[1, 2, 3]; v = SA[2, 3, 4]; vs =(v, 2v, 3v);
julia> @btime test.($i, $(i'), Ref($vs));
  608.282 ns (8 allocations: 1.58 KiB)

pablosanjose avatar Jul 13 '23 17:07 pablosanjose

For the more general broadcast case above, I currently resort to the following workaround (i.e. transform the outer broadcast into a tuple broadcast and back to SMatrix), but it is quite ugly and hard to understand

using BenchmarkTools, StaticArrays
test((i, j), vs) = vs[i] .* vs[j]'
test2(i, vs) = SMatrix{3,3}(test.(Tuple(tuple.(i, i')), Ref(vs)))
i = SA[1, 2, 3]; v = SA[2, 3, 4]; vs = (v, 2v, 3v);
julia> @btime test2($i, $vs);
  29.271 ns (0 allocations: 0 bytes)

The key here is that broadcast over tuples does not suffer from this problem. It only happens for broadcast over SArrays. Not sure if that's helpful though.

pablosanjose avatar Jul 13 '23 17:07 pablosanjose

I see. It would be great if someone could improve nested broadcasting but unfortunately that problem is too hard for me. Maybe @N5N3 could take a look?

mateuszbaran avatar Jul 14 '23 09:07 mateuszbaran

I didn't check but this looks like a recursion limitation. Since it's signature-based, thus Tuple of SArray is ok as they have different BroadcastStyles. It's possible to turn off the recursion check, but I don't think it's a good idea to try it here. There's an upstream PR for this kind of problem (https://github.com/JuliaLang/julia/pull/48059)

N5N3 avatar Jul 14 '23 12:07 N5N3

I see. I guess there is no standard workaround for the recursion limit? Some time ago I tried defining multiple methods with the same body to trick it but that approach wasn't particularly appreciated (though I worked is some cases).

mateuszbaran avatar Jul 14 '23 13:07 mateuszbaran

The current "standard" workaround is refactoring the function's inputs and make it shrinking during recursion (fewer arguments, simpler type, etc). Multiple methods could also alleviate the problem, but it might still be limited once we recursion back. (a->b->c->a) Anyway, these solutions only work for functions designed to be self-recursive.

N5N3 avatar Jul 14 '23 13:07 N5N3