StaticArrays.jl
StaticArrays.jl copied to clipboard
Allocations in broadcast of broadcast
AFAIU this should not allocate
julia> using BenchmarkTools, StaticArrays
julia> test(f) = f .* f'
test (generic function with 1 method)
julia> v = SA[1, 0]; vs = SVector(v, v);
julia> @btime test.($vs);
278.802 ns (8 allocations: 416 bytes)
Replacing vs = SVector(v, v) with vs = (v,v) makes the allocations go away.
Tried downgrading to [email protected], still the same. Also down to Julia 1.7.
Making the MWE only slightly more involved also makes the allocations (weirdly) disappear simply by rerunning the definition of the function containing the broadcast, see discussion in https://discourse.julialang.org/t/broadcasting-heisen-allocations/101572/2
Ideally this shouldn't allocate but broadcasting code is very demanding on the compiler, even more so nested broadcasting. I'd suggest just using map instead of the outer broadcast.
In the snippet above I can replace test.(vs) with a map, but I'm afraid that is only true for simple types of broadcasts. What would I do if I have this pattern (as I do in actual work)?
using BenchmarkTools, StaticArrays
test(i, j, vs) = vs[i] .* vs[j]'
i = SA[1, 2, 3]; v = SA[2, 3, 4]; vs =(v, 2v, 3v);
julia> @btime test.($i, $(i'), Ref($vs));
608.282 ns (8 allocations: 1.58 KiB)
For the more general broadcast case above, I currently resort to the following workaround (i.e. transform the outer broadcast into a tuple broadcast and back to SMatrix), but it is quite ugly and hard to understand
using BenchmarkTools, StaticArrays
test((i, j), vs) = vs[i] .* vs[j]'
test2(i, vs) = SMatrix{3,3}(test.(Tuple(tuple.(i, i')), Ref(vs)))
i = SA[1, 2, 3]; v = SA[2, 3, 4]; vs = (v, 2v, 3v);
julia> @btime test2($i, $vs);
29.271 ns (0 allocations: 0 bytes)
The key here is that broadcast over tuples does not suffer from this problem. It only happens for broadcast over SArrays. Not sure if that's helpful though.
I see. It would be great if someone could improve nested broadcasting but unfortunately that problem is too hard for me. Maybe @N5N3 could take a look?
I didn't check but this looks like a recursion limitation.
Since it's signature-based, thus Tuple of SArray is ok as they have different BroadcastStyles.
It's possible to turn off the recursion check, but I don't think it's a good idea to try it here.
There's an upstream PR for this kind of problem (https://github.com/JuliaLang/julia/pull/48059)
I see. I guess there is no standard workaround for the recursion limit? Some time ago I tried defining multiple methods with the same body to trick it but that approach wasn't particularly appreciated (though I worked is some cases).
The current "standard" workaround is refactoring the function's inputs and make it shrinking during recursion (fewer arguments, simpler type, etc). Multiple methods could also alleviate the problem, but it might still be limited once we recursion back. (a->b->c->a) Anyway, these solutions only work for functions designed to be self-recursive.