OffsetArrays.jl

Significantly slower broadcasting

Open · astrozot opened this issue 6 months ago · 3 comments

I have been benchmarking broadcast operations on OffsetArrays of StaticArrays, and there appears to be a significant time penalty. With a plain Vector, on my laptop I see:

julia> using StaticArrays, OffsetArrays, BenchmarkTools

julia> xs = [SVector{2}(rand(2)) for _ ∈ 1:10_000];

julia> ys = similar(xs);

julia> @benchmark (@. $ys = $xs / (first($xs)^2 + last($xs)^2))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   9.419 μs … 43.605 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.142 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.559 μs ±  2.524 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █    ▁ ▅                                                     
  █▅▆▄▆█▃█▂▃▂▂▃▂▂▃▃▂▂▃▂▁▂▂▂▂▂▂▂▂▃▂▁▂▂▁▁▁▁▁▃▄▅▂▂▂▄▂▃▁▃▂█▆▂▁▁▅▆ ▃
  9.42 μs         Histogram: frequency by time        15.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

When the same arrays are wrapped in OffsetArrays instead, these are the results:

julia> oxs = OffsetArray(xs, -1000);

julia> oys = OffsetArray(ys, -1000);

julia> @benchmark (@. $oys = $oxs / (first($oxs)^2 + last($oxs)^2))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  48.245 μs … 159.888 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     50.522 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   53.394 μs ±  10.011 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇███▄▁            ▃  ▃ ▂    ▁  ▄   ▃                         ▂
  ██████▇▇▁▇▆▆▇▅▇▃█▆█▇██▆█▇▆█▅█▆▅█▆▆▅█▆▆▆▆▆▆█▆▆▆▅▆▆▅▆▆▅▆▆▆▆▆▆▆ █
  48.2 μs       Histogram: log(frequency) by time      97.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
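For readers less familiar with the package, here is a minimal sketch (reusing the names xs and oxs from the session above) of what OffsetArray(xs, -1000) builds. It is a lazy wrapper that shifts every index by -1000 and shares memory with xs, so both benchmarks operate on exactly the same data and differ only in how the indices are labelled.

```julia
using StaticArrays, OffsetArrays

xs  = [SVector{2}(rand(2)) for _ ∈ 1:10_000]
oxs = OffsetArray(xs, -1000)     # every index shifted by -1000

# Only the index labels change: 1:10_000 becomes -999:9000.
@assert firstindex(oxs) == -999
@assert lastindex(oxs) == 9000

# No copy is made: the wrapper stores `xs` itself as its parent.
@assert parent(oxs) === xs
@assert oxs[-999] === xs[1]
```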

The penalty of more than 5× (median 50.5 μs vs. 10.1 μs) is even larger than the SIMD vector width of my laptop (4 lanes for Float64), so it does not seem to be explained by lost vectorization alone.
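Two comparisons may help narrow this down. They are sketched below under the assumption, not established in this thread, that the extra cost comes from how broadcasting handles the offset axes rather than from the data itself. Because oxs and oys carry identical offsets, broadcasting over their parent arrays computes the same result on plain Vectors. An explicit @inbounds loop over eachindex works on the offset arrays directly and avoids broadcasting entirely. Both helper names, normalize_parent! and normalize_loop!, are hypothetical.

```julia
using StaticArrays, OffsetArrays

# Hypothetical helper: run the broadcast on the underlying Vectors.
# Valid only because `out` and `x` have identical offsets, so position i
# in parent(out) corresponds to position i in parent(x).
function normalize_parent!(out::OffsetVector, x::OffsetVector)
    axes(out) == axes(x) || throw(DimensionMismatch("offsets differ"))
    po, px = parent(out), parent(x)
    @. po = px / (first(px)^2 + last(px)^2)
    return out
end

# Hypothetical helper: explicit loop over the offset indices, no broadcasting.
function normalize_loop!(out::OffsetVector, x::OffsetVector)
    # eachindex(out, x) throws DimensionMismatch if the axes differ
    @inbounds for i in eachindex(out, x)
        v = x[i]
        out[i] = v / (first(v)^2 + last(v)^2)
    end
    return out
end

xs  = [SVector{2}(rand(2)) for _ ∈ 1:10_000]
oxs = OffsetArray(xs, -1000)
oys = OffsetArray(similar(xs), -1000)

normalize_parent!(oys, oxs)
```

Timing each helper with `@benchmark normalize_parent!($oys, $oxs)` and `@benchmark normalize_loop!($oys, $oxs)`, in the same way as the benchmarks above, would show whether the gap is specific to broadcasting over offset axes.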

Any help or clarification is really appreciated!

astrozot, Aug 04 '24 16:08