OffsetArrays.jl
Significantly slower broadcasting
I have been benchmarking broadcast operations on OffsetArrays of StaticArrays, and there seems to be a significant time penalty. For reference, with plain `Vector`s on my laptop I see:
```julia
julia> using StaticArrays, OffsetArrays, BenchmarkTools

julia> xs = [SVector{2}(rand(2)) for _ ∈ 1:10_000];

julia> ys = similar(xs);

julia> @benchmark (@. $ys = $xs / (first($xs)^2 + last($xs)^2))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   9.419 μs …  43.605 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.142 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.559 μs ±  2.524 μs   ┊ GC (mean ± σ):  0.00% ± 0.00%
 █ ▁ ▅
 █▅▆▄▆█▃█▂▃▂▂▃▂▂▃▃▂▂▃▂▁▂▂▂▂▂▂▂▂▃▂▁▂▂▁▁▁▁▁▃▄▅▂▂▂▄▂▃▁▃▂█▆▂▁▁▅▆ ▃
 9.42 μs         Histogram: frequency by time        15.2 μs <
 Memory estimate: 0 bytes, allocs estimate: 0.
```
Wrapping the same arrays in OffsetArrays (shifted by -1000) gives these results instead:
```julia
julia> oxs = OffsetArray(xs, -1000);

julia> oys = OffsetArray(ys, -1000);

julia> @benchmark (@. $oys = $oxs / (first($oxs)^2 + last($oxs)^2))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  48.245 μs … 159.888 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     50.522 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   53.394 μs ±  10.011 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
 ▇███▄▁ ▃ ▃ ▂ ▁ ▄ ▃ ▂
 ██████▇▇▁▇▆▆▇▅▇▃█▆█▇██▆█▇▆█▅█▆▅█▆▆▅█▆▆▆▆▆▆█▆▆▆▅▆▆▅▆▆▅▆▆▆▆▆▆▆ █
 48.2 μs      Histogram: log(frequency) by time      97.1 μs <
 Memory estimate: 0 bytes, allocs estimate: 0.
```
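The roughly 5× penalty (48.2 μs vs 9.4 μs minimum, 50.5 μs vs 10.1 μs median) is even larger than the SIMD vector width of my laptop (4 Float64 values per register), so losing vectorization alone would not seem to explain it.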
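To narrow down where the extra time goes, a minimal sketch along these lines could be used. It repeats the setup above; `scale_bc!`, `scale_loop!` and `scale_parent!` are hypothetical helper names introduced here, not part of OffsetArrays.jl. It times the same broadcast on plain Vectors and on OffsetArrays, an explicit `@inbounds` loop over the OffsetArrays, and a possible workaround that broadcasts over `parent(...)`, which is only valid under the assumption that both arrays carry the same offset.

```julia
using StaticArrays, OffsetArrays, BenchmarkTools

xs  = [SVector{2}(rand(2)) for _ ∈ 1:10_000]
ys  = similar(xs)
oxs = OffsetArray(xs, -1000)   # wraps xs without copying; indices -999:9000
oys = OffsetArray(ys, -1000)

# Same kernel as the benchmarks above: divide each SVector by the sum of the
# squares of its first and last components (its squared norm for 2-vectors).
function scale_bc!(out, v)
    @. out = v / (first(v)^2 + last(v)^2)
    return out
end

# Diagnostic: the same computation as an explicit loop. If this is fast on the
# OffsetArrays, that would suggest the overhead lies in the broadcast path
# rather than in OffsetArray indexing itself.
function scale_loop!(out, v)
    @inbounds for i in eachindex(out, v)   # errors if the axes differ
        x = v[i]
        out[i] = x / (first(x)^2 + last(x)^2)
    end
    return out
end

# Workaround sketch (assumption: both arrays share the same axes, so
# broadcasting over the parent Vectors touches exactly the same elements).
function scale_parent!(out::OffsetArray, v::OffsetArray)
    axes(out) == axes(v) || throw(DimensionMismatch("axes must match"))
    scale_bc!(parent(out), parent(v))
    return out
end

@btime scale_bc!($ys, $xs)         # plain Vectors
@btime scale_bc!($oys, $oxs)       # OffsetArrays, broadcast
@btime scale_loop!($oys, $oxs)     # OffsetArrays, explicit loop
@btime scale_parent!($oys, $oxs)   # OffsetArrays, broadcast on parents
```

The `axes` check in `scale_parent!` is what makes the parent-broadcast safe; it sidesteps, rather than fixes, whatever overhead the OffsetArray broadcast path has.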
Any help or clarification is really appreciated!