UnROOT.jl
UnROOT.jl copied to clipboard
Micro optimization for offset array
often when we go from 0-index to 1-index and use ArrayOfArrays, we have some patterns like this: https://github.com/JuliaHEP/UnROOT.jl/blob/dfce0e45f295b82d8af9c2c904a6a15ceaff8b7c/src/RNTuple/fieldcolumn_reading.jl#L107-L115
I can think of 3 ways of doing this assuming we will make a copy (another option is to plug offset::Base.ReinterpretArray into VectorOfVectors but idk if that's a good idea):
julia> function f(ary, o)
return pushfirst!(ary .+ o, o)
end
julia> function g(ary, o)
return append!([o], ary .+= o)
end
julia> function h(ary, o)
res = append!([o], ary)
@views res[2:end] .+= o
return res
end
julia> @benchmark f(ary,o) setup = begin ary = reinterpret(Int32, rand(UInt8, 10^5*4)); o = Int32(1) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 37.401 μs … 672.151 μs ┊ GC (min … max): 0.00% … 88.17%
Time (median): 49.064 μs ┊ GC (median): 0.00%
Time (mean ± σ): 53.250 μs ± 43.186 μs ┊ GC (mean ± σ): 6.11% ± 6.97%
▃▅▅▆█▇▅▃▃▂▂▂▁ ▂
▃▄██████████████▇▇▆▅▅▄▃▃▃▄▃▁▁▄▁▁▁▃▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▃▅▄▃▃▄▆▆ █
37.4 μs Histogram: log(frequency) by time 141 μs <
Memory estimate: 695.55 KiB, allocs estimate: 3.
julia> @benchmark g(ary,o) setup = begin ary = reinterpret(Int32, rand(UInt8, 10^5*4)); o = Int32(1) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 55.837 μs … 692.319 μs ┊ GC (min … max): 0.00% … 75.43%
Time (median): 62.729 μs ┊ GC (median): 0.00%
Time (mean ± σ): 65.075 μs ± 28.550 μs ┊ GC (mean ± σ): 2.06% ± 4.30%
▁▃▄▅▆▇███▇▆▄▂▂▂▂▂▁▂▁▁▁ ▁▁▁▂▁ ▃
▃▁▁▃▁▃▅▇███████████████████████████████▇▇▇▆▇▆▆▅▇▆▇▆▇▆▅▅▆▅▄▅▆ █
55.8 μs Histogram: log(frequency) by time 81.9 μs <
Memory estimate: 390.89 KiB, allocs estimate: 6.
julia> @benchmark h(ary,o) setup = begin ary = reinterpret(Int32, rand(UInt8, 10^5*4)); o = Int32(1) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 27.733 μs … 140.297 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 34.085 μs ┊ GC (median): 0.00%
Time (mean ± σ): 35.711 μs ± 6.306 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▇█▅▁
▂▂▁▂▂▂▃▃▄▅██████▆▅▅▄▄▄▄▄▄▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
27.7 μs Histogram: frequency by time 54.6 μs <
Memory estimate: 390.81 KiB, allocs estimate: 2.
tbh, I'm not sure why h is better than g
edit, probably:
- https://github.com/JuliaLang/julia/issues/48801