UnROOT.jl icon indicating copy to clipboard operation
UnROOT.jl copied to clipboard

Micro optimization for offset array

Open Moelf opened this issue 2 years ago • 1 comments

often when we go from 0-index to 1-index and use ArrayOfArrays, we have some patterns like this: https://github.com/JuliaHEP/UnROOT.jl/blob/dfce0e45f295b82d8af9c2c904a6a15ceaff8b7c/src/RNTuple/fieldcolumn_reading.jl#L107-L115

I can think of 3 ways of doing this assuming we will make a copy (another option is to plug offset::Base.ReinterpretArray into VectorOfVectors but idk if that's a good idea):

julia> function f(ary, o)
           return pushfirst!(ary .+ o, o)
       end

julia> function g(ary, o)
           return append!([o], ary .+= o)
       end

julia> function h(ary, o)
           res = append!([o], ary)
           @views res[2:end] .+= o
           return res
       end

julia> @benchmark f(ary,o) setup = begin ary = reinterpret(Int32, rand(UInt8, 10^5*4)); o = Int32(1) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  37.401 μs … 672.151 μs  ┊ GC (min … max): 0.00% … 88.17%
 Time  (median):     49.064 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   53.250 μs ±  43.186 μs  ┊ GC (mean ± σ):  6.11% ±  6.97%

    ▃▅▅▆█▇▅▃▃▂▂▂▁                                              ▂
  ▃▄██████████████▇▇▆▅▅▄▃▃▃▄▃▁▁▄▁▁▁▃▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▃▅▄▃▃▄▆▆ █
  37.4 μs       Histogram: log(frequency) by time       141 μs <

 Memory estimate: 695.55 KiB, allocs estimate: 3.

julia> @benchmark g(ary,o) setup = begin ary = reinterpret(Int32, rand(UInt8, 10^5*4)); o = Int32(1) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  55.837 μs … 692.319 μs  ┊ GC (min … max): 0.00% … 75.43%
 Time  (median):     62.729 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   65.075 μs ±  28.550 μs  ┊ GC (mean ± σ):  2.06% ±  4.30%

          ▁▃▄▅▆▇███▇▆▄▂▂▂▂▂▁▂▁▁▁ ▁▁▁▂▁                         ▃
  ▃▁▁▃▁▃▅▇███████████████████████████████▇▇▇▆▇▆▆▅▇▆▇▆▇▆▅▅▆▅▄▅▆ █
  55.8 μs       Histogram: log(frequency) by time      81.9 μs <

 Memory estimate: 390.89 KiB, allocs estimate: 6.

julia> @benchmark h(ary,o) setup = begin ary = reinterpret(Int32, rand(UInt8, 10^5*4)); o = Int32(1) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  27.733 μs … 140.297 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     34.085 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   35.711 μs ±   6.306 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▃▇█▅▁
  ▂▂▁▂▂▂▃▃▄▅██████▆▅▅▄▄▄▄▄▄▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  27.7 μs         Histogram: frequency by time         54.6 μs <

 Memory estimate: 390.81 KiB, allocs estimate: 2.

Moelf avatar Feb 26 '23 16:02 Moelf

tbh, I'm not sure why h is better than g

edit, probably:

  • https://github.com/JuliaLang/julia/issues/48801

Moelf avatar Feb 26 '23 16:02 Moelf