Resurrection of https://github.com/tamasgal/UnROOT.jl/pull/101 (which means some copy-pasting from @Moelf ;) ). Basically, we add a inplace::Bool to interped_data which toggles between doing ntoh.(reinterpret(...)) (copy) or an inplace-variant, both returning the same type which means it's still type stable.

But with a small modification, basketarray()'s return type depends on the boolean. Probably this can be fixed with a barrier function to discard the Ref{UInt8[]} if called with inplace=false (the default.

Unit tests all pass.

julia> using UnROOT

julia> const t = LazyTree(ROOTFile("uncompressed_Run2012BC_DoubleMuParked_Muons.root"),"Events");

julia> function bar(t)
           for evt in t
               _ = evt.Muon_pt
           end
           nothing
       end

before

julia> @btime sum(t.nMuon) seconds=10
  204.417 ms (6078 allocations: 469.98 MiB)
0x0000000008e67ad8

julia> @btime bar(t) seconds=10
  712.140 ms (14304 allocations: 1.82 GiB)

after

julia> @btime sum(t.nMuon) seconds=10
  205.316 ms (6780 allocations: 235.24 MiB)
0x0000000008e67ad8

julia> @btime bar(t) seconds=10
  725.626 ms (16241 allocations: 1.26 GiB)

Performance is pretty much the same but with less total allocated memory. If I profile bar(t) I see there's a materialization. I thought this should be in place??

Sep 18 '21 05:09 aminnj

Codecov Report

Merging #112 (c55c83a) into master (db89380) will increase coverage by 0.03%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #112      +/-   ##
==========================================
+ Coverage   90.40%   90.44%   +0.03%     
==========================================
  Files          11       11              
  Lines        1355     1360       +5     
==========================================
+ Hits         1225     1230       +5     
  Misses        130      130

Impacted Files	Coverage Δ
src/custom.jl	`98.70% <100.00%> (ø)`
src/iteration.jl	`88.97% <100.00%> (ø)`
src/root.jl	`92.97% <100.00%> (+0.14%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update db89380...c55c83a. Read the comment docs.

Sep 18 '21 05:09 codecov[bot]

I know it probably doesn't get much better but this is very messy :(

Sep 18 '21 13:09 Moelf

actually:

#before
(14304 allocations: 1.82 GiB)
#after
(16241 allocations: 1.26 GiB)

I think this means we shouldn't do this, the number of allocation increases probably caused slow down. And the difference isn't as big as the 2x in the nMuon case

Sep 18 '21 13:09 Moelf

I agree it’s not pretty. I’ll keep this an open wip for now since at least it works. If we get inspiration for making it better and truly in place (?) that would be great :)

Sep 18 '21 21:09 aminnj

I also think that this needs a bit more time ;) let's keep this floating around...

Sep 18 '21 22:09 tamasgal

Brainstorming a bit more. What about a thin wrapper type? Instead of returning a Ref{UInt8} everywhere and storing it in the LazyBranch to prevent the underlying data from unsafe_wrap from being GCed, we could define

struct Wrapper{T} <: AbstractVector{T}
           x::Vector{T}
           ref::Ref{Vector{UInt8}}
       end
Base.@propagate_inbounds Base.getindex(w::Wrapper{T}, ind::Int) where T = w.x[ind]
Base.size(w::Wrapper) = size(w.x)

And then with minimal changes, we eliminate the materialization associated with reinterpret.

     function LazyBranch(f::ROOTFile, b::Union{TBranch,TBranchElement})
         T, J = auto_T_JaggT(f, b; customstructs=f.customstructs)
         T = (T === Vector{Bool} ? BitVector : T)
-        _buffer = T[]
+        _buffer = Wrapper{eltype(T)}([], Ref(UInt8[]))
         if J != Nojagg

 function interped_data(rawdata, rawoffsets, ::Type{T}, ::Type{J}) where {T, J<:JaggType}
     if J === Nojagg
-        return ntoh.(reinterpret(T, rawdata))
+        p = convert(Ptr{eltype(T)}, pointer(rawdata))
+        w = unsafe_wrap(Array, p, length(rawdata) ÷ sizeof(eltype(T)))
+        w .= ntoh.(w)
+        return Wrapper(w, Ref(rawdata))

julia> @btime sum(tf.nMuon) # before
  306.325 ms (1794 allocations: 469.67 MiB)
julia> @btime sum(tf.nMuon) # after
  206.946 ms (1716 allocations: 234.91 MiB)

I think it's not too hard to generalize to VoV since it can wrap any AbstractVector. I.e.,

julia> using ArraysOfArrays
julia> Wrapper([1,2,3,4],Ref(UInt8[]));
julia> VectorOfVectors(w, [1,2,5])
2-element VectorOfVectors{Int64, UnROOT.Wrapper{Int64}, Vector{Int64}, Vector{Tuple{}}}:
 [1]
 [2, 3, 4]

Oct 18 '21 01:10 aminnj

I wonder if @oschulz has any opinion. In fact, it would be nice to have a variation of VectorOfVector that has a field for Ref to hold the original byte array...

Dec 02 '21 17:12 Moelf

what happens if we make our own VoV <:

https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/393a25de3796ae5dbf69760edcb659dc984ff0f5/src/array_of_similar_arrays.jl#L18

Dec 02 '21 17:12 Moelf

I'm not sure I've understood all the implications here - are you looking for somethings like a VectorOfVectors wrapped around an UnsafeArray?

Dec 02 '21 18:12 oschulz

the problem is we have an original byte array A::Vector{UInt8}, we want to wrap VoV around it except we want to "cast" it to, for example, B::Vector{Int32}, but if you cast it by unsafe_wrap(), GC thinks A is not needed and thus GC-ed and you get bad data in VoV(B).

Dec 02 '21 20:12 Moelf

But why can't we use ReinterpretArray if we have to keep the reference to A around anyway?

Dec 02 '21 22:12 oschulz

because ReinterpretArray is slow for both:

A .= ntoh.(A) (10x slower)
everything user will later on do, such as looping over the branch
just generally ugly/annoying to deal with ReinterpretArray considering we already have many wrappers

Dec 02 '21 22:12 Moelf

A .= ntoh.(A) (10x slower)

I wonder why ReinterpretArray is performing that much worse, compared to our custom wrappers. Well, probably can't be helped right now.

In that case I would recommend to use a VectorOfVectors around a custom wrapper - would that be workable? I don't think creating a special VoV type would increase performance, and it would generate more code to maintain long term.

Dec 03 '21 08:12 oschulz

https://github.com/JuliaLang/julia/issues/42227#issuecomment-1037463738

on Julia master, y .= ntoh.(y) is no longer slow given y is an ReinterpretedArray

Feb 12 '22 20:02 Moelf

sadly despite the reduced allocation, it doesn't seem to improve timing by a lot in microbenchmark

Feb 12 '22 21:02 Moelf

It might be that zlib decompression + the calculation is washing out the true benchmark

Feb 13 '22 02:02 aminnj

UnROOT.jl
UnROOT.jl copied to clipboard

[WIP] resurrecting in place `ntoh`

before

after

Codecov Report

UnROOT.jl UnROOT.jl copied to clipboard

[WIP] resurrecting in place `ntoh`

before

after

Codecov Report

UnROOT.jl
UnROOT.jl copied to clipboard