UnROOT.jl icon indicating copy to clipboard operation
UnROOT.jl copied to clipboard

[WIP] resurrecting in place `ntoh`

Open aminnj opened this issue 3 years ago • 16 comments

Resurrection of https://github.com/tamasgal/UnROOT.jl/pull/101 (which means some copy-pasting from @Moelf ;) ). Basically, we add a inplace::Bool to interped_data which toggles between doing ntoh.(reinterpret(...)) (copy) or an inplace-variant, both returning the same type which means it's still type stable.

But with a small modification, basketarray()'s return type depends on the boolean. Probably this can be fixed with a barrier function to discard the Ref{UInt8[]} if called with inplace=false (the default.

Unit tests all pass.

julia> using UnROOT

julia> const t = LazyTree(ROOTFile("uncompressed_Run2012BC_DoubleMuParked_Muons.root"),"Events");

julia> function bar(t)
           for evt in t
               _ = evt.Muon_pt
           end
           nothing
       end

before

julia> @btime sum(t.nMuon) seconds=10
  204.417 ms (6078 allocations: 469.98 MiB)
0x0000000008e67ad8

julia> @btime bar(t) seconds=10
  712.140 ms (14304 allocations: 1.82 GiB)

after

julia> @btime sum(t.nMuon) seconds=10
  205.316 ms (6780 allocations: 235.24 MiB)
0x0000000008e67ad8

julia> @btime bar(t) seconds=10
  725.626 ms (16241 allocations: 1.26 GiB)

Performance is pretty much the same but with less total allocated memory. If I profile bar(t) I see there's a materialization. I thought this should be in place??

image

aminnj avatar Sep 18 '21 05:09 aminnj

Codecov Report

Merging #112 (c55c83a) into master (db89380) will increase coverage by 0.03%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #112      +/-   ##
==========================================
+ Coverage   90.40%   90.44%   +0.03%     
==========================================
  Files          11       11              
  Lines        1355     1360       +5     
==========================================
+ Hits         1225     1230       +5     
  Misses        130      130              
Impacted Files Coverage Δ
src/custom.jl 98.70% <100.00%> (ø)
src/iteration.jl 88.97% <100.00%> (ø)
src/root.jl 92.97% <100.00%> (+0.14%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update db89380...c55c83a. Read the comment docs.

codecov[bot] avatar Sep 18 '21 05:09 codecov[bot]

I know it probably doesn't get much better but this is very messy :(

Moelf avatar Sep 18 '21 13:09 Moelf

actually:

#before
(14304 allocations: 1.82 GiB)
#after
(16241 allocations: 1.26 GiB)

I think this means we shouldn't do this, the number of allocation increases probably caused slow down. And the difference isn't as big as the 2x in the nMuon case

Moelf avatar Sep 18 '21 13:09 Moelf

I agree it’s not pretty. I’ll keep this an open wip for now since at least it works. If we get inspiration for making it better and truly in place (?) that would be great :)

aminnj avatar Sep 18 '21 21:09 aminnj

I also think that this needs a bit more time ;) let's keep this floating around...

tamasgal avatar Sep 18 '21 22:09 tamasgal

Brainstorming a bit more. What about a thin wrapper type? Instead of returning a Ref{UInt8} everywhere and storing it in the LazyBranch to prevent the underlying data from unsafe_wrap from being GCed, we could define

struct Wrapper{T} <: AbstractVector{T}
           x::Vector{T}
           ref::Ref{Vector{UInt8}}
       end
Base.@propagate_inbounds Base.getindex(w::Wrapper{T}, ind::Int) where T = w.x[ind]
Base.size(w::Wrapper) = size(w.x)

And then with minimal changes, we eliminate the materialization associated with reinterpret.

     function LazyBranch(f::ROOTFile, b::Union{TBranch,TBranchElement})
         T, J = auto_T_JaggT(f, b; customstructs=f.customstructs)
         T = (T === Vector{Bool} ? BitVector : T)
-        _buffer = T[]
+        _buffer = Wrapper{eltype(T)}([], Ref(UInt8[]))
         if J != Nojagg
 function interped_data(rawdata, rawoffsets, ::Type{T}, ::Type{J}) where {T, J<:JaggType}
     if J === Nojagg
-        return ntoh.(reinterpret(T, rawdata))
+        p = convert(Ptr{eltype(T)}, pointer(rawdata))
+        w = unsafe_wrap(Array, p, length(rawdata) ÷ sizeof(eltype(T)))
+        w .= ntoh.(w)
+        return Wrapper(w, Ref(rawdata))
julia> @btime sum(tf.nMuon) # before
  306.325 ms (1794 allocations: 469.67 MiB)
julia> @btime sum(tf.nMuon) # after
  206.946 ms (1716 allocations: 234.91 MiB)

I think it's not too hard to generalize to VoV since it can wrap any AbstractVector. I.e.,

julia> using ArraysOfArrays
julia> Wrapper([1,2,3,4],Ref(UInt8[]));
julia> VectorOfVectors(w, [1,2,5])
2-element VectorOfVectors{Int64, UnROOT.Wrapper{Int64}, Vector{Int64}, Vector{Tuple{}}}:
 [1]
 [2, 3, 4]

aminnj avatar Oct 18 '21 01:10 aminnj

I wonder if @oschulz has any opinion. In fact, it would be nice to have a variation of VectorOfVector that has a field for Ref to hold the original byte array...

Moelf avatar Dec 02 '21 17:12 Moelf

what happens if we make our own VoV <:

https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/393a25de3796ae5dbf69760edcb659dc984ff0f5/src/array_of_similar_arrays.jl#L18

Moelf avatar Dec 02 '21 17:12 Moelf

I'm not sure I've understood all the implications here - are you looking for somethings like a VectorOfVectors wrapped around an UnsafeArray?

oschulz avatar Dec 02 '21 18:12 oschulz

the problem is we have an original byte array A::Vector{UInt8}, we want to wrap VoV around it except we want to "cast" it to, for example, B::Vector{Int32}, but if you cast it by unsafe_wrap(), GC thinks A is not needed and thus GC-ed and you get bad data in VoV(B).

Moelf avatar Dec 02 '21 20:12 Moelf

But why can't we use ReinterpretArray if we have to keep the reference to A around anyway?

oschulz avatar Dec 02 '21 22:12 oschulz

because ReinterpretArray is slow for both:

  1. A .= ntoh.(A) (10x slower)
  2. everything user will later on do, such as looping over the branch
  3. just generally ugly/annoying to deal with ReinterpretArray considering we already have many wrappers

Moelf avatar Dec 02 '21 22:12 Moelf

A .= ntoh.(A) (10x slower)

I wonder why ReinterpretArray is performing that much worse, compared to our custom wrappers. Well, probably can't be helped right now.

In that case I would recommend to use a VectorOfVectors around a custom wrapper - would that be workable? I don't think creating a special VoV type would increase performance, and it would generate more code to maintain long term.

oschulz avatar Dec 03 '21 08:12 oschulz

https://github.com/JuliaLang/julia/issues/42227#issuecomment-1037463738

on Julia master, y .= ntoh.(y) is no longer slow given y is an ReinterpretedArray

Moelf avatar Feb 12 '22 20:02 Moelf

image

sadly despite the reduced allocation, it doesn't seem to improve timing by a lot in microbenchmark

Moelf avatar Feb 12 '22 21:02 Moelf

It might be that zlib decompression + the calculation is washing out the true benchmark

aminnj avatar Feb 13 '22 02:02 aminnj