UnROOT.jl
UnROOT.jl copied to clipboard
[WIP] resurrecting in place `ntoh`
Resurrection of https://github.com/tamasgal/UnROOT.jl/pull/101 (which means some copy-pasting from @Moelf ;) ). Basically, we add a inplace::Bool
to interped_data
which toggles between doing ntoh.(reinterpret(...))
(copy) or an inplace-variant, both returning the same type which means it's still type stable.
But with a small modification, basketarray()
's return type depends on the boolean. Probably this can be fixed with a barrier function to discard the Ref{UInt8[]}
if called with inplace=false
(the default.
Unit tests all pass.
julia> using UnROOT
julia> const t = LazyTree(ROOTFile("uncompressed_Run2012BC_DoubleMuParked_Muons.root"),"Events");
julia> function bar(t)
for evt in t
_ = evt.Muon_pt
end
nothing
end
before
julia> @btime sum(t.nMuon) seconds=10
204.417 ms (6078 allocations: 469.98 MiB)
0x0000000008e67ad8
julia> @btime bar(t) seconds=10
712.140 ms (14304 allocations: 1.82 GiB)
after
julia> @btime sum(t.nMuon) seconds=10
205.316 ms (6780 allocations: 235.24 MiB)
0x0000000008e67ad8
julia> @btime bar(t) seconds=10
725.626 ms (16241 allocations: 1.26 GiB)
Performance is pretty much the same but with less total allocated memory. If I profile bar(t)
I see there's a materialization. I thought this should be in place??
Codecov Report
Merging #112 (c55c83a) into master (db89380) will increase coverage by
0.03%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #112 +/- ##
==========================================
+ Coverage 90.40% 90.44% +0.03%
==========================================
Files 11 11
Lines 1355 1360 +5
==========================================
+ Hits 1225 1230 +5
Misses 130 130
Impacted Files | Coverage Δ | |
---|---|---|
src/custom.jl | 98.70% <100.00%> (ø) |
|
src/iteration.jl | 88.97% <100.00%> (ø) |
|
src/root.jl | 92.97% <100.00%> (+0.14%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update db89380...c55c83a. Read the comment docs.
I know it probably doesn't get much better but this is very messy :(
actually:
#before
(14304 allocations: 1.82 GiB)
#after
(16241 allocations: 1.26 GiB)
I think this means we shouldn't do this, the number of allocation increases probably caused slow down. And the difference isn't as big as the 2x in the nMuon
case
I agree it’s not pretty. I’ll keep this an open wip for now since at least it works. If we get inspiration for making it better and truly in place (?) that would be great :)
I also think that this needs a bit more time ;)
let's keep this floating around...
Brainstorming a bit more. What about a thin wrapper type? Instead of returning a Ref{UInt8} everywhere and storing it in the LazyBranch to prevent the underlying data from unsafe_wrap
from being GCed, we could define
struct Wrapper{T} <: AbstractVector{T}
x::Vector{T}
ref::Ref{Vector{UInt8}}
end
Base.@propagate_inbounds Base.getindex(w::Wrapper{T}, ind::Int) where T = w.x[ind]
Base.size(w::Wrapper) = size(w.x)
And then with minimal changes, we eliminate the materialization associated with reinterpret.
function LazyBranch(f::ROOTFile, b::Union{TBranch,TBranchElement})
T, J = auto_T_JaggT(f, b; customstructs=f.customstructs)
T = (T === Vector{Bool} ? BitVector : T)
- _buffer = T[]
+ _buffer = Wrapper{eltype(T)}([], Ref(UInt8[]))
if J != Nojagg
function interped_data(rawdata, rawoffsets, ::Type{T}, ::Type{J}) where {T, J<:JaggType}
if J === Nojagg
- return ntoh.(reinterpret(T, rawdata))
+ p = convert(Ptr{eltype(T)}, pointer(rawdata))
+ w = unsafe_wrap(Array, p, length(rawdata) ÷ sizeof(eltype(T)))
+ w .= ntoh.(w)
+ return Wrapper(w, Ref(rawdata))
julia> @btime sum(tf.nMuon) # before
306.325 ms (1794 allocations: 469.67 MiB)
julia> @btime sum(tf.nMuon) # after
206.946 ms (1716 allocations: 234.91 MiB)
I think it's not too hard to generalize to VoV since it can wrap any AbstractVector. I.e.,
julia> using ArraysOfArrays
julia> Wrapper([1,2,3,4],Ref(UInt8[]));
julia> VectorOfVectors(w, [1,2,5])
2-element VectorOfVectors{Int64, UnROOT.Wrapper{Int64}, Vector{Int64}, Vector{Tuple{}}}:
[1]
[2, 3, 4]
I wonder if @oschulz has any opinion. In fact, it would be nice to have a variation of VectorOfVector
that has a field for Ref
to hold the original byte array...
what happens if we make our own VoV <:
https://github.com/JuliaArrays/ArraysOfArrays.jl/blob/393a25de3796ae5dbf69760edcb659dc984ff0f5/src/array_of_similar_arrays.jl#L18
I'm not sure I've understood all the implications here - are you looking for somethings like a VectorOfVectors
wrapped around an UnsafeArray
?
the problem is we have an original byte array A::Vector{UInt8}
, we want to wrap VoV
around it except we want to "cast" it to, for example, B::Vector{Int32}
, but if you cast it by unsafe_wrap()
, GC thinks A
is not needed and thus GC-ed and you get bad data in VoV(B)
.
But why can't we use ReinterpretArray
if we have to keep the reference to A
around anyway?
because ReinterpretArray
is slow for both:
-
A .= ntoh.(A)
(10x slower) - everything user will later on do, such as looping over the branch
- just generally ugly/annoying to deal with
ReinterpretArray
considering we already have many wrappers
A .= ntoh.(A) (10x slower)
I wonder why ReinterpretArray is performing that much worse, compared to our custom wrappers. Well, probably can't be helped right now.
In that case I would recommend to use a VectorOfVectors
around a custom wrapper - would that be workable? I don't think creating a special VoV type would increase performance, and it would generate more code to maintain long term.
https://github.com/JuliaLang/julia/issues/42227#issuecomment-1037463738
on Julia master, y .= ntoh.(y)
is no longer slow given y
is an ReinterpretedArray
sadly despite the reduced allocation, it doesn't seem to improve timing by a lot in microbenchmark
It might be that zlib decompression + the calculation is washing out the true benchmark