UnROOT.jl
ERROR: TBasket not defined
The file contains two trees that look the same
f["Scaled"] # works
f["DecayTree"] # ERROR: zlib error: incorrect header check (code: -3)
Here is the head of the stacktrace:
changemode!(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::Symbol) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:717
callprocess(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::TranscodingStreams.Buffer, ::TranscodingStreams.Buffer) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:649
fillbuffer(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}; eager::Bool) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:577
fillbuffer at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:564
eof(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:188
readbytes!(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::Array{UInt8,1}, ::Int32) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:371
read(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::Int32) at .\io.jl:941
datastream(::IOStream, ::UnROOT.TKey32) at .julia\packages\UnROOT\T4A6o\src\types.jl:108
UnROOT.TTree(::IOStream, ::UnROOT.TKey32, ::Dict{Int32,Any}) at .julia\packages\UnROOT\T4A6o\src\bootstrap.jl:679
getindex(::ROOTFile, ::SubString{String}) at .julia\packages\UnROOT\T4A6o\src\root.jl:98
getindex(::ROOTFile, ::String) at .julia\packages\UnROOT\T4A6o\src\root.jl:93
array(::ROOTFile, ::String; raw::Bool) at .julia\packages\UnROOT\T4A6o\src\root.jl:142
array at .julia\packages\UnROOT\T4A6o\src\root.jl:139
binlineshape(::String, ::String, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at c:\Users\mikha.julia\dev\OmegacDecay\script\feeddown\lineshape_from_saras_files.jl:14
I can read them with uproot.py
what can it be?
That's weird, it seems that the basket reading is somehow messed up. Is it possible to upload the file somewhere? I'll try to find some time to investigate...
It is 600 MB, are you willing to download it? :)
Yep sure ;)
https://cernbox.cern.ch/index.php/s/AzrPeo78d0bPGMA
f = ROOTFile(joinpath(pathto_folder, "Ob2XicKK_tree.root"))
f["DecayTree"] # ERROR: zlib error: incorrect header check (code: -3)
#
f = ROOTFile(joinpath(pathto_folder, "Ob2XicpKpi_tree.root"))
f["DecayTree"] # ERROR: UndefVarError: TBasket not defined
thanks!
Thanks! Do you happen to have also the other file with the zlib-error?
(edited) I misread the message first
No, I started with UnROOT today. I will let you know if I notice it with other files.
OK I see, so for now I will have a look at this TBasket thing ;)
UnROOT is currently an experimental package and I made it work with files from our own experiment (with custom streamers), so don't expect too much. I also have little time currently, but I will try my best to fix trivial issues!
Now I see the second file "Ob2XicKK_tree.root" in your CERN box, it was not there before ;)
ah, perhaps, the cernbox took time to upload it
I have no real progress yet. I suspect that it might be something related to the compression library I use. I will do a side-by-side comparison with uproot.
The TKey says the uncompressed data is slightly longer than 2^24 bytes; reading the first 2^22 bytes seems okay. I wonder if it's because of some kind of default chunking in Zlib. Maybe we just need to find the right keyword argument.
I know what's going on. The maximum of uncompressedbytes is 0xffffff, which is smaller than what TKey.fObjlen reports, which probably means we need to decompress automatically in multiple shots? Using the compressedbytes and uncompressedbytes defined here, I'm able to decompress without crashing (see the PR below).
This makes both of those files now run into:
TBasket not defined
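To illustrate the "multiple shots" idea, here is a minimal Python sketch (Python only because Uproot is the reference being compared against; it is not the code from the PR). It assumes each compressed block starts with a 9-byte header: a 2-byte algorithm tag, one method byte, a 3-byte little-endian compressed size and a 3-byte little-endian uncompressed size. A 3-byte size field is why a single block cannot describe more than 0xffffff bytes. The names `pack_block` and `decompress_blocks` are hypothetical, and `pack_block` exists only to build test input.

```python
import zlib

MAX_BLOCK = 0xFFFFFF  # largest value a 3-byte size field can hold
HEADER_LEN = 9        # assumed layout: 2-byte tag, 1 method byte, 3 + 3 size bytes


def pack_block(payload: bytes) -> bytes:
    """Build one zlib block with the assumed 9-byte header (test helper only)."""
    body = zlib.compress(payload)
    if len(payload) > MAX_BLOCK or len(body) > MAX_BLOCK:
        raise ValueError("a single block cannot describe more than 0xffffff bytes")
    header = (
        b"ZL"
        + bytes([8])  # method byte; the value here is illustrative
        + len(body).to_bytes(3, "little")
        + len(payload).to_bytes(3, "little")
    )
    return header + body


def decompress_blocks(data: bytes, total_uncompressed: int) -> bytes:
    """Decompress consecutive blocks until total_uncompressed bytes are produced."""
    out = bytearray()
    pos = 0
    while len(out) < total_uncompressed:
        header = data[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[:2] != b"ZL":
            raise ValueError(f"unexpected block header at offset {pos}")
        n_comp = int.from_bytes(header[3:6], "little")
        n_uncomp = int.from_bytes(header[6:9], "little")
        start = pos + HEADER_LEN
        chunk = zlib.decompress(data[start:start + n_comp])
        if len(chunk) != n_uncomp:
            raise ValueError("block size does not match its header")
        out += chunk
        pos = start + n_comp  # the next block starts right after this one
    return bytes(out)


# Usage: three small blocks stand in for blocks that hit the 0xffffff limit.
payload = bytes(range(256)) * 40
blocks = b"".join(pack_block(payload[i:i + 4096]) for i in range(0, len(payload), 4096))
restored = decompress_blocks(blocks, len(payload))
```

The loop stops on the total length reported by the key (the role `TKey.fObjlen` plays above) rather than on the size in any one block header, which is the point of decompressing in several shots.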
Upon further investigation, uproot unpacks a TBasket in-place when it runs into one, so there's not really a type/struct we need to define. Will investigate soon.
update:
julia> r["DecayTree"]
position(io) = 627
ERROR: UndefVarError: TBasket not defined
Stacktrace:
julia> ds = UnROOT.datastream(i, tkey)
julia> seek(ds, 627)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=30304363, maxsize=Inf, ptr=628, mark=-1)
julia> UnROOT.unpack(ds, UnROOT.TBasketKey)
UnROOT.TBasketKey
fNbytes: Int32 0
fVersion: Int16 2
fObjlen: Int32 255922
fDatime: UInt32 0x9c421000
fKeylen: Int16 78
fCycle: Int16 0
fSeekKey: Int64 0
fSeekPdir: Int64 0
fClassName: String "TBasket"
fName: String "nEvent"
fTitle: String "DecayTree"
fBufferSize: Int32 256000
fNevBufSize: Int32 4
fNevBuf: Int32 42640
fLast: Int32 170638
It is indeed just a basket in the middle of nowhere...
OK, so it seems that we need to keep track of the TBaskets when they appear out of nowhere (I am thinking about caching just the location and the key). I remember @jpivarski mentioned that these baskets can show up in different places, but I can't find any useful information about it in my notes anymore. Maybe Jim can point us to some existing "docs" ;)
It's not the sort of thing that would be documented as such; it's an artifact of how TTrees get written when an error interrupts the write.
The completely different place is embedded within the TBranch—i.e. in the TTree object that contains all TBranches and TLeaves. A TBranch has a single TBasket attribute, which is the uncompressed TBasket that it was filling at the time when the writing process shut down. Under normal conditions, this TBasket is filled until it reaches its maximum capacity, then it's compressed and stored as an independent object, where we normally get TBasket data from.
I can point you to some Uproot code that deals with this specifically. (The ROOT code doesn't call it out as a separate thing because that embedded TBasket is part of the TBranch streamer.)
Ordinarily (when minimal_ttree_metadata=True), Uproot skips the deserialization of embedded TBaskets because it adds a lot of time to processing files with thousands of TBranches:
https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/models/TBranch.py#L494-L503
(I'm reminded by the code above that it's not just one TBasket: it's a TObjArray of embedded TBaskets.)
As a last stage of creating the TBranch (postprocess), we determine whether the embedded TBaskets would ever be needed by seeing if the total number of entries in the normal TBaskets adds up to the TTree's number of entries. If the embedded TBaskets are needed, they aren't read yet (we don't know yet if the user is interested in this TBranch), but _embedded_baskets is set to None, rather than an empty list, as a signal that they're needed (last else clause below).
https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/behaviors/TBranch.py#L2694-L2733
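A rough sketch of that decision, not Uproot's actual code: the names `initial_embedded_state`, `BranchSketch`, and the `read_embedded` callback are hypothetical, and only the `_embedded_baskets` attribute name and the None versus empty-list convention come from the description above.

```python
from typing import Callable, List, Optional


def initial_embedded_state(entries_per_basket: List[int], num_entries: int) -> Optional[list]:
    """Return [] if the normal baskets cover every entry, else None (needed, not yet read)."""
    covered = sum(entries_per_basket)
    if covered > num_entries:
        raise ValueError("normal baskets report more entries than the tree has")
    return [] if covered == num_entries else None


class BranchSketch:
    """Toy stand-in for a branch that reads its embedded baskets lazily."""

    def __init__(self, entries_per_basket: List[int], num_entries: int,
                 read_embedded: Callable[[], list]):
        self._embedded_baskets = initial_embedded_state(entries_per_basket, num_entries)
        self._read_embedded = read_embedded  # hypothetical reader, called at most once

    def embedded(self) -> list:
        # None means "needed but unread"; a list (even an empty one) is final.
        if self._embedded_baskets is None:
            self._embedded_baskets = list(self._read_embedded())
        return self._embedded_baskets


# Usage: 200 of 250 entries are in normal baskets, so embedded data is needed.
calls = []
branch = BranchSketch([100, 100], 250, lambda: calls.append(1) or ["embedded-basket"])
first = branch.embedded()
second = branch.embedded()
```

Keeping the three states distinct lets a reader skip the expensive deserialization until a branch that actually needs its embedded data is requested.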
The embedded TBaskets are numbered after the last normal TBasket, since this is the last data that ROOT was working on when it died. If that's included in the user's entry range when they read a TBranch, then it will go through the embedded_baskets property:
https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/behaviors/TBranch.py#L2631-L2652
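The numbering can be pictured with a small sketch. It assumes a list of cumulative entry offsets with one extra trailing element, where the first `num_normal` baskets are normal and any remaining ones are embedded. The function names and this layout are illustrative assumptions, not Uproot's API.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def baskets_for_entries(entry_offsets: List[int], entry_start: int, entry_stop: int) -> range:
    """Indices of baskets overlapping the half-open entry range [entry_start, entry_stop)."""
    entry_start = max(entry_start, entry_offsets[0])
    entry_stop = min(entry_stop, entry_offsets[-1])
    if entry_start >= entry_stop:
        return range(0)
    first = bisect_right(entry_offsets, entry_start) - 1
    last = bisect_left(entry_offsets, entry_stop) - 1
    return range(first, last + 1)


def split_normal_embedded(entry_offsets: List[int], num_normal: int,
                          entry_start: int, entry_stop: int) -> Tuple[List[int], List[int]]:
    """Split the overlapping baskets into normal indices and embedded indices."""
    indices = baskets_for_entries(entry_offsets, entry_start, entry_stop)
    normal = [i for i in indices if i < num_normal]
    # Embedded baskets are numbered after the last normal one, so shift them back to 0.
    embedded = [i - num_normal for i in indices if i >= num_normal]
    return normal, embedded


# Usage: two normal baskets of 100 entries each, then one embedded basket of 50.
offsets = [0, 100, 200, 250]
only_normal = split_normal_embedded(offsets, 2, 0, 150)
with_embedded = split_normal_embedded(offsets, 2, 150, 250)
```

With this layout, only a read whose entry range reaches past entry 200 touches the embedded basket; reads confined to earlier entries never trigger it.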
In addition to always being uncompressed, embedded TBaskets have a slightly different structure from free-standing TBaskets. This deserialization code shows the difference:
https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/models/TBasket.py#L248-L317
Notes
I think in practice, I've never seen more than one embedded TBasket. How could there be? It's the last one the TTree was writing when it died. Nevertheless, it's a TObjArray of them, so I treat it everywhere like a list. (A list of one element vs a list of zero elements vs None is a useful way to distinguish different states of having already read it, not having any to read, and needing it but not yet having read it. So the list does come in handy.)
Uproot 3 code called this "recovery" and "recovered" TBaskets. My impression when I first encountered this was that it was obviously a corrupted file. But I've since learned that this is an intended feature, how it's supposed to work, so that failures during writing produce files that are nevertheless readable. For that reason, Uproot 4 calls them "embedded."
Uproot's writing does not have this feature: if it fails before writing a standalone (normal) TBasket, then that TBasket is simply unavailable. So Uproot's writing process always makes empty embedded TBaskets.