UnROOT.jl

Support for custom branches that contain std vectors of custom structs?

Open: oschulz opened this issue 2 years ago • 13 comments

I have a ROOT file with a custom-type branch whose entries are structs that contain vectors of custom structs, e.g.

struct Foo
{
  long a;
  std::vector<short> b;
};

struct Bar
{
  long c;
  std::vector<Foo> d;
};

The file uses standard ROOT autogenerated streamers. I'm trying to read it using

struct Foo
    a::Clong
    b::Vector{Cshort}
end

struct Bar
    c::Clong
    d::Vector{Foo}
end

f = ROOTFile("myfile.root", customstructs = Dict("Foo" => Foo, "Bar" => Bar))
tree = LazyTree(f, "TreeOnFire", ["bar_branch"]);
tree[1].bar_branch; # fails

but I get

julia> tree[1].bar_branch;
ERROR: MethodError: no method matching -(::Nothing, ::Int64)
[...]
Stacktrace:
 [1] _localindex_newbasket!(ba::LazyBranch{Plane, UnROOT.Nojagg, Vector{Plane}}, idx::Int64, tid::Int64)
[...]

Should this work, or can't we handle custom structs like that automatically yet?

oschulz, Dec 01 '22

This can be a bit tricky. We don't have much (read: any) automation yet for custom types. There are different ways of doing it; maybe check out how I do it for some of the KM3NeT data structures here (we included them in UnROOT and its test suite for documentation purposes): https://github.com/JuliaHEP/UnROOT.jl/blob/master/test/runtests.jl#L439

The parsing action is defined here: https://github.com/JuliaHEP/UnROOT.jl/blob/master/src/custom.jl#L145

As you can see, it might require some manual bit-hopping. If you can provide some sample data, I can help you out.

tamasgal, Dec 01 '22

Thanks @tamasgal, much appreciated! Adding a bit of bit-mangling code shouldn't be a problem. So I basically implement readtype and interped_data for the custom types, right? How do I read/iterate over std::vector in those?

oschulz, Dec 01 '22

For the std::vector, you need to skip the magical 10 bytes at the beginning and then use the UnROOT.readtype(io, Cshort) function. It's similar to read() but converts from ROOT's big-endian byte order to the host byte order. It might need some trial and error; let me know if you need further help, but I think it should be fairly straightforward. ;)
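A minimal, self-contained sketch of that recipe for one entry of a std::vector<short> branch, given as raw bytes. It does not call UnROOT: the hypothetical helper decode_vector_short stands in for the "skip 10 bytes, then readtype in a loop" pattern and uses Base's ntoh for the big-endian conversion. We assume the entry consists of exactly the 10-byte header followed by the packed elements, and we skip the header without interpreting it.

```julia
# Hypothetical helper, not part of UnROOT: decode one std::vector<short> entry.
# Assumption: `entry` = 10 header bytes (skipped as-is) + big-endian Int16 elements.
function decode_vector_short(entry::Vector{UInt8})
    io = IOBuffer(entry)
    skip(io, 10)                                # drop the "magical" 10-byte header
    n = (length(entry) - 10) ÷ sizeof(Cshort)   # number of elements left in the entry
    out = Vector{Cshort}(undef, n)
    for i in 1:n
        out[i] = ntoh(read(io, Cshort))         # big-endian on disk -> host byte order
    end
    return out
end

# Usage: a hand-made entry with placeholder header bytes and the values 1, -2, 300.
entry = vcat(zeros(UInt8, 10), reinterpret(UInt8, hton.(Cshort[1, -2, 300])))
decode_vector_short(entry)  # returns Int16[1, -2, 300]
```

In a real readtype method for a custom type, the same loop would run on the IO that UnROOT hands you, with UnROOT.readtype in place of ntoh(read(...)).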

tamasgal, Dec 01 '22

The documentation is split between https://juliahep.github.io/UnROOT.jl/dev/advanced/custom_branch/ and src/custom.jl.

Basically, you want to implement a method like

function interped_data(rawdata, rawoffsets, ::Type{Vector{LVF64}}, ::Type{Offsetjagg})

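but with your own type instead of LVF64. When you define it outside of UnROOT, extend UnROOT's function (UnROOT.interped_data) rather than creating a new function of the same name.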
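To see what such a method has to do with the bytes, here is a Base-only sketch of the Offsetjagg pattern that returns a plain Vector{Vector{Int32}} instead of a custom type. The layout is an assumption: rawdata holds all entries back to back, rawoffsets has one more element than there are entries, rawoffsets[i] is the 0-based byte offset where entry i starts, and every entry begins with the 10-byte header mentioned above. split_jagged is a hypothetical name, not an UnROOT function.

```julia
# Hypothetical helper, not part of UnROOT.
# Assumptions: entries are stored back to back in `rawdata`; entry i spans the
# 0-based byte range rawoffsets[i]:rawoffsets[i+1]-1; each entry starts with a
# 10-byte header, followed by big-endian Int32 payload values.
function split_jagged(rawdata::Vector{UInt8}, rawoffsets::Vector{<:Integer}; header = 10)
    nentries = length(rawoffsets) - 1
    result = Vector{Vector{Int32}}(undef, nentries)
    for i in 1:nentries
        start = rawoffsets[i] + header + 1   # first payload byte, 1-based
        stop = rawoffsets[i+1]               # last byte of entry i, 1-based
        payload = view(rawdata, start:stop)  # empty when the entry has no elements
        result[i] = ntoh.(reinterpret(Int32, payload))
    end
    return result
end

# Usage: three entries holding [1, 2], [] and [7], each with a zeroed 10-byte header.
be_bytes(v) = collect(reinterpret(UInt8, hton.(Int32.(v))))
entries = [vcat(zeros(UInt8, 10), be_bytes(v)) for v in ([1, 2], Int[], [7])]
rawdata = reduce(vcat, entries)
rawoffsets = [0; cumsum(length.(entries))]
split_jagged(rawdata, rawoffsets)
```

The attempt further down this thread, which appears to follow the LVF64 method, instead compacts all payloads into one buffer and wraps it in a VectorOfVectors, which avoids allocating one vector per entry; the per-entry version here trades that for readability.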

Moelf, Dec 01 '22

I am having some trouble figuring this out. Could someone help?

I basically have a std::vector<std::vector<int>> in a ROOT tree I'm trying to read, which I didn't think would be too bad, since a TLorentzVector branch is also a vector of 4-vectors... In my case, though, the length of the vectors differs from event to event.

I tried doing the following:

customstruct = Dict("VecVecInt" => Vector{Vector{Int32}})

const VecVecInt = customstruct
function interped_data(rawdata, rawoffsets, ::Type{Vector{Vector{Int32}}}, ::Type{Offsetjagg})
    _size = 64 # needs to account for 32 bytes header
    dp = 0 # book keeping for copy_to!
    lr = length(rawoffsets)
    offset = Vector{Int32}(undef, lr)
    offset[1] = 0
    @views @inbounds for i in 1:lr-1
        start = rawoffsets[i]+10+1
        stop = rawoffsets[i+1]
        l = stop-start+1
        if l > 0
            unsafe_copyto!(rawdata, dp+1, rawdata, start, l)
            dp += l
            offset[i+1] = offset[i] + l
        else
            offset[i+1] = offset[i]
        end
    end
    resize!(rawdata, dp)
    real_data = interped_data(rawdata, offset, VecVecInt, Nojagg)
    offset .÷= _size
    offset .+= 1
    VectorOfVectors(real_data, offset)
end
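For reference, the last three lines of the method above are meant to turn 0-based byte offsets into 1-based element offsets and then slice the flat data per entry, which VectorOfVectors(real_data, offset) provides without allocating a vector per entry. A Base-only sketch of that step (jagged_views is a hypothetical helper; Int32 elements assumed) shows that the divisor has to be the size of one stored element, e.g. 4 bytes for an int. It only covers a single level of nesting.

```julia
# Hypothetical helper: slice a flat element array into per-entry views.
# Assumption: byteoffsets are 0-based byte positions into the compacted data
# (one more than the number of entries), like the offsets built in the loop above.
function jagged_views(data::Vector{T}, byteoffsets::Vector{<:Integer}) where {T}
    elemoffsets = byteoffsets .÷ sizeof(T) .+ 1   # bytes -> 1-based element indices
    return [view(data, elemoffsets[i]:elemoffsets[i+1]-1) for i in 1:length(elemoffsets)-1]
end

# Usage: 5 Int32 values (20 bytes) split into entries of 2, 0 and 3 elements.
views = jagged_views(Int32[1, 2, 3, 4, 5], [0, 8, 8, 20])
```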

When I run

data, offsets = UnROOT.array(f, "Tree/Event/PMTBinnedWaveforms", raw=true)

where PMTBinnedWaveforms is the std::vector<std::vector<int>> I am trying to read, I get the following error:

MethodError: no method matching ROOTFile(::String, ::Dict{String, DataType})

Closest candidates are:
  ROOTFile(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
   @ UnROOT ~/.julia/packages/UnROOT/mBdWz/src/root.jl:13
  ROOTFile(::Function, ::Any...; pv...)
   @ UnROOT ~/.julia/packages/UnROOT/mBdWz/src/root.jl:25
  ROOTFile(::String, ::Int32, ::Union{UnROOT.FileHeader32, UnROOT.FileHeader64}, ::Union{UnROOT.HTTPStream, UnROOT.MmapStream, UnROOT.XRDStream}, ::Union{UnROOT.TKey32, UnROOT.TKey64}, ::UnROOT.Streamers, ::UnROOT.ROOTDirectory, ::Dict{String, Type})
   @ UnROOT ~/.julia/packages/UnROOT/mBdWz/src/root.jl:13
  ...

Stacktrace:
 [1] top-level scope
   @ In[6]:2

I am really stuck. Could someone help?

soudk, May 25 '23

Ahm, do you have an example file? That should work "out-of-the-box" 🙈

tamasgal, May 25 '23

Sure, actually here is one: https://drive.google.com/drive/folders/1qLURkYheLkdwoEj_tyGLG7JsV6wShSGt?usp=sharing

I'm trying to read PMTBinnedWaveforms and PMTWaveforms under ODTree.

soudk, May 25 '23

julia> ROOTFile("/tmp/VetoPMTAnalysis_000.root")["ODTree"]
ODTree (TTree)
└─ "ODEvent"


julia> ROOTFile("/tmp/VetoPMTAnalysis_000.root")["ODTree"]["ODEvent"]
ODEvent
├─ TObject
│  ├─ fUniqueID
│  └─ fBits
├─ eventNumber
├─ muImpactParameter
├─ LXeImpactParameter
├─ muTrackLength
├─ muEnergy
├─ totalHits
├─ totalHitsPreQE
├─ initCherenkovOP
├─ PMTIDVec
├─ PMTWaveforms
├─ PMTBinnedWaveforms
└─ PMTTriggerVec

So your TTree contains a custom struct; in this case it's tricky.

Moelf, May 25 '23

It's reading

  fClassName: String "ODPMTDS"
  fParentName: String "ODPMTDS"

and you can check the streamer for that class with UnROOT.streamerfor(f, "ODPMTDS") (see the output below).

The problem is that branch splitting is limited in your case (the default split level is 99, which basically gives you a separate ROOT branch, with its own path, for each field), so you need a parser that can deserialise the whole class instance. This means you cannot read, e.g., only the PMTBinnedWaveforms field of ODPMTDS; you need to deserialise everything. 😞

julia> UnROOT.streamerfor(f, "ODPMTDS")
UnROOT.StreamerInfo(UnROOT.TStreamerInfo{UnROOT.TObjArray}("ODPMTDS", "", 0x14fb5c22, 1, UnROOT.TObjArray("", 0, Any[UnROOT.TStreamerBase
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "TObject"
 fTitle: String "Basic ROOT object"
 fType: Int32 66
 fSize: Int32 0
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, -1877229523, 0, 0, 0]
 fTypeName: String "BASE"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fBaseVersion: Int32 1
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "eventNumber"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "muImpactParameter"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "LXeImpactParameter"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "muTrackLength"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "muEnergy"
 fTitle: String ""
 fType: Int32 5
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "float"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "totalHits"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "totalHitsPreQE"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "initCherenkovOP"
 fTitle: String ""
 fType: Int32 3
 fSize: Int64 4
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "int"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTIDVec"
 fTitle: String ""
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<int>"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 3
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTWaveforms"
 fTitle: String "All hits on PMTs"
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<vector<float> >"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 61
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTBinnedWaveforms"
 fTitle: String ""
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<vector<int> >"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 61
, UnROOT.TStreamerSTL
 version: UInt16 0x0004
 fOffset: Int64 0
 fName: String "PMTTriggerVec"
 fTitle: String ""
 fType: Int32 500
 fSize: Int32 24
 fArrayLength: Int32 0
 fArrayDim: Int32 0
 fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
 fTypeName: String "vector<vector<int> >"
 fXmin: Float64 0.0
 fXmax: Float64 0.0
 fFactor: Float64 0.0
 fSTLtype: Int32 1
 fCtype: Int32 61
])), Set(Any["TObject"]))
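To make "you need to deserialise everything" concrete: without splitting, the fields of each ODPMTDS instance are written one after another in the streamer order listed above, so reaching PMTBinnedWaveforms means consuming every field before it. The Base-only sketch below reads such a record from a deliberately simplified, hypothetical layout: big-endian values, each std::vector stored as an Int32 length followed by its elements, and none of the byte-count/version headers or the TObject base that a real ROOT object stream contains. All names are illustrative and not UnROOT API.

```julia
# Hypothetical, simplified layout (NOT ROOT's real on-disk format): big-endian
# scalars, each std::vector = Int32 length + elements, no headers, no TObject base.
be(io, ::Type{T}) where {T<:Integer} = ntoh(read(io, T))
be(io, ::Type{Float32}) = reinterpret(Float32, ntoh(read(io, UInt32)))
read_vec(io, ::Type{T}) where {T} = [be(io, T) for _ in 1:be(io, Int32)]
read_vecvec(io, ::Type{T}) where {T} = [read_vec(io, T) for _ in 1:be(io, Int32)]

struct ODPMTDSSketch
    eventNumber::Int32
    muImpactParameter::Float32
    LXeImpactParameter::Float32
    muTrackLength::Float32
    muEnergy::Float32
    totalHits::Int32
    totalHitsPreQE::Int32
    initCherenkovOP::Int32
    PMTIDVec::Vector{Int32}
    PMTWaveforms::Vector{Vector{Float32}}
    PMTBinnedWaveforms::Vector{Vector{Int32}}
    PMTTriggerVec::Vector{Vector{Int32}}
end

# Fields must be consumed strictly in streamer order; there is no way to jump
# straight to PMTBinnedWaveforms because the preceding vectors vary in length.
function read_odpmtds(io)
    eventNumber = be(io, Int32)
    floats = [be(io, Float32) for _ in 1:4]   # muImpactParameter ... muEnergy
    ints = [be(io, Int32) for _ in 1:3]       # totalHits, totalHitsPreQE, initCherenkovOP
    ids = read_vec(io, Int32)                 # PMTIDVec
    waves = read_vecvec(io, Float32)          # PMTWaveforms
    binned = read_vecvec(io, Int32)           # PMTBinnedWaveforms
    trig = read_vecvec(io, Int32)             # PMTTriggerVec
    return ODPMTDSSketch(eventNumber, floats..., ints..., ids, waves, binned, trig)
end

# Usage: serialise one record in the same simplified layout, then read it back.
buf = IOBuffer()
put(x::Int32) = write(buf, hton(x))
put(x::Float32) = write(buf, hton(reinterpret(UInt32, x)))
putvec(v) = (put(Int32(length(v))); foreach(put, v))

put(Int32(42)); foreach(put, Float32[1.5, 2.5, 3.5, 4.5]); foreach(put, Int32[10, 9, 8])
putvec(Int32[101, 102])                                # PMTIDVec
put(Int32(1)); putvec(Float32[0.5])                    # PMTWaveforms: one inner vector
put(Int32(2)); putvec(Int32[0, 3]); putvec(Int32[])    # PMTBinnedWaveforms: two inner vectors
put(Int32(0))                                          # PMTTriggerVec: empty
seekstart(buf)
ev = read_odpmtds(buf)
```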

tamasgal, May 25 '23

Python's uproot can parse it:

In [16]: up
Out[16]: <module 'uproot' from '/home/akako/.conda/envs/hep/lib/python3.11/site-packages/uproot/__init__.py'>

In [17]: r = up.open("/tmp/VetoPMTAnalysis_000.root")["ODTree"].arrays()

In [18]: r.PMTBinnedWaveforms[0]
Out[18]: <Array [[0, 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0], ...] type='472 * var * int32'>

but I don't think we can do much here at the moment: parsing an arbitrary C++ class without maximal splitting is too hard for now.

If you convert the TTree to an RNTuple, we should be able to read it easily.

Moelf, May 25 '23

Automatic parsing of custom types is definitely on the big to-do list, but I am totally overloaded 😞 Still hoping that a few more contributors jump in soon 🙂

tamasgal, May 25 '23

Yeah, I was using Python uproot before, but I stumbled on UnROOT and really like it, hence the potential switch.

Thanks for the help! I'll try converting to an RNTuple and see; I don't really need the other TTree right now anyway.

soudk, May 25 '23

Or set the branch splitting to 99 ;)

tamasgal, May 25 '23