
[Discussion] How to pre-compute branch buffer type

Open Moelf opened this issue 3 years ago • 1 comment

Right now the most complicated logic apart from streamer parsing is: https://github.com/tamasgal/UnROOT.jl/blob/36fa5eb0a1e19c58f1ade0e40d9b4f5e43a35244/src/iteration.jl#L109-L122

and its auto_T_JaggT dependency. Every time we try to change something, we have to guess what the resulting type will be at runtime and pre-process it.

I think a better strategy is to actually run the runtime functions and fetch one event just to see the type, then complete the constructor accordingly. This involves computing types, so maybe https://github.com/vtjnash/ComputedFieldTypes.jl can simplify our lives.
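To make the "fetch one event and complete the constructor" idea concrete, here is a minimal sketch in plain Julia. It does not use ComputedFieldTypes.jl and it is not UnROOT's actual LazyBranch; `ToyBranch` and `jagged_reader` are hypothetical names made up for illustration.

```julia
# Hypothetical toy type (not UnROOT's LazyBranch): T is the per-event element type.
struct ToyBranch{T,F}
    reader::F      # function mapping an event index to its decoded value
    nentries::Int
end

# Outer constructor: decode event 1 once and use its concrete type to fix T.
function ToyBranch(reader, nentries::Integer)
    nentries > 0 || throw(ArgumentError("need at least one entry to probe the type"))
    T = typeof(reader(1))
    return ToyBranch{T,typeof(reader)}(reader, nentries)
end

Base.length(b::ToyBranch) = b.nentries
Base.eltype(::Type{<:ToyBranch{T}}) where {T} = T
# The type assertion keeps getindex type-stable even if the reader is not.
Base.getindex(b::ToyBranch{T}, i::Integer) where {T} = b.reader(i)::T

# Stand-in for a jagged branch: event i holds i Float64 values.
jagged_reader(i) = fill(1.5, i)
branch = ToyBranch(jagged_reader, 3)
```

The obvious cost of this approach is that constructing the branch already triggers one read, and it cannot work for an empty branch.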

Moelf · Feb 16 '22 14:02

Yes, actually I am currently working on that part and I am still unsure how to solve it. If there are custom classes (and streamers) which are unknown, it errors, see: https://github.com/tamasgal/UnROOT.jl/blob/36fa5eb0a1e19c58f1ade0e40d9b4f5e43a35244/src/root.jl#L130-L134

What I am doing now is adding logic that looks up the streamer for the custom class (via UnROOT.streamerfor(f, classname)), then creates the parser code and generates the structs dynamically, right after the lookup of a custom interpretation has been tried (and has failed): https://github.com/tamasgal/UnROOT.jl/blob/36fa5eb0a1e19c58f1ade0e40d9b4f5e43a35244/src/root.jl#L324-L343
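A rough sketch of what "generate the structs dynamically" could look like, assuming the streamer has already been reduced to a list of member name and C++ type name pairs. `CPP_TO_JULIA`, `struct_expr`, `define_struct` and `EvtHeader` are hypothetical names for this illustration and not part of UnROOT; the member names and type names are taken from a few basic-type entries of the Evt streamer.

```julia
# Assumed mapping from C++ type names (as in fTypeName) to Julia types.
const CPP_TO_JULIA = Dict(
    "int" => Int32,
    "unsigned int" => UInt32,
    "ULong64_t" => UInt64,
    "double" => Float64,
)

# Build the expression `struct <name>; field::Type; ...; end` without evaluating it.
function struct_expr(name::Symbol, members)
    fields = [Expr(:(::), Symbol(fname), CPP_TO_JULIA[tname]) for (fname, tname) in members]
    return Expr(:struct, false, name, Expr(:block, fields...))
end

# Evaluate the definition in `mod` and return the new type. The :toplevel
# wrapper evaluates the definition first and then looks the name up.
function define_struct(mod::Module, name::Symbol, members)
    return Core.eval(mod, Expr(:toplevel, struct_expr(name, members), name))
end

members = ["id" => "int", "trigger_mask" => "ULong64_t",
           "overlays" => "unsigned int", "mc_t" => "double"]
T = define_struct(@__MODULE__, :EvtHeader, members)
```

Nested members (STL vectors, other streamed classes) would need the same treatment recursively, which is where the real work is.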

Actually, the right place to do this stuff would be right at the place where we read and parse the streamer info so we have everything done right after opening the file (this partially answers your question, but see the last remark below).

This is the point where it gets really complicated. I can tell you already that for our internal ROOT formats, we have huge classes with nested streamers, so this will be fun as hell.

Below is the streamer info for the Evt class, which has tons of fields. Fun fact: I can read them all with UnROOT if I know the exact "path", so e.g. f["E/Evt"] fails (with the error above) but f["E/Evt/mc_trks/mc_trks.pos.x"] works flawlessly:

julia> f["E/Evt/mc_trks/mc_trks.pos.x"]
Found streamer for Vec
30129-element LazyBranch{SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, UnROOT.Nooffsetjagg, ArraysOfArrays.VectorOfVectors{Float64, Vector{Float64}, Vector{Int32}, Vector{Tuple{}}}}:
[478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656, 478.92062395814656]

Full streamer info for the `Evt` class:

julia> UnROOT.streamerfor(f, "Evt")
UnROOT.StreamerInfo(UnROOT.TStreamerInfo{UnROOT.TObjArray}("Evt", "", 0x4555d797, 14, UnROOT.TObjArray("", 0, Any[UnROOT.TStreamerBase
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "AAObject"
  fTitle: String ""
  fType: Int32 0
  fSize: Int32 0
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 796007337, 0, 0, 0]
  fTypeName: String "BASE"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fBaseVersion: Int32 6
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "id"
  fTitle: String "offline event identifier"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "det_id"
  fTitle: String "detector identifier from DAQ"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "mc_id"
  fTitle: String "identifier of the MC event (as found in ascii or antcc file)."
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "run_id"
  fTitle: String "DAQ run identifier"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "mc_run_id"
  fTitle: String "MC  run identifier"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "frame_index"
  fTitle: String "from the raw data"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "trigger_mask"
  fTitle: String "trigger mask from raw data (i.e. the trigger bits)"
  fType: Int32 17
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "ULong64_t"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "trigger_counter"
  fTitle: String "trigger counter"
  fType: Int32 17
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "ULong64_t"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "overlays"
  fTitle: String "number of overlaying triggered events"
  fType: Int32 13
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "unsigned int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  …  UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "w"
  fTitle: String "MC: Weights w[0]=w1, w[1]=w2, w[2]]=w3 (see e.g. Tag list)"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 8
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "w2list"
  fTitle: String "MC: factors that make up w[1]=w2       (see e.g. Tag list)"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 8
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "w3list"
  fTitle: String "MC: atmospheric flux information"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 8
, UnROOT.TStreamerObjectAny
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "mc_event_time"
  fTitle: String "MC: true generation time (UTC) of the event, (default: 01 Jan 1970 00:00:00)"
  fType: Int32 62
  fSize: Int32 16
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "TTimeStamp"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "mc_t"
  fTitle: String "MC: time where the mc-event was put in the timeslice, since start of run (offset+frameidx*timeslice_duration)"
  fType: Int32 8
  fSize: Int64 8
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "double"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "mc_hits"
  fTitle: String "MC: list of MC truth hits (Hit)"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 61
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "mc_trks"
  fTitle: String "MC: list of MC truth tracks (Trk)"
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 61
, UnROOT.TStreamerString
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "comment"
  fTitle: String "user can use this as he/she likes"
  fType: Int32 65
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "TString"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "index"
  fTitle: String "user can use this as he/she likes"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, UnROOT.TStreamerBasicType
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "flags"
  fTitle: String "user can use this as he/she likes"
  fType: Int32 3
  fSize: Int64 4
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "int"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
])), Set(Any["AAObject"]))

Since fully parsing such a deeply nested structure is a lot of work, I was also considering just returning the subbranches, so that the user can at least navigate and access the data in a split-branch manner. Of course, it would also be quite cool to see f["E/Evt"] return a completely materialised struct with fields as nested/jagged LazyArrays ;)

Anyways, coming back to "fetching one event" to see the type: it is not necessary, it's in the streamer info. So that can be parsed beforehand.
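As a sketch of deriving the type from streamer info alone, without touching any event data: the toy code below maps a few elements to Julia types and assembles a NamedTuple type. `ToyElement`, `BASIC_TYPES`, `julia_type`, `event_type` and `EvtRow` are hypothetical names; the type codes are the fType/fTypeName pairs visible in the dump above, and it is an assumption here that fCtype of an STL vector reuses those same codes.

```julia
# Toy stand-in for a parsed streamer element; only the fields used here.
struct ToyElement
    fName::String
    fType::Int32
    fCtype::Int32   # element type code for STL containers, 0 otherwise
end

# Type codes as paired with fTypeName in the dump: 3 int, 8 double,
# 13 unsigned int, 17 ULong64_t.
const BASIC_TYPES = Dict{Int32,DataType}(3 => Int32, 8 => Float64, 13 => UInt32, 17 => UInt64)
const STL_CODE = Int32(500)

# Map one element to a Julia type; STL elements are treated as vectors of a basic type.
function julia_type(e::ToyElement)
    e.fType == STL_CODE && return Vector{BASIC_TYPES[e.fCtype]}
    return BASIC_TYPES[e.fType]
end

# Assemble the per-event row type from the element list, no data read involved.
function event_type(elements)
    names = Tuple(Symbol(e.fName) for e in elements)
    types = Tuple{map(julia_type, elements)...}
    return NamedTuple{names,types}
end

elements = [ToyElement("id", 3, 0), ToyElement("trigger_mask", 17, 0),
            ToyElement("mc_t", 8, 0), ToyElement("w", 500, 8)]
EvtRow = event_type(elements)
```

Object-valued members such as mc_trks (fCtype 61 in the dump) are deliberately left out; they would need a lookup of the nested class's own streamer.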

tamasgal · Feb 17 '22 09:02