UnROOT.jl

[RNTuple] accessing nested structs is not lazy enough

Open • Moelf opened this issue 2 years ago • 1 comment

Consider the following top-level field (a "column" in the table analogy):

```
├─ Symbol("AntiKt4TruthDressedWZJetsAux:") ⇒ Struct
│                                            ├─ :m ⇒ Vector
│                                            │       ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=23)
│                                            │       └─ :content ⇒ Leaf{Float32}(col=24)
│                                            ├─ :pt ⇒ Vector
│                                            │        ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=17)
│                                            │        └─ :content ⇒ Leaf{Float32}(col=18)
│                                            ├─ :eta ⇒ Vector
│                                            │         ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=19)
│                                            │         └─ :content ⇒ Leaf{Float32}(col=20)
│                                            ├─ :constituentWeights ⇒ Vector
│                                            │                        ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=29)
│                                            │                        └─ :content ⇒ Vector
│                                            │                                      ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=30)
│                                            │                                      └─ :content ⇒ Leaf{Float32}(col=31)
```
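Each `Vector` subfield in the tree is stored as an `offset` column plus a `content` column. The following is a minimal sketch of how one event's values could be recovered from such a pair, assuming the offset column holds the cumulative end index of each entry (with the first entry starting at 0). The names `entry_range`, `event_pt` and `event_weights`, and the toy data, are hypothetical and not UnROOT.jl internals.

```julia
# Hypothetical helper, not part of UnROOT.jl. Assumption: offsets are
# cumulative *end* indices, and the first entry starts at 0.
# Returns the 1-based index range of entry `i` in the content column.
function entry_range(offsets::AbstractVector{<:Integer}, i::Integer)
    start = i == 1 ? 0 : offsets[i - 1]
    stop = offsets[i]
    return (start + 1):stop
end

# Toy columns for a `pt`-like Vector{Float32} field over 3 events
pt_offset  = Int64[2, 2, 5]                 # event sizes: 2, 0, 3
pt_content = Float32[10, 20, 30, 40, 50]

# Only the pt offset and content columns are touched here
event_pt(i) = view(pt_content, entry_range(pt_offset, i))

# A Vector{Vector{Float32}} field (like constituentWeights) has two offset levels
outer_offset  = Int64[2, 3]                 # event 1 has 2 jets, event 2 has 1
inner_offset  = Int64[1, 3, 4]              # per-jet sizes: 1, 2, 1
inner_content = Float32[0.1, 0.2, 0.3, 0.4]

function event_weights(i)
    jets = entry_range(outer_offset, i)     # which inner entries belong to event i
    return [view(inner_content, entry_range(inner_offset, j)) for j in jets]
end
```

Decoding `pt` for an event needs only its own offset and content columns (17 and 18 in the tree), while the doubly nested `constituentWeights` needs all three of its columns (29, 30 and 31).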

Currently, when we loop over the events, the access is too "eager":

```julia
for evt in rntuple
    # ideally this touches only the pt columns (17 and 18)
    evt.var"AntiKt4TruthDressedWZJetsAux:".pt
end
```

In this case, we only want to access the storage for the pTs (i.e. RNTuple columns 17 and 18), but in reality we read all nine columns of the field (17, 18, 19, 20, 23, 24, 29, 30, 31) as soon as we evaluate `evt.var"AntiKt4TruthDressedWZJetsAux:"`.
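One way to picture the lazier behaviour: accessing the struct field returns a cheap proxy, and I/O happens only when a sub-field is requested. This is a minimal plain-Julia sketch, assuming each sub-field can be read through its own loader function; `LazyStruct`, `read_columns` and `columns_read` are hypothetical names for illustration, not UnROOT.jl's API. The column ids come from the tree above.

```julia
# Hypothetical sketch, not UnROOT.jl's API: a proxy for one struct-valued
# field whose sub-fields are read only when explicitly requested.
struct LazyStruct
    loaders::Dict{Symbol, Function}   # sub-field name => function reading its columns
end

# `getfield` is used internally because `getproperty` is overloaded here.
function Base.getproperty(s::LazyStruct, name::Symbol)
    loaders = getfield(s, :loaders)
    haskey(loaders, name) || error("no sub-field $name")
    return loaders[name]()            # only this sub-field's loader runs
end

Base.propertynames(s::LazyStruct) = Tuple(keys(getfield(s, :loaders)))

# Toy "storage" that records which column ids were actually read.
const columns_read = Set{Int}()
function read_columns(ids...)
    union!(columns_read, ids)
    return Float32[]                  # placeholder payload
end

jets = LazyStruct(Dict{Symbol, Function}(
    :pt  => () -> read_columns(17, 18),
    :eta => () -> read_columns(19, 20),
    :m   => () -> read_columns(23, 24),
    :constituentWeights => () -> read_columns(29, 30, 31),
))

jets.pt   # triggers reads of columns 17 and 18 only
```

With this design, constructing the proxy for `evt.var"AntiKt4TruthDressedWZJetsAux:"` costs nothing, and the actual reads are deferred to `.pt`.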

One possible approach is to switch to AwkwardArray.jl by @jpivarski and represent the whole RNTuple as one big RecordArray. In theory this works for columnar access (i.e. `rntuple.var"AntiKt4TruthDressedWZJetsAux:".pt`), but it may not solve our event-iteration problem.
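Columnar access sidesteps the problem because the leaf is named before anything is read. The sketch below imitates that with a NamedTuple of loader closures; it does not use AwkwardArray.jl's API, and `jagged`, `columns` and the toy data are hypothetical.

```julia
# Hypothetical columnar layout (not AwkwardArray.jl's API): each field is a
# whole jagged column built from one offset + content pair.
# Assumption: offsets are cumulative end indices starting from 0.
function jagged(offsets::Vector{Int64}, content::Vector{T}) where {T}
    starts = [0; offsets[1:end-1]]
    return [view(content, (s + 1):e) for (s, e) in zip(starts, offsets)]
end

# Columnar access: ask for `.pt` of *all* events at once.
columns = (
    pt  = () -> jagged(Int64[2, 2, 5], Float32[10, 20, 30, 40, 50]),
    eta = () -> jagged(Int64[2, 2, 5], Float32[0.1, -0.2, 1.5, 2.0, -1.1]),
)

all_pt = columns.pt()             # only the pt loader runs; eta is untouched
total_jets = sum(length, all_pt)  # per-event views over one decoded column
```

The catch noted above remains: a `for evt in rntuple` loop builds per-event objects, so it needs its own lazy path rather than reusing whole-column reads.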

Another possible approach is to use StructArrays.jl more cleverly. @peremato, did you run into anything like this in EDM4hep.jl? If so, did you find anything that works?
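StructArrays.jl offers a `LazyRow` wrapper whose `getproperty` indexes only the requested component instead of materializing the whole row. The sketch below imitates that idea in plain Julia so the mechanism is visible; `SoA`, `LazyRowSketch` and `total_pt_entries` are hypothetical names, not StructArrays.jl or UnROOT.jl code.

```julia
# Plain-Julia imitation of lazy row iteration over a struct of arrays.
struct SoA{C<:NamedTuple}
    cols::C                          # one vector per field
end

struct LazyRowSketch{C<:NamedTuple}
    cols::C
    i::Int
end

# Only the requested column is indexed; the other columns are never touched.
Base.getproperty(r::LazyRowSketch, name::Symbol) =
    getfield(getfield(r, :cols), name)[getfield(r, :i)]

Base.length(s::SoA) = length(first(s.cols))
Base.iterate(s::SoA, i = 1) =
    i > length(s) ? nothing : (LazyRowSketch(s.cols, i), i + 1)

# Here the columns are plain vectors; in UnROOT they would be lazily read.
events = SoA((pt  = [[10f0, 20f0], Float32[], [30f0]],
              eta = [[0.1f0, 0.2f0], Float32[], [1.5f0]]))

function total_pt_entries(evts)
    n = 0
    for evt in evts
        n += length(evt.pt)          # `eta` is never indexed
    end
    return n
end
```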

Moelf, Mar 13 '24 14:03

> Another possible way is to use StructArrays.jl more smartly, @peremato did you run into anything like this in EDM4hep.jl? If so anything you found working?

With EDM4hep, I think I do not have this problem, since the top level is a Vector of POD structs rather than a struct of vectors as in this case. It is true that I (probably) read all the fields, because in the end I really do construct an SoA of the container.
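To make the layout difference concrete: in memory, an array of POD structs keeps each element's fields together, so converting it into a struct of arrays (an SoA) visits every field of every element. A minimal sketch, using a hypothetical `Particle` type standing in for an EDM4hep-like element (not an actual EDM4hep.jl type) and a hypothetical `to_soa` helper:

```julia
# Hypothetical POD type; isbits, so each element is stored inline.
struct Particle
    pt::Float32
    eta::Float32
    m::Float32
end

# Array of structs: one contiguous vector of Particle values.
aos = [Particle(10, 0.1, 0.5), Particle(20, -0.3, 1.0)]

# Building the struct of arrays copies every field of every element,
# so all fields are read even if only `pt` is used afterwards.
to_soa(v::Vector{Particle}) =
    (pt = [p.pt for p in v], eta = [p.eta for p in v], m = [p.m for p in v])

soa = to_soa(aos)
```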

peremato, Mar 13 '24 15:03