root icon indicating copy to clipboard operation
root copied to clipboard

[ntuple] Enable subfield access in `REntry`

Open enirolf opened this issue 1 year ago • 2 comments

This PR adds the possibility to add subfields (with a provided parent) to the RNTupleModel, and subsequently access their values in the REntry using their fully qualified names (parent.subfield). Checks are included to ensure that the requested subfield is an actual valid, existing subfield of the provided parent.

This functionality will be used in the RNTupleProcessor, which provides an iterator over an REntry.

enirolf avatar Oct 15 '24 15:10 enirolf

Test Results

    18 files      18 suites   4d 2h 30m 48s ⏱️  2 699 tests  2 697 ✅ 0 💤 2 ❌ 46 128 runs  46 126 ✅ 0 💤 2 ❌

For more details on these failures, see this check.

Results for commit 4d3849bc.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 16 '24 00:10 github-actions[bot]

As far as I understand the desired effect, would it be sufficient to have RNTupleModel::RegisterSubField(std::string_view qualifiedFieldName) which searches the added fields and places the subfield in the default entry (and remembers it for future entries)?

Thanks for this suggestion, that's a way better approach indeed!

I have changed the PR into draft mode because I realize there are a few more things to address, most notably how to deal with collection subfields. If I have a field structVec of type std::vector<CustomStruct> and I want to register structVec._0.s (where s is a std::string data member), this currently runs without issue, but the strings are added as scalar values to the entry. Some extra machinery is required to register this subfield in the entry as a std::vector<std::string>, which would (I believe) be the desired outcome. The RNTuple datasource takes the same approach, so inspration can be drawn from this implementation.

enirolf avatar Oct 17 '24 09:10 enirolf

Looks good from my point of view, I'll let Jakob have the final say. One point we should clarify at some point (but doesn't need to block this PR) is how subfield access plays with writing: It intuitively makes sense for reading as you can just have the same value multiple times in memory. But I guess we only want to take the top-level fields from the entry for writing? Should we forbid creating a (parallel) writer passing a model with registered subfields?

hahnjo avatar Oct 29 '24 08:10 hahnjo

Looks good from my point of view, I'll let Jakob have the final say. One point we should clarify at some point (but doesn't need to block this PR) is how subfield access plays with writing: It intuitively makes sense for reading as you can just have the same value multiple times in memory. But I guess we only want to take the top-level fields from the entry for writing? Should we forbid creating a (parallel) writer passing a model with registered subfields?

This is a very good point, I've only considered the reading side of things. I would be in favor of the last possibility and just completely disallow creating writers from models with registered subfields.

enirolf avatar Oct 29 '24 09:10 enirolf