[DF] Add initial implementation for snapshotting to RNTuple
This PR adds a first iteration of snapshotting to RNTuple from an RDataFrame. It uses the existing Snapshot interface, with an addition to RSnapshotOptions, kOutputFormat. This option can be set to write to TTree, to write to RNTuple, or to take the default choice. The table below describes how Snapshot behaves according to the output format option:
| | From TTree | From RNTuple | From other DS |
|---|---|---|---|
| To TTree | ESnapshotOutputFormat::kDefault | ESnapshotOutputFormat::kTTree | ESnapshotOutputFormat::kDefault |
| To RNTuple | Not yet possible, will be added in a follow-up, using functionality from RNTupleImporter | ESnapshotOutputFormat::kDefault | ESnapshotOutputFormat::kRNTuple |
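The table above can be read as a small decision function. The following is a minimal, ROOT-independent sketch of that mapping, not the code in this PR: the enums are local stand-ins, and `ESourceKind`, `EWrittenFormat` and `ResolveWrittenFormat` are hypothetical names introduced only for illustration. It also assumes that explicitly requesting the format the source already has (for example kTTree from a TTree) is accepted, which the table does not state.

```cpp
#include <optional>

// Local stand-ins for illustration only; these are not the ROOT declarations.
enum class ESnapshotOutputFormat { kDefault, kTTree, kRNTuple };
enum class ESourceKind { kTTree, kRNTuple, kOther }; // hypothetical
enum class EWrittenFormat { kTTree, kRNTuple };      // hypothetical

// Returns the format that would be written, or std::nullopt for the one
// combination the table marks as not yet possible (TTree -> RNTuple).
std::optional<EWrittenFormat>
ResolveWrittenFormat(ESourceKind source, ESnapshotOutputFormat requested)
{
   switch (requested) {
   case ESnapshotOutputFormat::kTTree:
      return EWrittenFormat::kTTree;
   case ESnapshotOutputFormat::kRNTuple:
      if (source == ESourceKind::kTTree)
         return std::nullopt; // left to a follow-up according to the table
      return EWrittenFormat::kRNTuple;
   case ESnapshotOutputFormat::kDefault:
      // The default keeps RNTuple input as RNTuple and writes TTree otherwise.
      return source == ESourceKind::kRNTuple ? EWrittenFormat::kRNTuple
                                             : EWrittenFormat::kTTree;
   }
   return std::nullopt; // not reached for valid enumerators
}
```

Returning an empty optional for the unsupported case is a choice made for the sketch; how the actual implementation reports it is not described here.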
Implementation
As mentioned, the existing Snapshot interface is used. A new SnapshotRNTupleHelper has been created to handle the creation and writing of the RNTuple, akin to the existing SnapshotHelper (which has been renamed to SnapshotTTreeHelper for consistency).
RLoopManager data source initialization (rev bbf221f)
The snapshot action creates a new loop manager that manages the snapshotted data set. This loop manager is initialized before the actual snapshotting takes place. Originally, the pointer to the data source owned by the loop manager was marked as const. Because the RNTuple's data source has to be created after the loop manager, this PR drops the const qualifier.
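To illustrate why the qualifier gets in the way, here is a minimal sketch with stand-in types. `LoopManager`, `DataSource` and `SetDataSource` are hypothetical names, and the owning pointer is modelled as a std::unique_ptr, which is an assumption about the real class. A const owning pointer can only be set in the constructor, so attaching a data source that is created later requires a non-const member.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a data source; the real interface is much richer.
struct DataSource {
   std::string fName;
};

// Hypothetical stand-in for the loop manager.
class LoopManager {
   // With `const std::unique_ptr<DataSource>` this member could only be
   // initialized in the constructor and SetDataSource would not compile.
   std::unique_ptr<DataSource> fDataSource;

public:
   LoopManager() = default;

   // Attach a data source that only becomes available after construction.
   void SetDataSource(std::unique_ptr<DataSource> ds) { fDataSource = std::move(ds); }

   const DataSource *GetDataSource() const { return fDataSource.get(); }
};
```

In this model the loop manager can be created first and receive its data source once the output has been written.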
Move ROOT::RDF::Experimental::FromRNTuple (rev 0a29b02)
For snapshotting RNTuples, we need to include the header file for RNTupleDS in ActionHelpers.hxx. To avoid dependency conflicts related to including ROOT/RDataFrame.hxx, the free FromRNTuple functions have been moved to a separate header.
Current limitations and follow-ups
This PR adds the minimal functionality for (single-threaded) snapshotting to RNTuple. A number of follow-ups are foreseen:
RNTuple write options
No RNTuple-specific write options have been added to RSnapshotOptions yet, except for the compression settings, which were already present as an option. Adding (a subset of) the other RNTupleWriteOptions is trivial.
Default compression settings
RSnapshotOptions' default compression setting is 101 (Zlib). However, RNTuple's default compression setting is 505 (zstd). We could change the default compression setting to kInherit and decide which settings to use according to the target data format (unless explicitly set by the user, of course).
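A minimal sketch of how such a format-dependent default could be resolved is given below. It is not part of this PR: `ResolveCompression`, `EOutputFormat` and the sentinel `kInheritSketch` are hypothetical names, and using -1 as the "inherit" value is an assumption made for the example. Only the values 101 and 505 are taken from the text above.

```cpp
// Local stand-in for the target data format; not a ROOT type.
enum class EOutputFormat { kTTree, kRNTuple };

constexpr int kInheritSketch = -1;   // hypothetical "not set by the user" sentinel
constexpr int kTTreeDefault = 101;   // current RSnapshotOptions default (Zlib)
constexpr int kRNTupleDefault = 505; // RNTuple default (zstd)

// Picks the compression setting for a snapshot: an explicit user value always
// wins, otherwise the default of the target format is used.
int ResolveCompression(int requested, EOutputFormat format)
{
   if (requested != kInheritSketch)
      return requested;
   return format == EOutputFormat::kRNTuple ? kRNTupleDefault : kTTreeDefault;
}
```

With this scheme, existing TTree snapshots would keep 101 unless the user overrides it, while RNTuple snapshots would pick up 505.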
Multithreaded snapshotting
This PR only adds single-threaded RNTuple snapshotting. Multithreaded (and parallel) snapshotting will be addressed in a follow-up PR.
Tests
Corresponding roottest PR: https://github.com/root-project/roottest/pull/1178
Tests for Windows have been disabled due to permission-denied errors that occur when trying to recreate TFiles that are currently open. The regular snapshot tests have also been disabled for Windows, presumably for the same reason.