
Use data source label in distributed HeadNode factory

Open • vepadulano opened this issue 1 year ago • 1 comment

Introduce a new method that returns a label for the data source the current RDataFrame is processing. There are three major cases:

  • The dataframe will process a TTree dataset
  • The dataframe will process an empty dataset
  • The dataframe will process data from an RDataSource

For the first two cases the function also returns a label with the suffix "DS", to stay as closely aligned as possible with the RDataSource infrastructure.
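The three cases can be illustrated with a minimal, ROOT-free sketch. Everything here is an assumption made for illustration: `FakeDataFrame`, `data_source_label`, and the literal strings "TTreeDS" and "EmptyDS" are hypothetical and are not taken from the actual change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDataFrame:
    """Hypothetical stand-in for an RDataFrame; not a ROOT class."""
    has_tree: bool = False                   # True if a TTree dataset is attached
    datasource_label: Optional[str] = None   # label reported by an RDataSource, if any


def data_source_label(df: FakeDataFrame) -> str:
    """Return a label for the kind of input the dataframe will process."""
    if df.datasource_label is not None:
        # An RDataSource is attached: reuse the label it reports.
        return df.datasource_label
    if df.has_tree:
        # TTree input: assumed label, carrying the "DS" suffix.
        return "TTreeDS"
    # Neither a data source nor a tree: empty dataset, assumed label.
    return "EmptyDS"
```

Since every case yields a string ending in "DS", callers can treat all three kinds of input uniformly.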

Use this function in distributed RDataFrame to create the head node of the Python computation graph. This also avoids extra parsing in the factory function, which includes reopening the first input file to distinguish between TTree and RNTuple input when the first input argument is a string.
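A label-driven factory could look roughly like the following sketch. The class names, the `make_headnode` function, and the label strings are all hypothetical placeholders, not the actual distributed RDataFrame code. The point is only that the dispatch becomes a dictionary lookup on the label, with no need to inspect or reopen the input file.

```python
class TreeHeadNode:
    """Hypothetical head node for TTree input."""
    def __init__(self, *args):
        self.args = args


class EmptySourceHeadNode:
    """Hypothetical head node for an empty dataset."""
    def __init__(self, *args):
        self.args = args


class RNTupleHeadNode:
    """Hypothetical head node for RNTuple input."""
    def __init__(self, *args):
        self.args = args


# Assumed mapping from data source label to head node type.
_HEADNODE_BY_LABEL = {
    "TTreeDS": TreeHeadNode,
    "EmptyDS": EmptySourceHeadNode,
    "RNTupleDS": RNTupleHeadNode,
}


def make_headnode(label, *args):
    """Create the head node matching the given data source label."""
    try:
        headnode_type = _HEADNODE_BY_LABEL[label]
    except KeyError:
        raise ValueError(f"Unsupported data source label: {label!r}") from None
    return headnode_type(*args)
```

For example, `make_headnode("TTreeDS", "events", "file.root")` returns a `TreeHeadNode` in this sketch, while an unknown label raises `ValueError` rather than falling back to file inspection.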

vepadulano, May 17 '24 10:05

Test Results

    14 files, 14 suites, duration 3d 9h 15m 23s
    2 696 tests: 2 694 passed, 0 skipped, 2 failed
    35 497 runs: 35 495 passed, 0 skipped, 2 failed

For more details on these failures, see this check.

Results for commit 2e9f44cc.

This comment has been updated with latest results.

github-actions[bot], May 17 '24 13:05