
Use data source label in distributed HeadNode factory

Open vepadulano opened this issue 9 months ago • 1 comments

Introduce a new method to get a label for the data source that the current RDataFrame is processing. There are three major types:

  • The dataframe will process a TTree dataset
  • The dataframe will process an empty dataset
  • The dataframe will process data from an RDataSource

The function returns a label with the suffix "DS" for the first two cases as well, to align as closely as possible with the RDataSource infrastructure.

Make use of this function in distributed RDataFrame to create the headnode of the Python computation graph. This also avoids extra parsing in the factory function, which included opening the first input file once more to distinguish between TTree and RNTuple input (in case the first input argument is a string).
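As a rough illustration of the idea, the sketch below shows a factory that picks the headnode type from a data source label instead of inspecting the input arguments. It does not reproduce the code in this PR: the label strings (`"TTreeDS"`, `"EmptyDS"`, `"RNTupleDS"`), the `HeadNode` class, and `make_headnode` are hypothetical names chosen for the example, and the label is passed in as a plain string rather than queried from a real RDataFrame.

```python
from dataclasses import dataclass


@dataclass
class HeadNode:
    """Hypothetical stand-in for the headnode of the computation graph."""
    kind: str


# Assumed label strings; only the "DS" suffix is stated in the description.
_LABEL_TO_KIND = {
    "TTreeDS": "ttree",
    "EmptyDS": "empty",
    "RNTupleDS": "rntuple",
}


def make_headnode(label: str) -> HeadNode:
    """Build a headnode from a data source label.

    No input file is opened here: the label alone decides which
    kind of headnode is created.
    """
    try:
        return HeadNode(kind=_LABEL_TO_KIND[label])
    except KeyError:
        raise ValueError(f"Unsupported data source label: {label!r}") from None
```

With a lookup like this, the factory no longer needs to check whether its first argument is a string and then open that file to find out what it contains.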

vepadulano, May 17 '24 10:05

Test Results

    12 files, 12 suites, total run time 2d 21h 39m 15s
    2 637 tests: 2 636 passed, 0 skipped, 1 failed
    29 949 runs: 29 948 passed, 0 skipped, 1 failed

For more details on these failures, see this check.

Results for commit 5717ec1f.

:recycle: This comment has been updated with latest results.

github-actions[bot], May 17 '24 13:05