[df] Improve user interface in DistrRDF
Add a set of functions that replace the way the user interacts with the C++ parts of a distributed, fully pythonic RDF analysis, making this interaction more natural, clearer and less error prone than the previous method. The new functions are:
- DeclareHeaders
- DeclareSharedLibs
- DeclareFiles
- DeclareCppCode
The functions work with both Spark and Dask backends.
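As a rough sketch of the intended usage (the attachment point of the new helpers under `ROOT.RDF.Experimental.Distributed`, their single-string arguments, and the file/tree names are assumptions for illustration only, not confirmed by this PR), a Dask-based analysis could look like:

```python
import ROOT
from dask.distributed import Client, LocalCluster

# Distributed RDataFrame on the Dask backend, using a local cluster as an example
client = Client(LocalCluster(n_workers=2))
RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
df = RDataFrame("mytree", "myfile.root", daskclient=client)

# Hypothetical calls to the new helpers: make local headers, shared libraries,
# auxiliary files and C++ code available on every worker
Distributed = ROOT.RDF.Experimental.Distributed
Distributed.DeclareHeaders("myanalysis.h")
Distributed.DeclareSharedLibs("libMyAnalysis.so")
Distributed.DeclareFiles("calibration.txt")
Distributed.DeclareCppCode("float scale(float x) { return 2.f * x; }")

# The declared code can then be used in the usual RDataFrame calls
h = df.Define("scaled_pt", "scale(pt)").Histo1D("scaled_pt")
h.Draw()
```

The point of the sketch is that everything stays on the Python side: the user calls one function per kind of C++ dependency instead of dealing with the interpreter mechanics on the workers directly.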
The tests are introduced in roottest PR: https://github.com/root-project/roottest/pull/1177
Test Results
18 files, 18 suites, 4d 7h 34m 25s ⏱️
2 678 tests: 2 678 ✅, 0 💤, 0 ❌
46 484 runs: 46 484 ✅, 0 💤, 0 ❌
Results for commit a0112de3.
- An important part of this PR is deciding the naming scheme for the new user-facing API. I see two main ways: being pedantic or being coherent. Pedantic would mean that the function for headers should be `IncludeHeaders`, the one for shared libraries probably should be `LoadSharedLibs`, etc. Instead, if we want to be coherent and also hint at the fact that this is a tool for distributed execution, we could decide to name everything `Distribute*`, so that it is already clear from the name of the function that the code/header/file will be uploaded to the workers somehow.
Personally, I would go for the `Distribute*` approach, but we can discuss this next week in person.
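For illustration only (these are just the candidate names mentioned above; none of them exists under the namespace shown here, and `DistributeCppCode` is an extrapolation of the `Distribute*` idea), the two schemes would compare roughly as:

```python
import ROOT

# "Pedantic" scheme: each name mirrors the underlying C++ action
ROOT.RDF.Distributed.IncludeHeaders("myanalysis.h")
ROOT.RDF.Distributed.LoadSharedLibs("libMyAnalysis.so")
ROOT.RDF.Distributed.DeclareCppCode("float scale(float x) { return 2.f * x; }")

# "Coherent" Distribute* scheme: each name says what happens on the cluster
ROOT.RDF.Distributed.DistributeHeaders("myanalysis.h")
ROOT.RDF.Distributed.DistributeSharedLibs("libMyAnalysis.so")
ROOT.RDF.Distributed.DistributeCppCode("float scale(float x) { return 2.f * x; }")
```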
- I believe we need a section in the docs in `RDataFrame.cxx` describing these functions, with examples of usage. Ideally, what I would like to see is something like a transition help guide, just to give an example:

```python
# If you do this in your local RDataFrame script
ROOT.gInterpreter.Declare("my_code")
df.Define(...)

# Do this in distributed RDF
ROOT.RDF.Distributed.DistributeCode("my_code")
df.Define(...)
```
Sure, I will do that next.
I implemented all your comments, except for the leftover debug statements, which I kept on purpose: there are a few things we still need to debug (see my comments in the roottest PR, especially about the multiple declarations).