root icon indicating copy to clipboard operation
root copied to clipboard

[df] [for discussion] JIT graph creation functions only once

Open gpetruc opened this issue 1 year ago • 6 comments

This PR

This is some code, that could also be improved further, to reduce the per-sample JIT cost in RDataFrame, and potentially a way to avoid the "controlled memory leaks" #15520 (though this code currently does not yet address them in full).

I don't propose to merge this code as is, but I'm opening the PR as a reference for future discussion

Before this PR

Currently JITed nodes of the computation graph, RDF JITs the function code just once, but JITs code lines to create the computational graph once per graph, with code like

ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(
        R_rdf::func45, 
        new const char*[3]{"Jet_isSelected", "Jet_area", "Jet_pt"}, 3, 
        "SelJet_area",
        reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x562b52fe7cb0),
        reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x562b5530d120),
        reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x562b552e1900),
        reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x562b552e0d20));

This is done by methods like BookDefineJit that craft the code as a string and pass it to RLoopManager::toJitExec. Calls are accumulated, and executed in one go by RLoopManager::Jit()

With this PR

Methods like BookDefineJit now call a new function RLoopManager::RegisterJitHelperCall and the whole operation is refactored in three steps

  1. a definition via JIT of a registrator function, done once per function, kind of action (e.g. Define) and column name
namespace R_rdf {
  void jitNodeRegistrator_70(
        ROOT::Detail::RDF::RLoopManager *lm, 
        std::shared_ptr<ROOT::Detail::RDF::RNodeBase> *prevNodeOnHeap,
        ROOT::Internal::RDF::RColumnRegister* colRegister, 
        const std::vector<std::string> & colNames, 
        void *wkJittedDefine, void *) {
                 ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func45,
                 colNames, 
                 "SelJet_area", 
                 lm, reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(wkJittedDefine),
                 colRegister, 
                 prevNodeOnHeap);
        }
}
  1. lookup of the function address in memory, via calls to gInterpreter
  2. from C++ call the registrator function once per computation graph

Similarly to the current setup, the declarations from point (1) are accumulated in a single string and passed to the interpreter in one go by RLoopManager::Jit(), and the address lookup (2) is also done inside RLoopManager::Jit()

The deferred function calls (3) are done both in RLoopManager::Jit() and also in RLoopManager::Run in order to also allow doing them in multiple threads when RunGraphs is used (RunGraphs calls Jit only on one of the loop managers, and then calls Run on all multithreaded)

What next?

  • I tested the code on several RDF analyses, but it could be tested more extensively, and also not just on x86_64.
  • The column name could also be gathered in the registration, to reduce the number of registrator functions to JIT
  • The objects created on the heap by BookDefineJit and deleted in the JITed calls like JitDefineHelper could be instead owned by the deferred function call object via RAII
  • Code could be cleaned further, e.g. the registration key could become an integer and the map a concurrent container to avoid the read lock, a c++ std::variant could also be used instead of a void * for the weak pointer, the extra column names for variations could be in a smart pointer, ...

gpetruc avatar Dec 16 '24 15:12 gpetruc

Thank you @gpetruc for kickstarting this absolutely non-trivial change to a core part of RDataFrame! It is definitely worth discussing, I will take a look at this.

vepadulano avatar Dec 18 '24 09:12 vepadulano

Test Results

    22 files      22 suites   3d 16h 32m 42s ⏱️  3 785 tests  3 785 ✅ 0 💤 0 ❌ 81 249 runs  81 249 ✅ 0 💤 0 ❌

Results for commit 3c3b1ee6.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Dec 20 '24 03:12 github-actions[bot]

I've taken the liberty of rebasing this PR on master. Locally, I have tried to exploit these changes as a baseline to remove the controlled leaks of https://github.com/root-project/root/issues/15520 , but I've hit a blocker because of CallBuildAction where not only the jitted action pointer, the column register pointer and the previous node pointer are deleted, but also the pointer to the HelperArgType instance which does not fit in the workflow of this PR (it would require some sort of template DeferredJitCall or another way to make it agnostic. Before proceeding, I'd like to see what's missing for this PR to be merged, then possibly tackle the leaks in another PR

vepadulano avatar Nov 29 '25 17:11 vepadulano

@gpetruc I have squashed your proposed changes in one commit, the second commit introduces improvements that also lead to fixing #15520. Our CI is running well with the changes, let me know if you have time to check this PR again with your analysis, I'll be running some tests with valgrind locally as well.

vepadulano avatar Dec 04 '25 18:12 vepadulano

Hi @vepadulano,

Thanks for reviving and updating this PR, I've run a quick check with one of my analysis benchmarks and it looks ok.

I'm failing to do more tests because for reasons I don't yet understand the el9 docker containers I'm building with this root branch or the master one fail to access eos ( Plugin version SecClnt v5.8.0 is incompatible with seckrb5 v5.9.1 (must be <= 5.8.x) in sec.protocol libXrdSeckrb5-5.so )

gpetruc avatar Dec 08 '25 16:12 gpetruc

Okay, after fixing my docker builds now I managed to run also more tests with this PR, and confirm that results look ok.

gpetruc avatar Dec 10 '25 08:12 gpetruc