[df] [for discussion] JIT graph creation functions only once
This PR
This is some code, that could also be improved further, to reduce the per-sample JIT cost in RDataFrame, and potentially a way to avoid the "controlled memory leaks" #15520 (though this code currently does not yet address them in full).
I don't propose to merge this code as is, but I'm opening the PR as a reference for future discussion
Before this PR
Currently JITed nodes of the computation graph, RDF JITs the function code just once, but JITs code lines to create the computational graph once per graph, with code like
ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(
R_rdf::func45,
new const char*[3]{"Jet_isSelected", "Jet_area", "Jet_pt"}, 3,
"SelJet_area",
reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x562b52fe7cb0),
reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x562b5530d120),
reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x562b552e1900),
reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x562b552e0d20));
This is done by methods like BookDefineJit that craft the code as a string and pass it to RLoopManager::toJitExec. Calls are accumulated, and executed in one go by RLoopManager::Jit()
With this PR
Methods like BookDefineJit now call a new function RLoopManager::RegisterJitHelperCall and the whole operation is refactored in three steps
- a definition via JIT of a registrator function, done once per function, kind of action (e.g. Define) and column name
namespace R_rdf {
void jitNodeRegistrator_70(
ROOT::Detail::RDF::RLoopManager *lm,
std::shared_ptr<ROOT::Detail::RDF::RNodeBase> *prevNodeOnHeap,
ROOT::Internal::RDF::RColumnRegister* colRegister,
const std::vector<std::string> & colNames,
void *wkJittedDefine, void *) {
ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func45,
colNames,
"SelJet_area",
lm, reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(wkJittedDefine),
colRegister,
prevNodeOnHeap);
}
}
- lookup of the function address in memory, via calls to
gInterpreter - from C++ call the registrator function once per computation graph
Similarly to the current setup, the declarations from point (1) are accumulated in a single string and passed to the interpreter in one go by RLoopManager::Jit(), and the address lookup (2) is also done inside RLoopManager::Jit()
The deferred function calls (3) are done both in RLoopManager::Jit() and also in RLoopManager::Run in order to also allow doing them in multiple threads when RunGraphs is used (RunGraphs calls Jit only on one of the loop managers, and then calls Run on all multithreaded)
What next?
- I tested the code on several RDF analyses, but it could be tested more extensively, and also not just on x86_64.
- The column name could also be gathered in the registration, to reduce the number of registrator functions to JIT
- The objects created on the heap by
BookDefineJitand deleted in the JITed calls likeJitDefineHelpercould be instead owned by the deferred function call object via RAII - Code could be cleaned further, e.g. the registration key could become an integer and the map a concurrent container to avoid the read lock, a c++ std::variant could also be used instead of a
void *for the weak pointer, the extra column names for variations could be in a smart pointer, ...
Thank you @gpetruc for kickstarting this absolutely non-trivial change to a core part of RDataFrame! It is definitely worth discussing, I will take a look at this.
Test Results
22 files 22 suites 3d 16h 32m 42s ⏱️ 3 785 tests 3 785 ✅ 0 💤 0 ❌ 81 249 runs 81 249 ✅ 0 💤 0 ❌
Results for commit 3c3b1ee6.
:recycle: This comment has been updated with latest results.
I've taken the liberty of rebasing this PR on master. Locally, I have tried to exploit these changes as a baseline to remove the controlled leaks of https://github.com/root-project/root/issues/15520 , but I've hit a blocker because of CallBuildAction where not only the jitted action pointer, the column register pointer and the previous node pointer are deleted, but also the pointer to the HelperArgType instance which does not fit in the workflow of this PR (it would require some sort of template DeferredJitCall or another way to make it agnostic. Before proceeding, I'd like to see what's missing for this PR to be merged, then possibly tackle the leaks in another PR
@gpetruc I have squashed your proposed changes in one commit, the second commit introduces improvements that also lead to fixing #15520. Our CI is running well with the changes, let me know if you have time to check this PR again with your analysis, I'll be running some tests with valgrind locally as well.
Hi @vepadulano,
Thanks for reviving and updating this PR, I've run a quick check with one of my analysis benchmarks and it looks ok.
I'm failing to do more tests because for reasons I don't yet understand the el9 docker containers I'm building with this root branch or the master one fail to access eos ( Plugin version SecClnt v5.8.0 is incompatible with seckrb5 v5.9.1 (must be <= 5.8.x) in sec.protocol libXrdSeckrb5-5.so )
Okay, after fixing my docker builds now I managed to run also more tests with this PR, and confirm that results look ok.