[runtime_cxxmodules] Enable on AArch64
It was disabled in commit a67863d33a ("Disable modules on aarch64 due to ODR violation") in 2019. I cannot reproduce these problems on lxplus-arm, so let's try turning it back on.
Test Results
18 files, 18 suites, 4d 2h 21m 3s ⏱️
2 662 tests: 2 662 ✅, 0 💤, 0 ❌
46 198 runs: 46 198 ✅, 0 💤, 0 ❌
Results for commit b0121698.
I propose merging this once the ARM nodes are online, so that we can test it immediately. Would this be OK?
Yes, I'm waiting for the AArch64 node in our CI so we can test there, and after that I'd still like to ask CMS to run their tests.
@aandvalenzuela @smuzaffar if you have some cycles, can you test this change with CMSSW on AArch64? This should align the configuration with x86_64 by also enabling runtime_cxxmodules by default.
https://github.com/cms-sw/root/pull/212 is running cmssw aarch64 tests
Hi, most of the cmssw tests passed, but for a few relvals we get runtime errors like [a]
[a] https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7b6638/42123/runTheMatrix-results/140.063_RunZeroBias2022D/step3_RunZeroBias2022D.log
cling JIT session error: In graph cling-module-926-jitted-objectbuffer, section .text._ZNK4reco10HitPattern23numberOfLostTrackerHitsENS0_11HitCategoryE: relocation target "_ZN4reco10HitPattern16missingHitFilterEt" at address 0x4000968500f0 is out of range of Page21 fixup at 0x4001a7270114 (_ZNK4reco10HitPattern23numberOfLostTrackerHitsENS0_11HitCategoryE, 0x4001a727010c + 0x8)
----- Begin Fatal Exception 11-Oct-2024 15:08:51 CEST-----------------------
An exception of category 'FatalRootError' occurred while
[0] Processing Event run: 357735 lumi: 53 event: 87840020 stream: 0
[1] Running path 'dqmoffline_1_step'
[2] Prefetching for module NanoAODDQM/'nanoDQM'
[3] Prefetching for module SimplePATTauFlatTableProducer/'boostedTauTable'
[4] Prefetching for module PATObjectCrossLinker/'linkedObjects'
[5] Prefetching for module PATMuonRefSelector/'finalMuons'
[6] Prefetching for module PATMuonUserDataEmbedder/'slimmedMuonsWithUserData'
[7] Calling method for module EvaluateMuonMVAID/'muonMVAID'
Additional Info:
[a] Fatal Root Error: @SUB=TClingCallFunc::make_wrapper
Failed to compile
==== SOURCE BEGIN ====
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wformat-security"
__attribute__((used)) __attribute__((annotate("__cling__ptrcheck(off)")))
extern "C" void __cf_365(void* obj, int nargs, void** args, void* ret)
{
if (ret) {
new (ret) (double) (((const reco::TrackBase*)obj)->validFraction());
return;
}
else {
(void)(((const reco::TrackBase*)obj)->validFraction());
return;
}
}
#pragma clang diagnostic pop
==== SOURCE END ====
----- End Fatal Exception -------------------------------------------------
Another exception was caught while trying to clean up files after the primary fatal exception.
Hi, most of cmssw tests passed but for few relvals we get runtime errors [...]
Thanks for testing! This needs debugging (likely after CHEP)...
Revisiting this PR before the end of the year: thanks to the clear error message and some guesswork, I managed to reproduce the issue in a standalone ROOT session:
root [0] struct A { static void f() {} void take_f(void (*fp)()) { fp(); } void pass_f() { take_f(f); } void call_f() { f(); } };
root [1] A::f()
root [2] #include <sys/mman.h>
root [3] for (int i = 0; i < 1024 * 1024; i++) { mmap(nullptr, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); }
root [4] A a;
root [5] a.call_f()
root [6] a.pass_f()
cling JIT session error: In graph cling-module-16-jitted-objectbuffer, section .text._ZN11__cling_N501A6pass_fEv: relocation target "_ZN11__cling_N501A1fEv" at address 0xffffa18a4034 is out of range of Page21 fixup at 0xfffd88b60044 (_ZN11__cling_N501A6pass_fEv, 0xfffd88b6003c + 0x8)
Currently still investigating why this happens with runtime_cxxmodules but not without...
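A note on the error message above: a Page21 fixup is what JITLink applies to an AArch64 `ADRP` instruction, which encodes the distance to the target in 4 KiB pages as a signed 21-bit immediate, so the target has to lie within roughly ±4 GiB of the instruction. The loop in the reproducer reserves 1024 * 1024 mappings of 8192 bytes, i.e. 8 GiB of address space, which is presumably what pushes the later JIT allocation out of reach of `A::f`. The sketch below redoes that range check for the two address pairs quoted in this thread. `pageDelta` and `fitsPage21` are hypothetical helper names, not part of ROOT, cling or LLVM, and the check is a simplified model of the rule, not the actual JITLink code.

```cpp
#include <cstdint>

// Hypothetical helper: signed distance, in 4 KiB pages, from the page that
// holds the fixup (the ADRP instruction) to the page that holds the target.
constexpr std::int64_t pageDelta(std::uint64_t target, std::uint64_t fixup) {
    return static_cast<std::int64_t>(target >> 12) -
           static_cast<std::int64_t>(fixup >> 12);
}

// Hypothetical helper: true if the page distance fits a signed 21-bit
// immediate, i.e. lies in [-2^20, 2^20 - 1] pages (about +/- 4 GiB).
constexpr bool fitsPage21(std::uint64_t target, std::uint64_t fixup) {
    return pageDelta(target, fixup) >= -(static_cast<std::int64_t>(1) << 20) &&
           pageDelta(target, fixup) < (static_cast<std::int64_t>(1) << 20);
}

// Addresses from the standalone reproducer: A::f vs. the fixup in A::pass_f.
static_assert(pageDelta(0xffffa18a4034ULL, 0xfffd88b60044ULL) == 0x218D44,
              "target is 0x218D44 pages above the fixup");
static_assert(!fitsPage21(0xffffa18a4034ULL, 0xfffd88b60044ULL),
              "reproducer addresses are out of ADRP range");

// Addresses from the CMSSW relval log: missingHitFilter vs. the fixup in
// numberOfLostTrackerHits.
static_assert(pageDelta(0x4000968500f0ULL, 0x4001a7270114ULL) == -0x110A20,
              "target is 0x110A20 pages below the fixup");
static_assert(!fitsPage21(0x4000968500f0ULL, 0x4001a7270114ULL),
              "relval addresses are out of ADRP range");
```

Both pairs are more than 4 GiB apart (about 8.4 GiB in the reproducer and about 4.3 GiB in the relval log), which is consistent with the "out of range of Page21 fixup" diagnostics.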
Alright, the issue with the reproducer in https://github.com/root-project/root/pull/16401#issuecomment-2532084281 is understood and fixed. Let's hope that this also fixes the CMS relvals - @aandvalenzuela @smuzaffar could you maybe run the tests again? Thanks in advance for all your help!
cmssw tests started via https://github.com/cms-sw/root/pull/215
cmssw tests for aarch64 look good
Fantastic, thank you! Unfortunately our macOS nodes are not happy, so I'll need to push a slightly fixed version. I don't think we strictly need to test with full CMSSW once more...
@vgvassilev FYI, you already approved this three months ago, and now that the issue in CMS is fixed, I plan to land this soon.
Go ahead.