Dan Riley

Results: 76 comments by Dan Riley

> Everything you said about DXR makes sense. Does all this also apply to LXR? Does LXR have the same issues and is LXR also an unsupported dead project? In...

There's an additional issue: large VLAs that cause problems in multi-threaded processes may not show up in single-threaded PR tests. I haven't checked if we impose the same stack...
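
As a rough illustration of why a single-threaded test may not catch this: on Linux/glibc the main thread's stack can grow up to the soft `RLIMIT_STACK` limit, while every other thread gets a fixed-size stack chosen when it is created. The sketch below compares the two and reports the stack size seen on a thread started with an explicit request. The helper names are hypothetical and this is not CMSSW code; it assumes glibc (for `pthread_getattr_np`).

```cpp
#include <pthread.h>
#include <sys/resource.h>
#include <cstddef>

// Soft RLIMIT_STACK: how far the main thread's stack may grow (0 means unlimited).
std::size_t mainThreadStackLimit() {
  rlimit rl{};
  if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY) return 0;
  return static_cast<std::size_t>(rl.rlim_cur);
}

// Stack size a new pthread gets when its creator does not ask for one.
// glibc derives this default from RLIMIT_STACK at startup (or uses an
// architecture default if that limit is unlimited); thread pools may
// request a different size for their workers.
std::size_t defaultThreadStackSize() {
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  std::size_t size = 0;
  pthread_attr_getstacksize(&attr, &size);
  pthread_attr_destroy(&attr);
  return size;
}

// Stack size of the calling thread as glibc reports it (GNU extension).
std::size_t currentThreadStackSize() {
  pthread_attr_t attr;
  std::size_t size = 0;
  if (pthread_getattr_np(pthread_self(), &attr) == 0) {
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
  }
  return size;
}

// Starts a thread with `requested` bytes of stack and returns the size that
// thread observes for itself (0 if the thread could not be created).
std::size_t stackSizeSeenOnThread(std::size_t requested) {
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  std::size_t seen = 0;
  // Fails with EINVAL if `requested` is below PTHREAD_STACK_MIN.
  if (pthread_attr_setstacksize(&attr, requested) == 0) {
    pthread_t tid;
    auto body = [](void* out) -> void* {
      *static_cast<std::size_t*>(out) = currentThreadStackSize();
      return nullptr;
    };
    if (pthread_create(&tid, &attr, body, &seen) == 0) pthread_join(tid, nullptr);
  }
  pthread_attr_destroy(&attr);
  return seen;
}
```

A VLA sized from input data lives on whichever stack the module happens to run on, so a size that fits comfortably under the main thread's limit can still run past the guard page of a smaller worker-thread stack.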

For emphasis: "external termination request" means it timed out, likely due to a deadlock. The timeout signal will be delivered to thread 1, but to diagnose the deadlock we need...

Seen again in [WF25234.911 step3](https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_amd64_gcc11/CMSSW_13_3_X_2023-09-13-2300/pyRelValMatrixLogs/run/25234.911_TTbar_14TeV+2026D99_DD4hep/step3_TTbar_14TeV+2026D99_DD4hep.log) on slc7_amd64_gcc11, CMSSW_13_3_X_2023-09-13-2300. Apparently very rare, but this at least lets us eliminate some possible thread interactions.

```
Thread 5 (Thread 0x1478cb3ff700 (LWP 3500077) "cmsRun"):...
```

New stack trace, somewhat different from the old ones. WF 25234.911, el9_amd64_gcc12, CMSSW_14_0_X_2023-11-19-2300:
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/el9_amd64_gcc12/CMSSW_14_0_X_2023-11-19-2300/pyRelValMatrixLogs/run/25234.911_TTbar_14TeV+2026D99_DD4hep/step3_TTbar_14TeV+2026D99_DD4hep.log

```
Thread 5 (Thread 0x154f9f3ff640 (LWP 3576526) "cmsRun"):
#3  0x0000154fe93daaa0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_X_2023-11-19-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4...
```

> Now also a floating point exception in CMSSW_14_1_NONLTO_X_2024-03-24-0000 (although I'm puzzled what enabled those). We have seen external packages messing with the FPE state; that's why we added it...
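
A minimal sketch of one way to notice an external package changing the FPE state: compare the set of trapping floating-point exceptions before and after the call. It assumes glibc's `fegetexcept`/`feenableexcept`/`fedisableexcept` extensions; `callAndCheckFpeTraps` is a hypothetical name, and this is not necessarily how the existing check works.

```cpp
#include <cfenv>
#include <fenv.h>  // fegetexcept / feenableexcept / fedisableexcept are glibc extensions

// Hypothetical helper: calls `fn`, reports whether it changed which
// floating-point exceptions trap (deliver SIGFPE), and if so restores the
// caller's trap mask.
template <typename F>
bool callAndCheckFpeTraps(F&& fn) {
  const int before = fegetexcept();  // bitmask of trapping exceptions, e.g. FE_DIVBYZERO
  fn();
  const int after = fegetexcept();
  if (after == before) return false;
  fedisableexcept(FE_ALL_EXCEPT);  // mask everything the callee may have unmasked
  feclearexcept(FE_ALL_EXCEPT);    // drop sticky flags so re-enabling cannot fire a stale one (x87)
  feenableexcept(before);          // restore the caller's traps
  return true;
}
```

For example, wrapping a call that does `feenableexcept(FE_DIVBYZERO)` returns `true` and, assuming no traps were enabled beforehand, leaves division by zero non-trapping again. Clearing the flags before restoring loses any exception status the callee left behind; a real check might record it first with `fetestexcept`.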

Still not fixed. I can't reopen #33452 (which was unexpectedly closed automatically), so I'll reopen this one...

This is after #37840 was merged, so it looks like yet another attempt to fix this has failed...

Possibly related to #43723? The fix for that issue in #43852 was just merged, so it will be interesting to see whether it improves the reproducibility.