Dan Riley

Results: 76 comments by Dan Riley

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc11/CMSSW_12_6_UBSAN_X_2022-10-05-1100/pyRelValMatrixLogs/run/140.0_HydjetQ_B12_5020GeV_2011+HydjetQ_B12_5020GeV_2011+DIGIHI2011+RECOHI2011+HARVESTHI2011/step1_HydjetQ_B12_5020GeV_2011+HydjetQ_B12_5020GeV_2011+DIGIHI2011+RECOHI2011+HARVESTHI2011.log#/497-497

Possibly related, from the UBSAN IBs, unexpected(?) -1 seems like a problem:

```
/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/DataFormats/HcalDetId/src/HcalCastorDetId.cc:11:49: runtime error: left shift of negative value -1
#0 0x2ad3f65bf795 (/cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/lib/el8_amd64_gcc11/libDataFormatsHcalDetId.so+0x3d795)
```

followed by ```...
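For context on the UBSAN report: before C++20, left-shifting a negative signed value is undefined behavior, which is why the sanitizer flags a -1 reaching a shift. A minimal sketch of one way to guard against it, assuming a made-up two-field layout (the names `packId`, `kSectorShift`, `kSectorMask` and `kModuleMask` are hypothetical and do not reflect the actual HcalCastorDetId encoding):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical field layout for illustration only; this is NOT the real
// HcalCastorDetId bit layout.
constexpr unsigned kSectorShift = 4;
constexpr std::uint32_t kSectorMask = 0xF;
constexpr std::uint32_t kModuleMask = 0xF;

// Pack two small fields into one word. Negative or out-of-range inputs
// (such as an unexpected -1) are rejected instead of being shifted.
std::optional<std::uint32_t> packId(int sector, int module) {
  if (sector < 0 || module < 0) {
    return std::nullopt;
  }
  // Shift in unsigned arithmetic, which is well defined.
  const auto s = static_cast<std::uint32_t>(sector);
  const auto m = static_cast<std::uint32_t>(module);
  if (s > kSectorMask || m > kModuleMask) {
    return std::nullopt;
  }
  return (s << kSectorShift) | m;
}
```

Whether rejecting the value, or fixing the caller that produces the -1, is the right remedy depends on where the -1 comes from.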

We still see this exception intermittently; the latest I could find is from last week: https://cmssdt.cern.ch/SDT/cgi-bin/logreader/cc8_aarch64_gcc9/CMSSW_12_1_X_2021-10-22-1100/pyRelValMatrixLogs/run/10801.0_SingleElectronPt10+2018+SingleElectronPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_SingleElectronPt10+2018+SingleElectronPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log#/159-159

The IBs run with 4 threads, and what we see is all 4 threads failing on the first event for that thread. With all 4 threads failing, it's likely some...

More likely timing dependent. It's an all-or-none failure: either all the streams fail or none do, which is not consistent with an event-dependent failure. Thread races can be very...

Probably related, in https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_amd64_gcc900/CMSSW_12_2_DEVEL_X_2021-11-04-2300/pyRelValMatrixLogs/run/10004.0_SingleGammaPt10+2017+SingleGammaPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_SingleGammaPt10+2017+SingleGammaPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log ``` ----- Begin Fatal Exception 05-Nov-2021 07:23:16 CET----------------------- An exception of category 'StdException' occurred while [0] Processing Event run: 1 lumi: 1 event: 6 stream: 2...

I got 140.0009 to crash in gdb with line numbers. The result is consistent, but still not terribly enlightening. Here are the only active threads, 11 is the one that...

The really obvious question, which I decided not to ask earlier: why are they coding their own spinlock?

> This one looks correct.
>
> https://cmssdt.cern.ch/lxr/source/FWCore/Services/plugins/ConcurrentModuleTimer.cc#0206

Still wrong, atomic is not guaranteed to be lock-free. Better is to use std::atomic_flag, but even then, best is to just not...
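To illustrate the std::atomic_flag point: it is the only atomic type the C++ standard guarantees to be lock-free, whereas a given `std::atomic<T>` may not be (checkable with `is_lock_free()`). A minimal sketch of a flag-based spinlock follows; the `SpinLock` class and `countWithSpinLock` helper are hypothetical names for illustration, not the ConcurrentModuleTimer code:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock built on std::atomic_flag.
class SpinLock {
public:
  void lock() noexcept {
    // test_and_set returns the previous value: spin while it was already set.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
  }
  void unlock() noexcept { flag_.clear(std::memory_order_release); }

private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increment a shared counter from several threads under the lock.
long countWithSpinLock(int nThreads, int nIters) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t) {
    workers.emplace_back([&lock, &counter, nIters] {
      for (int i = 0; i < nIters; ++i) {
        lock.lock();
        ++counter;
        lock.unlock();
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return counter;
}
```

The class satisfies the BasicLockable requirements, so it could also be used with `std::lock_guard<SpinLock>`.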

#44447 does not fix the crash. Previously, it looks like the compiler was actually optimizing out the guard as undefined. With #44447 statsGuard_ now has a value--so the PR did...

> > pow(x,2) : no the compiler will not substitute with x*x: we need to introduce our own inline function "square"
>
> depends on the optimization flags according to...
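A minimal sketch of the inline "square" helper the quoted comment proposes, next to the `std::pow` form it would replace. The names `square`, `viaPow` and `viaSquare` are hypothetical; whether a given compiler already turns `pow(x, 2)` into `x * x` depends on the optimization flags, so comparing the generated assembly is the way to check:

```cpp
#include <cmath>

// Hypothetical helper: squares its argument with a single multiplication,
// independent of how the compiler treats std::pow.
template <typename T>
constexpr T square(T x) noexcept {
  return x * x;
}

// The form under discussion: may or may not be reduced to x * x.
double viaPow(double x) { return std::pow(x, 2); }

// The explicit form: always a plain multiplication.
double viaSquare(double x) { return square(x); }
```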