cmssw
Cut parser error in ROOT master IB
Workflow 1325.61 step 2 fails in CMSSW_11_3_ROOT6_X_2021-03-04-2300 with
----- Begin Fatal Exception 05-Mar-2021 08:39:28 CET-----------------------
An exception of category 'Configuration' occurred while
[0] Processing Event run: 1 lumi: 1 event: 6 stream: 3
[1] Running path 'dqmoffline_step'
[2] Calling method for module NanoAODDQM/'nanoDQMMC'
Exception Message:
Cut parser error:no method or data member named "getAnyValue" found for type "nanoaod::FlatTable::RowView" (char 0)
----- End Fatal Exception -------------------------------------------------
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_11_3_ROOT6_X_2021-03-04-2300/pyRelValMatrixLogs/run/1325.61_TTbar_13_106Xv1NanoAODINPUT+TTbar_13_106Xv1NanoAODINPUT+NANOAODMC2017_106XMiniAODv1/step2_TTbar_13_106Xv1NanoAODINPUT+TTbar_13_106Xv1NanoAODINPUT+NANOAODMC2017_106XMiniAODv1.log#/
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign core, xpog
FYI @pcanal
New categories assigned: core,xpog
@Dr15Jones,@smuzaffar,@fgolf,@mariadalfonso,@makortel,@gouskos you have been requested to review this Pull request/Issue and eventually sign? Thanks
assign dqm
New categories assigned: dqm
@jfernan2,@andrius-k,@ahmad3213,@kmaeshima,@rvenditti,@ErnestaP you have been requested to review this Pull request/Issue and eventually sign? Thanks
FYI @peruzzim
Another occurrence in CMSSW_11_3_DEVEL_X_2021-03-17-2300, workflow 1325.6 step 2
----- Begin Fatal Exception 18-Mar-2021 04:00:15 CET-----------------------
An exception of category 'Configuration' occurred while
[0] Processing Event run: 1 lumi: 101 event: 5006 stream: 1
[1] Running path 'dqmoffline_step'
[2] Calling method for module NanoAODDQM/'nanoDQMMC'
Exception Message:
Cut parser error:no method or data member named "getAnyValue" found for type "nanoaod::FlatTable::RowView" (char 0)
----- End Fatal Exception -------------------------------------------------
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_11_3_DEVEL_X_2021-03-17-2300/pyRelValMatrixLogs/run/1325.6_TTbar_13_94Xv1NanoAODINPUT+TTbar_13_94Xv1NanoAODINPUT+NANOAODMC2017_94XMiniAODv1/step2_TTbar_13_94Xv1NanoAODINPUT+TTbar_13_94Xv1NanoAODINPUT+NANOAODMC2017_94XMiniAODv1.log#/
@gpetruc according to the GitHub history, you were the one who introduced nanoDQMC in the code. Could you please have a look or point us to someone responsible? Thanks
Is this issue still valid? Thanks
We still see this exception intermittently; the latest I could find was last week:
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/cc8_aarch64_gcc9/CMSSW_12_1_X_2021-10-22-1100/pyRelValMatrixLogs/run/10801.0_SingleElectronPt10+2018+SingleElectronPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_SingleElectronPt10+2018+SingleElectronPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log#/159-159
Thanks @dan131riley. However, I cannot reproduce the error with that same RelVal and release by running: runTheMatrix.py -l 10801.0 -i all --ibeos, at least not within 10 events (the one you quoted crashed at the 4th event), so I understand it depends on the event in a random way.
On the other hand, the error seems related to these lines: https://github.com/cms-sw/cmssw/blob/master/DataFormats/NanoAOD/interface/FlatTable.h#L94-L101 But I fail to see why. I wonder if the developer @gpetruc could shed some light.
The IBs run with 4 threads, and what we see is all 4 threads failing on the first event for that thread. With all 4 threads failing, it's likely some kind of initialization failure, possibly a multi-thread race condition.
Thanks @dan131riley. I have just run with 4 threads but got no error... :-( runTheMatrix.py -l 10801.0 -i all --ibeos -t 4 It looks very event dependent.
More likely timing dependent. It's an all-or-none failure: either all the streams fail or none do, which is not consistent with an event-dependent failure. Thread races can be very dependent on the system load, and the IB machines tend to be heavily loaded.
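To illustrate why a failure like this can depend on timing rather than on the event content, here is a toy sketch in Python. It is not ROOT or CMSSW code: `LazyMethodTable`, `lookup_unsafe`, `lookup_safe`, and `demo_race` are hypothetical names, and the `pause` hook exists only to force the unlucky interleaving deterministically. Whether the real failure has this shape is not established in this thread.

```python
import threading


class LazyMethodTable:
    """Toy stand-in for a lazily filled reflection table (hypothetical)."""

    def __init__(self, loader):
        self._loader = loader      # callable returning the method names
        self._methods = None       # filled on first lookup
        self._lock = threading.Lock()

    def lookup_unsafe(self, name, pause=None):
        if self._methods is None:
            self._methods = []     # bug: table is visible before it is filled
            if pause is not None:
                pause()            # demo-only hook that widens the race window
            self._methods.extend(self._loader())
        return name in self._methods

    def lookup_safe(self, name):
        with self._lock:           # only one thread initializes; others wait
            if self._methods is None:
                self._methods = list(self._loader())
        return name in self._methods


def demo_race():
    table = LazyMethodTable(lambda: ["getAnyValue"])
    started, release = threading.Event(), threading.Event()
    found = {}

    def pause():
        started.set()              # initialization has begun
        release.wait()             # hold it until the second lookup has run

    def first_lookup():
        found["first"] = table.lookup_unsafe("getAnyValue", pause)

    worker = threading.Thread(target=first_lookup)
    worker.start()
    started.wait()
    # Second caller arrives mid-initialization and sees an empty table.
    found["second"] = table.lookup_unsafe("getAnyValue")
    release.set()
    worker.join()
    return found
```

With this forced interleaving, the second lookup reports the method as missing even though the loader provides it. `lookup_safe` avoids that by letting only one thread initialize the table while the others wait on the lock.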
OK, I understand, but that makes it even harder to reproduce...
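Since the failure is intermittent, one pragmatic approach is to rerun the same command until it fails. The helper below is a minimal sketch: `rerun_until_failure` and its parameters are hypothetical names, and it assumes the job signals failure through a non-zero exit code.

```python
import subprocess


def rerun_until_failure(cmd, max_tries=20):
    """Run cmd repeatedly; return the 1-based attempt that failed, or None."""
    for attempt in range(1, max_tries + 1):
        # Assumption: a non-zero exit code means the job failed.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None
```

For example, `rerun_until_failure(["runTheMatrix.py", "-l", "10801.0", "-i", "all", "--ibeos", "-t", "4"])` would repeat the command quoted above. If the failure really is load dependent, running it on a busy machine may be more likely to trigger it.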
Probably related, in https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_amd64_gcc900/CMSSW_12_2_DEVEL_X_2021-11-04-2300/pyRelValMatrixLogs/run/10004.0_SingleGammaPt10+2017+SingleGammaPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_SingleGammaPt10+2017+SingleGammaPt10_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log
----- Begin Fatal Exception 05-Nov-2021 07:23:16 CET-----------------------
An exception of category 'StdException' occurred while
[0] Processing Event run: 1 lumi: 1 event: 6 stream: 2
[1] Running path 'dqmoffline_step'
[2] Prefetching for module NanoAODDQM/'nanoDQMMC'
[3] Calling method for module SimpleGenEventFlatTableProducer/'genTable'
Exception Message:
A std::exception was thrown.
no method or data member named "hasBinningValues" found for type "GenEventInfoProduct"
----- End Fatal Exception -------------------------------------------------
Thanks @dan131riley. That is even stranger, since the method exists (even if it is not in a DQM class): https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/SimDataFormats/GeneratorProducts/interface/GenEventInfoProduct.h#L49 So I do not understand.
+1
I am still not able to reproduce in CMSSW_12_3_ROOT624_X_2021-12-10-2300. If you think this issue is still alive, please let me know. Thanks
Occurred in CMSSW_12_3_X_2021-12-13-2300 slc7_ppc64le_gcc11
----- Begin Fatal Exception 14-Dec-2021 10:15:47 CET-----------------------
An exception of category 'Configuration' occurred while
[0] Processing Event run: 1 lumi: 2 event: 103 stream: 0
[1] Running path 'dqmoffline_3_step'
[2] Calling method for module NanoAODDQM/'nanoDQMMC'
Exception Message:
Cut parser error:no method or data member named "getAnyValue" found for type "nanoaod::FlatTable::RowView" (char 4)
----- End Fatal Exception -------------------------------------------------
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_ppc64le_gcc11/CMSSW_12_3_X_2021-12-13-2300/pyRelValMatrixLogs/run/11834.0_TTbar_14TeV+2021PU+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoNanoPU+HARVESTNanoPU/step3_TTbar_14TeV+2021PU+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoNanoPU+HARVESTNanoPU.log#/423-423
Thanks, I am still not able to reproduce that last example, either single-threaded or multi-threaded, and without reproducing it I cannot debug.
The only thing I know, but do not understand, is that the crash comes from: https://github.com/cms-sw/cmssw/blob/master/DataFormats/NanoAOD/interface/FlatTable.h#L94-L101
@gpetruc @peruzzim could you give any clue? Thanks
-1
I made https://github.com/cms-sw/cmssw/pull/36501 to add more information to the exception message for the next time it occurs.
Occurred in CMSSW_12_3_X_2021-12-24-2300 slc7_ppc64le_gcc11
Rivet.Analysis.HiggsTemplateCrossSections: WARN Unkown Higgs production mechanism. Cannot classify event. Classification for all events will most likely fail.
----- Begin Fatal Exception 25-Dec-2021 04:35:24 CET-----------------------
An exception of category 'StdException' occurred while
[0] Processing Event run: 1 lumi: 1 event: 3 stream: 1
[1] Running path 'dqmoffline_step'
[2] Prefetching for module NanoAODDQM/'nanoDQMMC'
[3] Calling method for module SimpleGenEventFlatTableProducer/'genTable'
Exception Message:
A std::exception was thrown.
no method or data member named "hasBinningValues" found for type "GenEventInfoProduct"
It has the following methods
and the following data members
weights_
signalProcessID_
qScale_
alphaQCD_
alphaQED_
pdf_
binningValues_
DJRValues_
nMEPartons_
nMEPartonsFiltered_
----- End Fatal Exception -------------------------------------------------
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_ppc64le_gcc11/CMSSW_12_3_X_2021-12-24-2300/pyRelValMatrixLogs/run/10802.0_SingleElectronPt35+2018+SingleElectronPt35_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_SingleElectronPt35+2018+SingleElectronPt35_pythia8_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log#/
Occurred in CMSSW_12_3_X_2021-12-27-2300 cs8_ppc64le_gcc11
%MSG
----- Begin Fatal Exception 28-Dec-2021 09:17:49 CET-----------------------
An exception of category 'Configuration' occurred while
[0] Processing Event run: 1 lumi: 47 event: 4606 stream: 2
[1] Running path 'dqmoffline_step'
[2] Calling method for module NanoAODDQM/'nanoDQMMC'
Exception Message:
Cut parser error:no method or data member named "getAnyValue" found for type "nanoaod::FlatTable::RowView"
It has the following methods
and the following data members
table_
row_
(char 0)
Cut string was getAnyValue("pt") > 15 && abs(getAnyValue("dxy")) < 0.2 && abs(getAnyValue("dz")) < 0.5 && getAnyValue("cutBased") >= 3 && getAnyValue("miniPFRelIso_all") < 0.4
----- End Fatal Exception -------------------------------------------------
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/cs8_ppc64le_gcc11/CMSSW_12_3_X_2021-12-27-2300/pyRelValMatrixLogs/run/10071.0_QCD_FlatPt_15_3000HS_13+2017+QCDForPF_13TeV_TuneCUETP8M1_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_QCD_FlatPt_15_3000HS_13+2017+QCDForPF_13TeV_TuneCUETP8M1_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log#/
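For readers less familiar with the cut syntax, the cut string in the exception above amounts to the following selection. This is a plain-Python paraphrase, not the CMSSW cut parser: `passes_cut` is a hypothetical name, and the row is modeled as a dict keyed by column name.

```python
def passes_cut(row):
    """Plain-Python paraphrase of the cut string from the exception message."""
    return (
        row["pt"] > 15
        and abs(row["dxy"]) < 0.2
        and abs(row["dz"]) < 0.5
        and row["cutBased"] >= 3
        and row["miniPFRelIso_all"] < 0.4
    )
```

The cut string starts with a getAnyValue call, so an empty method list for nanoaod::FlatTable::RowView would make the parser fail at the very first token, which is consistent with the (char 0) in the message.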
In both example failures the list of methods is empty, which is what causes the exception. The printout comes from
https://github.com/cms-sw/cmssw/blob/5cc0e67f56fdf5cf3b0126da9b378733083ae17f/CommonTools/Utils/src/MethodSetter.cc#L127-L133
The loop uses these functions
https://github.com/cms-sw/cmssw/blob/5cc0e67f56fdf5cf3b0126da9b378733083ae17f/FWCore/Reflection/src/TypeWithDict.cc#L880-L889
IterWithDict is essentially a wrapper over TIter
https://github.com/cms-sw/cmssw/blob/master/FWCore/Reflection/interface/IterWithDict.h
https://github.com/cms-sw/cmssw/blob/master/FWCore/Reflection/src/IterWithDict.cc
Given that the TypeDataMembers is able to list the data members correctly, I'd be tempted to conclude that type.getClass() returns a non-nullptr pointer
https://github.com/cms-sw/cmssw/blob/5cc0e67f56fdf5cf3b0126da9b378733083ae17f/FWCore/Reflection/src/TypeWithDict.cc#L858
https://github.com/cms-sw/cmssw/blob/5cc0e67f56fdf5cf3b0126da9b378733083ae17f/FWCore/Reflection/src/TypeWithDict.cc#L380-L385
Could there be a race condition in TClass? @pcanal
I think (but did not verify) in both cases the TypeWithDict is constructed from std::type_info, in which case the TypeWithDict::class_ is initialized as
https://github.com/cms-sw/cmssw/blob/5cc0e67f56fdf5cf3b0126da9b378733083ae17f/FWCore/Reflection/src/TypeWithDict.cc#L277-L279
Occurred in CMSSW_12_3_GEANT4_X_2021-12-28-2300 https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc10/CMSSW_12_3_GEANT4_X_2021-12-28-2300/pyRelValMatrixLogs/run/10842.0_ZMM_13+2018+ZMM_13TeV_TuneCUETP8M1_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_ZMM_13+2018+ZMM_13TeV_TuneCUETP8M1_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log#/
@makortel have you seen any recent occurrence of this?
> Could there be a race condition in TClass? @pcanal
That's unlikely nowadays but it is of course possible.
> in both cases the TypeWithDict is constructed from std::type_info
Then it could be a case of a missing dictionary (not generated or somehow not loaded).
I don't remember seeing this exception any time recently (but it is possible that I've just forgotten). We have changed ROOT version in between though.