
PromptReco failure PromptReco_Run381379_ParkingSingleMuon4

Open Dr15Jones opened this issue 1 year ago • 34 comments

From https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-run381379-parkingsinglemuon4/42082

----- Begin Fatal Exception 06-Jun-2024 16:58:22 CEST-----------------------
An exception of category 'FileReadError' occurred while
   [0] Processing  Event run: 381379 lumi: 819 event: 1742750619 stream: 2
   [1] Running path 'write_AOD_step'
   [2] Prefetching for module PoolOutputModule/'write_AOD'
   [3] While reading from source GlobalObjectMapRecord hltGtStage2ObjectMap '' HLT
   [4] Rethrowing an exception that happened on a different read request.
   [5] Processing  Event run: 381379 lumi: 819 event: 1742683577 stream: 4
   [6] Running path 'dqmoffline_step'
   [7] Prefetching for module DQMMessageLogger/'DQMMessageLogger'
   [8] Prefetching for module LogErrorHarvester/'logErrorHarvester'
   [9] Prefetching for module CSCRecHitDProducer/'csc2DRecHits'
   [10] Prefetching for module CSCDCCUnpacker/'muonCSCDigis'
   [11] While reading from source FEDRawDataCollection rawDataCollector '' LHC
   [12] Reading branch FEDRawDataCollection_rawDataCollector__LHC.
Exception Message:
vector::_M_default_append
----- End Fatal Exception -------------------------------------------------

The tarball can be found here:

/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/FileReadError/job/WMTaskSpace/cmsRun1

From the logs it seems to crash at event 1742503164. The error is reproducible locally.

Dr15Jones avatar Jun 06 '24 20:06 Dr15Jones

cms-bot internal usage

cmsbuild avatar Jun 06 '24 20:06 cmsbuild

A new Issue was created by @Dr15Jones.

@antoniovilela, @sextonkennedy, @smuzaffar, @makortel, @rappoccio, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Jun 06 '24 20:06 cmsbuild

The job can be run by setting up a CMSSW_14_0_7 area and downloading the tarball, which is at /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/FileReadError/a406cf00-00a4-498e-b7e2-9ec39b964fac-216-3-logArchive.tar.gz

After untarring, go to the directory job/WMTaskSpace/cmsRun1 and run

cmsRun PSet.py

Dr15Jones avatar Jun 06 '24 20:06 Dr15Jones

There appear to be lots of extraneous exceptions being thrown (and caught) in this job. The first one encountered is

%MSG-e SiStripMonitorTrack:  SiStripMonitorTrack:HLTSiStripMonitorTrack  06-Jun-2024 17:43:09 CEST Run: 381379 Event: 1741662696
ClusterCollection is not valid!!
%MSG
[Switching to Thread 0x7fffa05fe640 (LWP 3001818)]

Thread 7 "cmsRun" hit Catchpoint 1 (exception thrown), 0x00007ffff5ead0f1 in __cxxabiv1::__cxa_throw (obj=0x7ffde5f68b00, tinfo=0x7ffff79a0650 <typeinfo for edm::Exception>,
    dest=0x7ffff796a010 <edm::Exception::~Exception()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
81      ../../../../libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory.
(gdb) where
#0  0x00007ffff5ead0f1 in __cxxabiv1::__cxa_throw (obj=0x7ffde5f68b00, tinfo=0x7ffff79a0650 <typeinfo for edm::Exception>, dest=0x7ffff796a010 <edm::Exception::~Exception()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007ffff7b7e0b2 in throwInvalidRefFromNullOrInvalidRef(edm::TypeID const&) ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libDataFormatsCommon.so
#2  0x00007ffff7b7ed6f in edm::RefCore::tryToGetProductPtr(std::type_info const&, edm::EDProductGetter const*) const [clone .cold] ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libDataFormatsCommon.so
#3  0x00007fffa557aa1a in reco::Track::recHitsBegin() const ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/pluginRecoTrackerFinalTrackSelectorsPlugins.so
#4  0x00007fffa55bd779 in SingleLongTrackProducer::produce(edm::Event&, edm::EventSetup const&) ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/pluginRecoTrackerFinalTrackSelectorsPlugins.so
#5  0x00007ffff7e483c1 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#6  0x00007ffff7e2c04e in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#7  0x00007ffff7db9159 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#8  0x00007ffff7db96c4 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#9  0x00007ffff7f3af28 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
   from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreConcurrency.so
#10 0x00007ffff6f1091b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7ffeafe74400, waiter=..., this=0x7ffff41c3b00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7ffff41c3b00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#12 tbb::detail::r1::arena::process (tls=..., this=<optimized out>)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/arena.cpp:137
#13 tbb::detail::r1::market::process (this=<optimized out>, j=...)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/market.cpp:599
#14 0x00007ffff6f12ace in tbb::detail::r1::rml::private_worker::run (this=0x7ffff2486f00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#15 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7ffff2486f00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#16 0x00007ffff5a89c02 in start_thread () from /lib64/libc.so.6
#17 0x00007ffff5b0ec40 in clone3 () from /lib64/libc.so.6

Which is caught here https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L158-L173

which is problematic as the tracks are the generalTracks which are being made in this job and SHOULD have accessible hits!

Dr15Jones avatar Jun 06 '24 21:06 Dr15Jones

assign tracking

Dr15Jones avatar Jun 06 '24 21:06 Dr15Jones

The next group of exceptions come from

#0  0x00007ffff5b9d2f1 in __cxxabiv1::__cxa_throw (obj=0x7ffdca082400, tinfo=0x7ffff79a5628 <typeinfo for cms::Exception>, dest=0x7ffff796ee30 <cms::Exception::~Exception()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007fffc37f8a8d in PerigeeConversions::ftsToPerigeeParameters(FreeTrajectoryState const&, Point3DBase<float, GlobalTag> const&, double&) [clone .cold] ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libTrackingToolsTrajectoryState.so
#2  0x00007fffc3806a5a in TrajectoryStateClosestToPoint::TrajectoryStateClosestToPoint(FreeTrajectoryState const&, Point3DBase<float, GlobalTag> const&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libTrackingToolsTrajectoryState.so
#3  0x00007fffc38725a5 in TSCPBuilderNoMaterial::operator()(TrajectoryStateOnSurface const&, Point3DBase<float, GlobalTag> const&) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libTrackingToolsPatternTools.so
#4  0x00007fffbe679dd2 in PerigeeLinearizedTrackState::computeJacobians() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexVertexTools.so
#5  0x00007fffbe67a456 in PerigeeLinearizedTrackState::isValid() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexVertexTools.so
#6  0x00007fffbc5ac58f in KalmanVertexUpdator<5u>::positionUpdate(VertexState const&, ReferenceCountingPointer<LinearizedTrackState<5u> >, float, int) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#7  0x00007fffbc5ae20d in KalmanVertexUpdator<5u>::update(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >, float, int) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#8  0x00007fffbc5ae89a in KalmanVertexUpdator<5u>::add(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#9  0x00007fffbc5ae90d in KalmanVertexTrackCompatibilityEstimator<5u>::estimateNFittedTrack(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#10 0x00007fffbc5b023f in KalmanVertexTrackCompatibilityEstimator<5u>::estimate(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >, unsigned int) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#11 0x00007fffbc5aa80e in KalmanVertexTrackCompatibilityEstimator<5u>::estimate(CachingVertex<5u> const&, ReferenceCountingPointer<LinearizedTrackState<5u> >, unsigned int) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#12 0x00007fffbc5d101c in AdaptiveVertexFitter::reWeightTracks(std::vector<ReferenceCountingPointer<LinearizedTrackState<5u> >, std::allocator<ReferenceCountingPointer<LinearizedTrackState<5u> > > > const&, CachingVertex<5u> const&) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#13 0x00007fffbc5d1e65 in AdaptiveVertexFitter::reWeightTracks(std::vector<ReferenceCountingPointer<VertexTrack<5u> >, std::allocator<ReferenceCountingPointer<VertexTrack<5u> > > > const&, CachingVertex<5u> const&) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#14 0x00007fffbc5d32ed in AdaptiveVertexFitter::fit(std::vector<ReferenceCountingPointer<VertexTrack<5u> >, std::allocator<ReferenceCountingPointer<VertexTrack<5u> > > > const&, VertexState const&, bool) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#15 0x00007fffbc5d46e1 in AdaptiveVertexFitter::vertex(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, Point3DBase<float, GlobalTag> const&) const ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#16 0x00007fff4035710a in TemplatedInclusiveVertexFinder<edm::View<reco::Candidate>, reco::VertexCompositePtrCandidate>::produce(edm::Event&, edm::EventSetup const&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginRecoVertexAdaptiveVertexFinderPlugins.so
#17 0x00007ffff7ce1e91 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so

the exception originates here

https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/TrackingTools/TrajectoryState/src/PerigeeConversions.cc#L15-L16

and is caught here

https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/TrackingTools/TrajectoryState/src/TrajectoryStateClosestToPoint.cc#L8-L23

Dr15Jones avatar Jun 06 '24 21:06 Dr15Jones

assign reconstruction

Dr15Jones avatar Jun 06 '24 21:06 Dr15Jones

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Jun 06 '24 21:06 cmsbuild

By skipping the first events, I was able to get the traceback for the exception which ultimately ended the job

#0  0x00007ffff5b9d2f1 in __cxxabiv1::__cxa_throw (obj=0x7ffe9579d1a0, tinfo=0x7ffff5d03190 <typeinfo for std::length_error>, dest=0x7ffff5bb2220 <std::length_error::~length_error()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007ffff5b942d9 in std::__throw_length_error(char const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib64/libstdc++.so.6
#2  0x00007fffc38c8346 in ROOT::Detail::TCollectionProxyInfo::Pushback<std::vector<unsigned char, std::allocator<unsigned char> > >::resize(void*, unsigned long) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libDataFormatsStdDictionaries.so
#3  0x00007ffff7193701 in void TGenCollectionStreamer::ReadBufferVectorPrimitives<unsigned char>(TBuffer&, void*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#4  0x00007ffff7110e09 in TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#5  0x00007ffff735e073 in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#6  0x00007ffff7211e4c in TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#7  0x00007ffff710f5fc in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#8  0x00007ffff725f38f in int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#9  0x00007ffff7117eae in TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#10 0x00007ffff735cdcc in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#11 0x00007ffff71de94d in TStreamerInfoActions::GenericReadAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#12 0x00007ffff710fbb5 in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#13 0x00007ffff7873b87 in TBranchElement::ReadLeavesMember(TBuffer&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#14 0x00007ffff786c429 in TBranch::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#15 0x00007ffff787ed44 in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#16 0x00007ffff787ecfd in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#17 0x00007fff9d66585c in edm::RootTree::getEntry(TBranch*, long long) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolInput.so
#18 0x00007fff9d64639c in edm::RootDelayedReader::getProduct_(edm::BranchID const&, edm::EDProductGetter const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolInput.so
#19 0x00007ffff7bc111f in edm::DelayedReader::getProduct(edm::BranchID const&, edm::EDProductGetter const*, edm::ModuleCallingContext const*) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x00007ffff7c6a35b in edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x00007ffff7c6b7cc in edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}::operator()() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#22 0x00007ffff7c6b918 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#23 0x00007ffff7e031d0 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#24 0x00007ffff63fe95b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fff08c3ec00, waiter=..., this=0x7ffff3963b00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#25 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7ffff3963b00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#26 tbb::detail::r1::arena::process (tls=..., this=<optimized out>)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#27 tbb::detail::r1::market::process (this=<optimized out>, j=...)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/market.cpp:599
#28 0x00007ffff6400b0e in tbb::detail::r1::rml::private_worker::run (this=0x7ffff17e9100)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#29 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7ffff17e9100)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#30 0x00007ffff55341ca in start_thread () from /lib64/libpthread.so.0
#31 0x00007ffff518f8d3 in clone () from /lib64/libc.so.6

Dr15Jones avatar Jun 06 '24 22:06 Dr15Jones

assign root

Dr15Jones avatar Jun 06 '24 22:06 Dr15Jones

@pcanal how can we understand better what happened during the read?

Dr15Jones avatar Jun 06 '24 22:06 Dr15Jones

type root

Dr15Jones avatar Jun 06 '24 22:06 Dr15Jones

type tracking

Dr15Jones avatar Jun 06 '24 22:06 Dr15Jones

Which is caught here

https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L158-L173

That just looks like poorly written code, where try/catch is used instead of checking whether the trackExtra is present. The tracks are apparently not pure generalTracks, see https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L133-L136

a proper copy is made conditionally, while the rest in selTracks is going to be default-constructed reco::Tracks

slava77 avatar Jun 06 '24 23:06 slava77

@borzari please check https://github.com/cms-sw/cmssw/issues/45162#issuecomment-2153549462 to possibly remove the try/catch pattern, which is only there to guard access to track.extra in the track.recHitsBegin() call. It should be a combination of validity checks for extra() and then extra()->recHitsProduct(), checking isNonnull() && isAvailable() for each, sequentially. This could even be packed into a new helper method, e.g. bool reco::Track::recHitsOk()

Please clarify if you are available to check this. Thank you.

slava77 avatar Jun 07 '24 12:06 slava77

Hi @slava77

I applied what you suggested in this commit, used the opportunity to remove some duplicated code, and tested it with RelValZMM and RelValTTbar events by comparing the version with try/catch results with the version with the validity check results. Everything worked as intended and no changes to the output were observed, as expected.

Just to clarify two points:

  • I added a method inside the SingleLongTrackProducer module to check the validity of the track. Thinking out loud about what you suggested, I think you meant that the method could be included in https://github.com/cms-sw/cmssw/blob/master/DataFormats/TrackReco/interface/Track.h. If this is what you meant, I can modify the branch to have the recHitsOk method there;
  • I couldn't check the validity of the recHitsProduct(). There doesn't seem to be something similar to isNonnull() or isAvailable() for it. However, just checking track.extra() seemed enough. Was it supposed to be like this? Am I missing something about the recHitsProduct()?

borzari avatar Jun 08 '24 04:06 borzari

I couldn't check the validity of the recHitsProduct(). There doesn't seem to be something similar to isNonnull() or isAvailable() for it. However, just checking track.extra() seemed enough. Was it supposed to be like this? Am I missing something about the recHitsProduct()?

I misread the TrackExtraBase; edm::RefCore m_hitCollection; is the one that has isNonnull() and isAvailable(), but it is not publicly exposed.

So, I would add this in TrackExtraBase.h: bool recHitsOk() const {return m_hitCollection.isNonnull() && m_hitCollection.isAvailable();} and then this in Track.h: bool recHitsOk() const {return extra_.isNonnull() && extra_.isAvailable() && extra_->recHitsOk();}

Even though in the current setup a track without an extra is enough, there can still be cases where SingleLongTrackProducer uses input tracks where hits got dropped.

slava77 avatar Jun 08 '24 04:06 slava77

Tracks are apparently not pure generalTracks, see https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L133-L136 a proper copy is made conditionally, while the rest in selTracks is going to be default-constructed reco::Tracks

Out of curiosity why is that? Can't the selTracks just contain the tracks we can actually refit?

mmusich avatar Jun 08 '24 16:06 mmusich

Tracks are apparently not pure generalTracks, see https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L133-L136

a proper copy is made conditionally, while the rest in selTracks is going to be default-constructed reco::Tracks

Out of curiosity why is that? Can't the selTracks just contain the tracks we can actually refit?

Hi @mmusich The selTracks collection will only have one track, the one with the smallest chiNdof. I also want to check if the rechits and hits from the hitpattern are valid to say that it is a goodTrack that can be used for the shortened tracks pT resolution. Especially because of what @slava77 mentioned here:

Even though in the current setup a track without an extra is enough, there can still be cases where SingleLongTrackProducer uses input tracks where hits got dropped.

The hit checks are to make sure that this track won't have missing layers with measurement. That is not 100% effective, as I already showed during the presentations about this topic, but it also doesn't have much impact on the final result because it doesn't happen very often. I wouldn't expect that changing that part of the code so that selTracks only has tracks that can be refitted would have a large impact on what is going on in the SingleLongTrackProducer or after it, unless it is an extra "safety check" that can be included.

Here I added the suggestions from @slava77. Again, I tested with RelValZMM and RelValTTbar events, and things are working as expected. If you don't have other suggestions, I can open a PR with it and we can continue the discussion there

borzari avatar Jun 08 '24 17:06 borzari

@borzari

also want to check if the rechits and hits from the hitpattern are valid to say that it is a goodTrack that can be used for the shortened tracks pT resolution.

Exactly, can't you do that before filling the vector? Default constructed tracks can't be used for refit.

mmusich avatar Jun 08 '24 17:06 mmusich

@borzari

also want to check if the rechits and hits from the hitpattern are valid to say that it is a goodTrack that can be used for the shortened tracks pT resolution.

Exactly, can't you do that before filling the vector? Default constructed tracks can't be used for refit.

Alright, so instead of only getting the track with the smallest chiNdof, I also want it to have recHitsOk(), right?

borzari avatar Jun 08 '24 17:06 borzari

I also want it to have recHitsOk(), right?

Right, this is what I had in mind.

mmusich avatar Jun 08 '24 17:06 mmusich

I also want it to have recHitsOk(), right?

Right, this is what I had in mind.

It didn't work. If I move the validity check from the rechits/hitpattern check to where I select tracks (I did if (chiNdof < fitProb && track.recHitsOk())), I get the message as if I was not checking the tracks:

----- Begin Fatal Exception 08-Jun-2024 19:16:37 CEST-----------------------
An exception of category 'InvalidReference' occurred while
   [0] Processing  Event run: 1 lumi: 76 event: 7503 stream: 6
   [1] Running path 'dqmoffline_step'
   [2] Calling method for module SingleLongTrackProducer/'SingleLongTrackProducer'
Exception Message:
BadRefCore RefCore: Request to resolve a null or invalid reference to a product of type 'std::vector<reco::TrackExtra>' has been detected.
Please modify the calling code to test validity before dereferencing.
----- End Fatal Exception -------------------------------------------------

borzari avatar Jun 08 '24 17:06 borzari

I get the message as if I was not checking the tracks:

Isn't track.recHitsOk() checking that the TrackExtra is valid?

mmusich avatar Jun 08 '24 17:06 mmusich

Isn't track.recHitsOk() checking that the TrackExtra is valid?

Should be. I implemented it like Slava suggested here

Could it be that, although I am adding only tracks with a valid TrackExtra to selTracks, the framework still needs me to check that I am looking at a valid track (one that has a TrackExtra) before I check whether it has valid hits/hitpattern? I am not sure how the "not valid TrackExtra" exception works, which is why I am asking.

borzari avatar Jun 08 '24 17:06 borzari

The check I used was

if (track.extra().isAvailable()) {

Dr15Jones avatar Jun 08 '24 19:06 Dr15Jones

The check I used was

if (track.extra().isAvailable()) {

Alright @Dr15Jones, but does it happen every time I am using a reco::Track anywhere?

Well, in any case, I would suggest opening a PR with these changes, at least to remove the try/catch pattern.

borzari avatar Jun 10 '24 02:06 borzari

get the message as if I was not checking the tracks:

maybe I am missing something, but with https://github.com/CMSTrackingPOG/cmssw/commit/53185493eae82d7fe8e807e9b266491ea51d06f8 on top of https://github.com/borzari/cmssw/commit/95ecc4bb4aa7e811f1f65025c8f08a23a72cf272 I can run this test:

https://github.com/cms-sw/cmssw/blob/4639c105a21c6934798c543c9a7cae72955c9369/DQM/TrackingMonitorSource/test/BuildFile.xml#L2

(even using the whole input file) without crashes.

mmusich avatar Jun 11 '24 04:06 mmusich

@mmusich most probably I was missing something. The main differences I see (besides the better organization of the code in the way you wrote it) are that I included track.recHitsOk() here in the condition to select the best track, and that instead of using isNonnull() here, I would use bestTrack.recHitsOk(). Also, and maybe this was my mistake, I removed this condition, which you didn't. That is why I asked @Dr15Jones if the check for the availability of TrackExtra is done every time a reco::Track is being used

borzari avatar Jun 11 '24 13:06 borzari

@mmusich I started from your branch and tested what I mentioned above:

  • Replaced if (bestTrack.extra().isNonnull()) with if (bestTrack.recHitsOk()): didn't have any effect, as expected, and should be "safer"
  • Removed the extra check from here, and it also didn't fail, as I was expecting. I really don't know why that is the case and what is different from what I did, except for adding the check together with the chi2ndof condition to fill selTracks; I would also keep the extra check for safety reasons

May I start a PR to include your changes and the recHitsOk() method to CMSSW?

borzari avatar Jun 12 '24 19:06 borzari