Crash in WF 34034.0 HGCalCLUEAlgoT::findAndAssignClusters()

Open dan131riley opened this issue 7 months ago • 19 comments

I don't think we have an issue for this one, though it was mentioned in #47959. Stack trace:

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Sun May 11 06:36:07 CEST 2025
Thread 7 (Thread 0x1545e224e700 (LWP 2966717) "cmsRun"):
#2  0x000015462f3a30c0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000154634d18399 in std::basic_ios<char, std::char_traits<char> >::init (this=0x1545e22485b0, __sb=0x1545e2248548) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/basic_ios.tcc:126
#5  0x00001546091ee5eb in HGCalDDDConstants::locateCell(int, int, int, int, int, int, bool, bool, bool, bool, bool) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libGeometryHGCalCommonData.so
#6  0x00001546144637f2 in HGCalGeometry::getPosition(DetId const&, bool, bool) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libGeometryHGCalGeometry.so
#7  0x000015461446431f in HGCalGeometry::getPosition(DetId const&, bool) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libGeometryHGCalGeometry.so
#8  0x00001545d007206d in hgcal::RecHitTools::getPosition(DetId const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libRecoLocalCaloHGCalRecAlgos.so
#9  0x00001545ce98bc53 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::populate(std::vector<HGCRecHit, std::allocator<HGCRecHit> > const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#10 0x00001545ce9b20bd in HGCalLayerClusterProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#11 0x0000154635c4f0f5 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 6 (Thread 0x1545e184d700 (LWP 2966718) "cmsRun"):
#2  0x000015462f3a30c0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000015462d16dda8 in CLHEP::RandGaussQ::transformQuick (r=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/clhep/2.4.7.1-d3a3e353d370e701238f7949a0d7909f/clhep-2.4.7.1/Random/src/RandGaussQ.cc:122
#5  0x000015457061b97b in HGCDigitizerBase::GenerateGaussianNoise(CLHEP::HepRandomEngine*, double, double) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginSimCalorimetryHGCalSimProducersPlugins.so
#6  0x000015457062086d in HGCDigitizerBase::run(std::unique_ptr<edm::SortedCollection<HGCDataFrame<DetId, HGCSample>, edm::StrictWeakOrdering<HGCDataFrame<DetId, HGCSample> > >, std::default_delete<edm::SortedCollection<HGCDataFrame<DetId, HGCSample>, edm::StrictWeakOrdering<HGCDataFrame<DetId, HGCSample> > > > >&, std::unordered_map<unsigned int, hgc_digi::HGCCellInfo, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, hgc_digi::HGCCellInfo> > >&, CaloSubdetectorGeometry const*, std::unordered_set<DetId, std::hash<DetId>, std::equal_to<DetId>, std::allocator<DetId> > const&, unsigned int, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginSimCalorimetryHGCalSimProducersPlugins.so
#7  0x0000154570611e93 in HGCDigitizer::finalizeEvent(edm::Event&, edm::EventSetup const&, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginSimCalorimetryHGCalSimProducersPlugins.so
#8  0x00001545706135c4 in HGCDigiProducer::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginSimCalorimetryHGCalSimProducersPlugins.so
#9  0x00001545708c776b in edm::MixingModule::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginSimGeneralMixingModulePlugins.so
#10 0x000015457082c341 in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libMixingBase.so
#11 0x0000154635c4f0f5 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 5 (Thread 0x1545e05ff700 (LWP 2966719) "cmsRun"):
#2  0x000015462f3a30c0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00001544be5c7183 in HGCalTriggerGeometryV9Imp3::getModuleFromTriggerCell(unsigned int) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginL1TriggerL1THGCalPlugins_geometries.so
#5  0x00001545711bac37 in HGCalVFEProcessorSums::run(edm::SortedCollection<HGCDataFrame<DetId, HGCSample>, edm::StrictWeakOrdering<HGCDataFrame<DetId, HGCSample> > > const&, BXVector<l1t::HGCalTriggerCell>&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginL1TriggerL1THGCalPlugins_fe_be.so
#6  0x0000154603e7277b in HGCalVFEProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginL1TriggerL1THGCalPlugins.so
#7  0x0000154635c4f0f5 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 1 (Thread 0x1546352d2580 (LWP 2966608) "cmsRun"):
#2  0x000015462f3a6494 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00001545ce9989ad in int& std::vector<int, std::allocator<int> >::emplace_back<int>(int&&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#5  0x00001545ce98d552 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::findAndAssignClusters(unsigned int, float) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#6  0x00001545ce99eec8 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}::operator()(unsigned long) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#7  0x00001545ce99f413 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#8  0x0000154635e8d87b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x1544e169d200, this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79d17462af37c53a90ea8933b10c6e89/tbb-v2022.0.0/src/tbb/task_dispatcher.h:334
#9  tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79d17462af37c53a90ea8933b10c6e89/tbb-v2022.0.0/src/tbb/task_dispatcher.h:470
#10 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79d17462af37c53a90ea8933b10c6e89/tbb-v2022.0.0/src/tbb/task_dispatcher.cpp:168
#11 0x00001545ce99e92c in tbb::detail::d1::task_arena_function<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}, void>::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#12 0x0000154635e7c9de in operator() (__closure=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79d17462af37c53a90ea8933b10c6e89/tbb-v2022.0.0/src/tbb/arena.cpp:890
#13 tbb::detail::d0::try_call_proxy<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> >::on_completion<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> > (on_completion_body=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79d17462af37c53a90ea8933b10c6e89/tbb-v2022.0.0/src/tbb/../../include/oneapi/tbb/detail/_template_helpers.h:230
#14 tbb::detail::r1::isolate_within_arena (d=..., isolation=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79d17462af37c53a90ea8933b10c6e89/tbb-v2022.0.0/src/tbb/arena.cpp:891
#15 0x00001545ce989232 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#16 0x00001545ce9b2111 in HGCalLayerClusterProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#17 0x0000154635c4f0f5 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-11-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so

Current Modules:

Module: HGCalLayerClusterProducer:hltHgcalLayerClustersHSi (crashed)
Module: HGCalLayerClusterProducer:hltHgcalLayerClustersEE
Module: MixingModule:mix
Module: HGCalVFEProducer:l1tHGCalVFEProducer

A fatal system signal has occurred: segmentation violation
timeout: the monitored command dumped core

dan131riley avatar May 12 '25 12:05 dan131riley

cms-bot internal usage

cmsbuild avatar May 12 '25 12:05 cmsbuild

A new Issue was created by @dan131riley.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar May 12 '25 12:05 cmsbuild

assign geometry, upgrade

makortel avatar May 12 '25 13:05 makortel

FYI @cms-sw/hgcal-dpg-l2

makortel avatar May 12 '25 13:05 makortel

New categories assigned: geometry,upgrade

@bsunanda,@civanch,@Dr15Jones,@kpedro88,@makortel,@mdhildreth,@Moanwar,@srimanob,@subirsarkar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar May 12 '25 13:05 cmsbuild

I ran this with the package compiled with debugging settings and got the following stack trace:

#2  0x00007f2012d31144 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-28-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f1fb2a1d6ae in std::construct_at<int, int const&> (__location=0x138c5095d4) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_construct.h:97
#5  0x00007f1fb2a1aecc in std::allocator_traits<std::allocator<int> >::construct<int, int const&> (__a=..., __p=0x138c5095d4) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/alloc_traits.h:518
#6  0x00007f1fb2a18e10 in std::vector<int, std::allocator<int> >::emplace_back<int> (this=0x7f1e636ff8a8) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/vector.tcc:117
#7  0x00007f1fb2a15a96 in std::vector<int, std::allocator<int> >::push_back (this=0x7f1e636ff8a8, __x=@0x7f1fc49acd7c: 86) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:1294
#8  0x00007f1fb2a10214 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::findAndAssignClusters (this=0x7f1fb6149400, layerId=26, delta=1.29999995) at src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc:435
#9  0x00007f1fb2a140c4 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}::operator()(unsigned long) const (__closure=0x7f1fc49f9748, i=26) at src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc:127
#10 0x00007f1fb2a28027 in std::__invoke_impl<void, HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&>(std::__invoke_other, HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&) (__f=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/invoke.h:61
#11 0x00007f1fb2a27f64 in std::__invoke<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&>(HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&) (__fn=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/invoke.h:96
#12 0x00007f1fb2a27eaa in std::invoke<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&>(HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&) (__fn=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/functional:110
#13 0x00007f1fb2a27df0 in tbb::detail::d0::invoke<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&>(HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, unsigned long&) (f=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/detail/_utils.h:356
#14 0x00007f1fb2a27cf8 in tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>::operator()(tbb::detail::d1::blocked_range<unsigned long> const&) const (this=0x7f1ec10c6a58, r=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:206
#15 0x00007f1fb2a27b51 in std::__invoke_impl<void, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&>(std::__invoke_other, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&) (__f=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/invoke.h:61
#16 0x00007f1fb2a27a72 in std::__invoke<tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&>(tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&) (__fn=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/invoke.h:96
#17 0x00007f1fb2a27884 in std::invoke<tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&>(tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&) (__fn=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/functional:110
#18 0x00007f1fb2a2732d in tbb::detail::d0::invoke<tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&>(tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::blocked_range<unsigned long>&) (f=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/detail/_utils.h:356
#19 0x00007f1fb2a26c43 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>::run_body(tbb::detail::d1::blocked_range<unsigned long>&) (this=0x7f1ec10c6a00, r=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:117
#20 0x00007f1fb2a25b4a in tbb::detail::d1::dynamic_grainsize_mode<tbb::detail::d1::adaptive_mode<tbb::detail::d1::auto_partition_type> >::work_balance<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<unsigned long> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<unsigned long>&, tbb::detail::d1::execution_data&) (this=0x7f1ec10c6a78, start=..., range=..., ed=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/partitioner.h:450
#21 0x00007f1fb2a2479c in tbb::detail::d1::partition_type_base<tbb::detail::d1::auto_partition_type>::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<unsigned long> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<unsigned long>&, tbb::detail::d1::execution_data&) (this=0x7f1ec10c6a78, start=..., range=..., ed=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/partitioner.h:289
#22 0x00007f1fb2a23688 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) (this=0x7f1ec10c6a00, ed=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:170
#23 0x00007f201b18187b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7f1ec10c6a00, this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/task_dispatcher.h:334
#24 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/task_dispatcher.h:470
#25 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/task_dispatcher.cpp:168
#26 0x00007f1fb2a0c3d1 in tbb::detail::d1::execute_and_wait (t=..., t_ctx=..., wait_ctx=..., w_ctx=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/detail/_task.h:260
#27 0x00007f1fb2a1f851 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned long> const&, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::auto_partitioner const&, tbb::detail::d1::task_group_context&) (range=..., body=..., partitioner=..., context=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:112
#28 0x00007f1fb2a1e357 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long>, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned long> const&, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::auto_partitioner const&) (range=..., body=..., partitioner=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:101
#29 0x00007f1fb2a1bf81 in tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<unsigned long>, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> >(tbb::detail::d1::blocked_range<unsigned long> const&, tbb::detail::d1::parallel_for_body_wrapper<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, unsigned long> const&, tbb::detail::d1::auto_partitioner const&) (range=..., body=..., partitioner=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:245
#30 0x00007f1fb2a1844e in tbb::detail::d1::parallel_for_impl<unsigned long, HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}, tbb::detail::d1::auto_partitioner const>(unsigned long, unsigned long, unsigned long, HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&, tbb::detail::d1::auto_partitioner const&) (first=0, last=96, step=1, f=..., partitioner=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:314
#31 0x00007f1fb2a15780 in tbb::detail::d1::parallel_for<unsigned long, HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1}>(unsigned long, unsigned long, HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const::{lambda(unsigned long)#1} const&) (first=0, last=96, f=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/parallel_for.h:353
#32 0x00007f1fb2a13e94 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}::operator()() const (__closure=0x7f1fc49f9828) at src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc:101
#33 0x00007f1fb2a24364 in tbb::detail::d1::task_arena_function<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}, void>::operator()() const (this=0x7f1fc49f97c0) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/task_arena.h:68
#34 0x00007f201b1709de in operator() (__closure=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/arena.cpp:890
#35 tbb::detail::d0::try_call_proxy<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> >::on_completion<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> > (on_completion_body=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/../../include/oneapi/tbb/detail/_template_helpers.h:230
#36 tbb::detail::r1::isolate_within_arena (d=..., isolation=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/arena.cpp:891
#37 0x00007f1fb2a184ae in tbb::detail::d1::isolate_impl<void, HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}>(HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}&) (f=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/task_arena.h:204
#38 0x00007f1fb2a1579b in tbb::detail::d1::isolate<HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}>(HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters()::{lambda()#1}&&) (f=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02891/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include/oneapi/tbb/task_arena.h:446
#39 0x00007f1fb2a0d548 in HGCalCLUEAlgoT<HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>, HGCalSiliconStrategy>::makeClusters (this=0x7f1fb6149400) at src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc:100
#40 0x00007f1fb2a4485f in HGCalLayerClusterProducer::produce (this=0x7f1fb614f000, evt=..., es=...) at src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalLayerClusterProducer.cc:260

Dr15Jones avatar May 28 '25 18:05 Dr15Jones

I added an assert just before the point where the program segfaults, and the assert failed:

cmsRun: src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc:435: int HGCalCLUEAlgoT<TILE, STRATEGY>::findAndAssignClusters(unsigned int, float) [with TILE = HGCalLayerTilesT<HGCalSiliconTilesConstants, NoPhiWrapper>; STRATEGY = HGCalSiliconStrategy]: Assertion `cellsOnLayer.nearestHigher[i] != -1' failed.

So the problem is https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc#L435

There is no nearest higher cell in this case: -1 is the value this code uses both when initializing that array and when resetting it. https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc#L87 https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc#L400
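To illustrate why a leftover -1 is fatal, here is a minimal sketch, assuming the failing line uses nearestHigher[i] as an index into another per-cell container. The struct, the followers member, and the function name are hypothetical stand-ins, not the actual CMSSW code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the per-layer cell data.
struct CellsOnLayerSketch {
  std::vector<int> nearestHigher;           // -1 means "no nearest higher cell"
  std::vector<std::vector<int>> followers;  // followers[j] = cells attached to cell j
};

// Attach cell i to its nearest higher cell, refusing to index with the sentinel.
// Returns false when the cell has no valid nearest higher cell.
bool attachToNearestHigher(CellsOnLayerSketch& cells, int i) {
  assert(i >= 0 && static_cast<std::size_t>(i) < cells.nearestHigher.size());
  const int nh = cells.nearestHigher[i];
  if (nh < 0 || static_cast<std::size_t>(nh) >= cells.followers.size()) {
    return false;  // followers[-1] would be an out-of-bounds access
  }
  cells.followers[nh].push_back(i);
  return true;
}
```

With an unchecked followers[nearestHigher[i]], a value of -1 indexes before the start of the container, which is undefined behavior and can show up as a segmentation fault far from the real cause.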

Dr15Jones avatar May 28 '25 19:05 Dr15Jones

So running in the debugger, I wanted to check the condition which leads to the failing line

https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc#L434-L436

which is defined here

https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc#L427

looking at the values in that conditional in the debugger turned up

(gdb) print cellsOnLayer.rho[i]
$9 = (__gnu_cxx::__alloc_traits<std::allocator<float>, float>::value_type &) @0x7ffe4f54d428: -nan(0x400000)
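The NaN matters because every ordered comparison against NaN evaluates to false. A minimal sketch, assuming the conditional classifies a cell with a pair of threshold comparisons on rho; the function, struct, and parameter names are hypothetical and not taken from HGCalCLUEAlgo.cc.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result of a density-based seed/outlier test.
struct Classification {
  bool isSeed;
  bool isOutlier;
};

// Classify a cell from its density rho and separation delta.
// rhoC and deltaC are the (assumed) critical thresholds.
Classification classify(float rho, float delta, float rhoC, float deltaC) {
  // The thresholds themselves are expected to be valid numbers.
  assert(!std::isnan(rhoC) && !std::isnan(deltaC));
  // If rho is NaN, both (rho >= rhoC) and (rho < rhoC) are false.
  const bool isSeed = (delta > deltaC) && (rho >= rhoC);
  const bool isOutlier = (delta > deltaC) && (rho < rhoC);
  return {isSeed, isOutlier};
}
```

Under this assumption a cell with rho = NaN is neither a seed nor an outlier, so it would be treated as a follower even though no nearest higher cell was ever assigned to it.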

Dr15Jones avatar May 28 '25 20:05 Dr15Jones

So rho[] is always filled from values obtained from cells_[].weight which comes from here

https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalCLUEAlgo.cc#L77

I added a check just below that assignment looking for nan and found one

(gdb) print cells_[73].weight[17]
$22 = (__gnu_cxx::__alloc_traits<std::allocator<float>, float>::value_type &) @0x7ffe46168244: -nan(0x400000)
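A check of this kind can be sketched as follows. This is an illustration only: the helper names are hypothetical, and it tests for any non-finite value (NaN or infinity) rather than NaN alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the index of the first NaN or infinite entry,
// or values.size() if every entry is finite.
std::size_t firstNonFinite(const std::vector<float>& values) {
  const auto it = std::find_if(values.begin(), values.end(),
                               [](float v) { return !std::isfinite(v); });
  return static_cast<std::size_t>(std::distance(values.begin(), it));
}

// Debug-build check: aborts if any entry is not finite.
void assertAllFinite(const std::vector<float>& values) {
  assert(firstNonFinite(values) == values.size());
}
```

Returning the index rather than a bool makes it easy to print the offending element in the debugger, as done above for weight[17].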

Dr15Jones avatar May 28 '25 21:05 Dr15Jones

Those values come from the hits loaded from the Event

https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalLayerClusterProducer.cc#L254-L255

Dr15Jones avatar May 28 '25 21:05 Dr15Jones

Looking at the configuration we see

recHits = cms.InputTag("hltHGCalRecHit","HGCHEFRecHits")

which is configured as

cms.EDProducer("HGCalRecHitProducer",
...
    HGCHEF_cce = cms.PSet(
        refToPSet_ = cms.string('HGCAL_chargeCollectionEfficiencies')
    ),
    HGCHEF_fCPerMIP = cms.vdouble(2.06, 3.43, 5.15),
    HGCHEF_isSiFE = cms.bool(True),
    HGCHEF_keV2DIGI = cms.double(0.044259),
    HGCHEF_noise_fC = cms.PSet(
        refToPSet_ = cms.string('HGCAL_noise_fC')
    ),
    HGCHEFrechitCollection = cms.string('HGCHEFRecHits'),
    HGCHEFuncalibRecHitCollection = cms.InputTag("hltHGCalUncalibRecHit","HGCHEFUncalibRecHits"),
...
    algo = cms.string('HGCalRecHitWorkerSimple'),
...
)

Dr15Jones avatar May 28 '25 21:05 Dr15Jones

I looked at HGCalRecHitWorkerSimple and added an assert in the code where the energy is calculated: https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalRecHitWorkerSimple.cc#L209-L217

The debugger then shows

(gdb) print cce_correction
$2 = 0

The value for cce_correction is set here

https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalRecHitWorkerSimple.cc#L187C26-L187C37

with

(gdb) print thickness
$8 = 4

and

(gdb) print hgcHEF_cce_.size()
$7 = 3

So the code is reading past the end of the container: with thickness = 4 and a container of size 3, there is no entry for that thickness.
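One way to make such a lookup fail loudly is a bounds-checked accessor. This is a minimal sketch, assuming the container is indexed with thickness - 1 (the form that appears in the assertion later in this thread); the function name is hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Look up a per-thickness correction. The index is assumed to be thickness - 1.
// Throws instead of silently reading past the end of the container.
double correctionForThickness(const std::vector<double>& corrections, int thickness) {
  const int index = thickness - 1;
  if (index < 0 || index >= static_cast<int>(corrections.size())) {
    throw std::out_of_range("thickness " + std::to_string(thickness) +
                            " has no entry in a container of size " +
                            std::to_string(corrections.size()));
  }
  return corrections[index];
}
```

With the values seen in the debugger (thickness 4, size 3) this throws rather than returning whatever happens to sit in memory after the last element.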

Dr15Jones avatar May 28 '25 21:05 Dr15Jones

The container that is the wrong size comes from the configuration

https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/RecoLocalCalo/HGCalRecProducers/plugins/HGCalRecHitWorkerSimple.cc#L82

and from the earlier comment in the issue, we see

HGCHEF_cce = cms.PSet(
        refToPSet_ = cms.string('HGCAL_chargeCollectionEfficiencies')
    ),

and dumping that from the configuration gives

>>> print(process.HGCAL_chargeCollectionEfficiencies)
cms.PSet(
    values = cms.vdouble(1.0, 1.0, 1.0)
)

So either this vector is too small or the thickness is too large.

Dr15Jones avatar May 28 '25 21:05 Dr15Jones

The thickness ultimately derives from here

https://github.com/cms-sw/cmssw/blob/87d00d7e5b75608bdd99b0dbd1efb8e7bb8455bd/Geometry/HGCalCommonData/interface/HGCalDDDConstants.h#L262-L265

and from the comment, it looks like a thickness of 4 is not supposed to happen.

Dr15Jones avatar May 28 '25 21:05 Dr15Jones

This problem has the same origin as #47968

Dr15Jones avatar May 28 '25 22:05 Dr15Jones

https://github.com/cms-sw/cmssw/pull/48207 converts this segmentation fault into an assertion failure. Hopefully that will help whoever is working on fixing the underlying problem of the geometry description being inconsistent.

Dr15Jones avatar May 29 '25 13:05 Dr15Jones

"#48207 converts this segmentation fault into an assertion failure"

Assertion failure has been observed in IBs:

cmsRun: src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalRecHitWorkerSimple.cc:187: virtual void HGCalRecHitWorkerSimple::run(const edm::Event&, const HGCUncalibratedRecHitCollection&, HGCRecHitCollection&): Assertion `thickness - 1 < static_cast<int>(hgcEE_cce_.size())' failed.


A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

Wed Jun  4 13:26:18 CEST 2025
Thread 7 (Thread 0x1477074f9700 (LWP 2616899) "cmsRun"):
#2  0x00001477568c2b33 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014775c07852f in raise () from /lib64/libc.so.6
#5  0x000014775c04be65 in abort () from /lib64/libc.so.6
#6  0x000014775c04bd39 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#7  0x000014775c070e86 in __assert_fail () from /lib64/libc.so.6
#8  0x00001476ea6d84de in HGCalRecHitWorkerSimple::run(edm::Event const&, edm::SortedCollection<HGCUncalibratedRecHit, edm::StrictWeakOrdering<HGCUncalibratedRecHit> > const&, std::vector<HGCRecHit, std::allocator<HGCRecHit> >&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_NONLTO_X_2025-06-04-1100/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#9  0x00001476ea6d20c0 in HGCalRecHitProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_NONLTO_X_2025-06-04-1100/lib/el8_amd64_gcc12/pluginRecoLocalCaloHGCalRecProducersPlugins.so
#10 0x000014775d537dd5 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 6 (Thread 0x147706af8700 (LWP 2616900) "cmsRun"):
#2  0x00001477568bbd60 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00001476eb0c1b75 in HGCalRawToDigiFake::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_NONLTO_X_2025-06-04-1100/lib/el8_amd64_gcc12/pluginEventFilterHGCalRawToDigiAuto.so
#5  0x000014775d51abf0 in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 5 (Thread 0x147705bff700 (LWP 2616901) "cmsRun"):
#2  0x00001477568bbd60 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00001476865dc3eb in HGCDigitizerBase::runSimple(std::unique_ptr<edm::SortedCollection<HGCDataFrame<DetId, HGCSample>, edm::StrictWeakOrdering<HGCDataFrame<DetId, HGCSample> > >, std::default_delete<edm::SortedCollection<HGCDataFrame<DetId, HGCSample>, edm::StrictWeakOrdering<HGCDataFrame<DetId, HGCSample> > > > >&, std::unordered_map<unsigned int, hgc_digi::HGCCellInfo, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, hgc_digi::HGCCellInfo> > >&, CaloSubdetectorGeometry const*, std::unordered_set<DetId, std::hash<DetId>, std::equal_to<DetId>, std::allocator<DetId> > const&, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_NONLTO_X_2025-06-04-1100/lib/el8_amd64_gcc12/pluginSimCalorimetryHGCalSimProducersPlugins.so
#5  0x00001476865d32d9 in HGCDigitizer::finalizeEvent(edm::Event&, edm::EventSetup const&, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_NONLTO_X_2025-06-04-1100/lib/el8_amd64_gcc12/pluginSimCalorimetryHGCalSimProducersPlugins.so
#6  0x00001476865c4e04 in HGCDigiProducer::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_NONLTO_X_2025-06-04-1100/lib/el8_amd64_gcc12/pluginSimCalorimetryHGCalSimProducersPlugins.so
#7  0x000014768693b4fb in edm::MixingModule::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/pluginSimGeneralMixingModulePlugins.so
#8  0x0000147686872673 in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/libMixingBase.so
#9  0x000014775d537dd5 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 1 (Thread 0x14775d0ce580 (LWP 2616669) "cmsRun"):
#2  0x00001477568bbd60 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00001476eb0c1e8e in HGCalRawToDigiFake::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_NONLTO_X_2025-06-04-1100/lib/el8_amd64_gcc12/pluginEventFilterHGCalRawToDigiAuto.so
#5  0x000014775d51abf0 in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_NONLTO_X_2025-06-02-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so

Current Modules:

Module: HGCalRecHitProducer:hltHGCalRecHit (crashed)
Module: HGCalRawToDigiFake:hltHgcalDigis
Module: MixingModule:mix
Module: HGCalRawToDigiFake:hltHgcalDigis

A fatal system signal has occurred: abort signal

dan131riley avatar Jun 04 '25 14:06 dan131riley

In gcc13/ASAN IBs, we get errors like the following (see log for details).

I am not sure if the error above is due to this stack overflow issue

==2519985==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x14cda6d6f09c at pc 0x14ccd3ad132b bp 0x7ffe5293c440 sp 0x7ffe5293c438
READ of size 4 at 0x14cda6d6f09c thread T0
    #0 0x14ccd3ad132a in HGCDigitizer::accumulate(edm::Handle<std::vector<PCaloHit, std::allocator<PCaloHit> > > const&, int, HGCalGeometry const*, CLHEP::HepRandomEngine*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/pluginSimCalorimetryHGCalSimProducersPlugins.so+0x8132a) (BuildId: 14a1ca2440a44f94d809ddd8788351e2b2308edc)
    #1 0x14ccd3ad1892 in HGCDigitizer::accumulate(edm::Event const&, edm::EventSetup const&, CLHEP::HepRandomEngine*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/pluginSimCalorimetryHGCalSimProducersPlugins.so+0x81892) (BuildId: 14a1ca2440a44f94d809ddd8788351e2b2308edc)
    #2 0x14ccd632cc26 in edm::MixingModule::accumulateEvent(edm::Event const&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/pluginSimGeneralMixingModulePlugins.so+0x12cc26) (BuildId: 5586978bf8922d0bc78e322ad8aa48b32c002a61)
    #3 0x14ccd632cdbd in edm::MixingModule::addSignals(edm::Event const&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/pluginSimGeneralMixingModulePlugins.so+0x12cdbd) (BuildId: 5586978bf8922d0bc78e322ad8aa48b32c002a61)
    #4 0x14ccd60bdc12 in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libMixingBase.so+0x58c12) (BuildId: 4476f70088798ef1397b69f06b446b8920e45aa9)
    #5 0x14cdac212c8f in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0xa12c8f) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #6 0x14cdac1804f8 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0x9804f8) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #7 0x14cdabdeddc3 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0x5eddc3) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #8 0x14cdabdee4e3 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0x5ee4e3) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #9 0x14cdabdfe7de in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0x5fe7de) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #10 0x14cdacb8e672 in tbb::detail::d2::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreConcurrency.so+0x16672) (BuildId: 4df6cd614764464d704906b5076d58ffbb970fd4)
    #11 0x14cdab1c6602 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/external/tbb/v2022.0.0-99daee93c0754d523537c2852786d9af/tbb-v2022.0.0/src/tbb/task_dispatcher.h:334
    #12 0x14cdab1c6602 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/external/tbb/v2022.0.0-99daee93c0754d523537c2852786d9af/tbb-v2022.0.0/src/tbb/task_dispatcher.h:470
    #13 0x14cdab1c6602 in tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/external/tbb/v2022.0.0-99daee93c0754d523537c2852786d9af/tbb-v2022.0.0/src/tbb/task_dispatcher.cpp:168
    #14 0x14cdabb66d60 in edm::FinalWaitingTask::wait() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0x366d60) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #15 0x14cdabb17f1a in edm::EventProcessor::processRuns() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0x317f1a) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #16 0x14cdabb4074d in edm::EventProcessor::runToCompletion() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/libFWCoreFramework.so+0x34074d) (BuildId: 6afac244853d1a347fd1e462270b8b722b8c27d4)
    #17 0x40c615 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/bin/el8_amd64_gcc13/cmsRun+0x40c615) (BuildId: fd438984ed0532cbe47c36fbeae2b3f46d41c196)
    #18 0x14cdab1b58e0 in tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc13/external/tbb/v2022.0.0-99daee93c0754d523537c2852786d9af/tbb-v2022.0.0/src/tbb/arena.cpp:821
    #19 0x415591 in main::{lambda()#1}::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/bin/el8_amd64_gcc13/cmsRun+0x415591) (BuildId: fd438984ed0532cbe47c36fbeae2b3f46d41c196)
    #20 0x408b1c in main (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/bin/el8_amd64_gcc13/cmsRun+0x408b1c) (BuildId: fd438984ed0532cbe47c36fbeae2b3f46d41c196)
    #21 0x14cdaa2647e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: 889235a2805b8308b2d0274921bbe1890e9a1986)
    #22 0x408e7d in _start (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02892/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/bin/el8_amd64_gcc13/cmsRun+0x408e7d) (BuildId: fd438984ed0532cbe47c36fbeae2b3f46d41c196)

Address 0x14cda6d6f09c is located in stack of thread T0 at offset 4252 in frame
    #0 0x14ccd3ac9adf in HGCDigitizer::accumulate(edm::Handle<std::vector<PCaloHit, std::allocator<PCaloHit> > > const&, int, HGCalGeometry const*, CLHEP::HepRandomEngine*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc13/cms/cmssw/CMSSW_15_1_ASAN_X_2025-06-01-2300/lib/el8_amd64_gcc13/pluginSimCalorimetryHGCalSimProducersPlugins.so+0x79adf) (BuildId: 14a1ca2440a44f94d809ddd8788351e2b2308edc)

  This frame has 152 object(s):
    [32, 33) '<unknown>'
    [48, 49) '<unknown>'
    [64, 65) '<unknown>'
    [80, 81) '<unknown>'
    [96, 97) '<unknown>'
    [112, 113) '<unknown>'
    [128, 129) '<unknown>'
    [144, 145) '<unknown>'
    [160, 161) '<unknown>'
    [176, 177) '<unknown>'
    [192, 193) '<unknown>'
    [208, 209) '<unknown>'
    [224, 225) '<unknown>'
    [240, 244) 'id' (line 560)
    [256, 260) '<unknown>'
    [272, 276) '<unknown>'
    [288, 292) 'id' (line 577)
    [304, 308) '<unknown>'
    [320, 324) '<unknown>'
    [336, 344) 'simHitIt' (line 583)
    [368, 376) 'findPos' (line 621)
    [400, 408) '<unknown>'
    [432, 440) 'insertedPos' (line 627)
    [464, 472) '<unknown>'
    [496, 504) '<unknown>'
    [528, 536) '<unknown>'
    [560, 568) '<unknown>'
    [592, 600) 'step' (line 641)
    [624, 632) '<unknown>'
    [656, 664) '<unknown>'
    [688, 696) '<unknown>'
    [720, 728) '<unknown>'
    [752, 760) 'stepEnd' (line 651)
    [784, 792) '<unknown>'
    [816, 824) '<unknown>'
    [848, 856) '<unknown>'
    [880, 888) '<unknown>'
    [912, 920) '<unknown>'
    [944, 952) '<unknown>'
    [976, 984) '<unknown>'
    [1008, 1016) '<unknown>'
    [1040, 1048) '__it'
    [1072, 1080) '<unknown>'
    [1104, 1112) '<unknown>'
    [1136, 1144) '<unknown>'
    [1168, 1176) '<unknown>'
    [1200, 1208) '<unknown>'
    [1232, 1240) '<unknown>'
    [1264, 1272) '<unknown>'
    [1296, 1304) '<unknown>'
    [1328, 1336) '<unknown>'
    [1360, 1368) '<unknown>'
    [1392, 1400) '<unknown>'
    [1424, 1432) '<unknown>'
    [1456, 1464) '<unknown>'
    [1488, 1496) '<unknown>'
    [1520, 1528) '<unknown>'
    [1552, 1560) '<unknown>'
    [1584, 1592) '<unknown>'
    [1616, 1624) '<unknown>'
    [1648, 1656) '<unknown>'
    [1680, 1688) '<unknown>'
    [1712, 1720) '<unknown>'
    [1744, 1752) '<unknown>'
    [1776, 1784) '<unknown>'
    [1808, 1816) '<unknown>'
    [1840, 1848) '<unknown>'
    [1872, 1880) '<unknown>'
    [1904, 1912) '<unknown>'
    [1936, 1944) '<unknown>'
    [1968, 1976) '<unknown>'
    [2000, 2008) '<unknown>'
    [2032, 2040) '<unknown>'
    [2064, 2072) '<unknown>'
    [2096, 2104) '<unknown>'
    [2128, 2136) '<unknown>'
    [2160, 2168) '<unknown>'
    [2192, 2200) '<unknown>'
    [2224, 2232) '<unknown>'
    [2256, 2264) '<unknown>'
    [2288, 2296) '<unknown>'
    [2320, 2328) '<unknown>'
    [2352, 2360) '<unknown>'
    [2384, 2392) '<unknown>'
    [2416, 2424) '<unknown>'
    [2448, 2456) '<unknown>'
    [2480, 2488) '<unknown>'
    [2512, 2520) '<unknown>'
    [2544, 2552) '<unknown>'
    [2576, 2584) '<unknown>'
    [2608, 2616) '<unknown>'
    [2640, 2648) '<unknown>'
    [2672, 2680) '<unknown>'
    [2704, 2712) '<unknown>'
    [2736, 2744) '<unknown>'
    [2768, 2776) '<unknown>'
    [2800, 2808) '<unknown>'
    [2832, 2840) '__i'
    [2864, 2872) '<unknown>'
    [2896, 2904) '__next'
    [2928, 2936) '__it'
    [2960, 2968) '<unknown>'
    [2992, 3000) '<unknown>'
    [3024, 3032) '<unknown>'
    [3056, 3064) '<unknown>'
    [3088, 3096) '<unknown>'
    [3120, 3128) '__pos'
    [3152, 3160) '__it'
    [3184, 3192) '<unknown>'
    [3216, 3224) '<unknown>'
    [3248, 3256) '<unknown>'
    [3280, 3288) '<unknown>'
    [3312, 3320) '<unknown>'
    [3344, 3352) '<unknown>'
    [3376, 3384) '<unknown>'
    [3408, 3416) '<unknown>'
    [3440, 3448) '<unknown>'
    [3472, 3480) '<unknown>'
    [3504, 3512) '<unknown>'
    [3536, 3544) '__middle'
    [3568, 3576) '<unknown>'
    [3600, 3608) '<unknown>'
    [3632, 3640) '<unknown>'
    [3664, 3672) '<unknown>'
    [3696, 3704) '<unknown>'
    [3728, 3736) '<unknown>'
    [3760, 3768) '<unknown>'
    [3792, 3800) '<unknown>'
    [3824, 3832) '<unknown>'
    [3856, 3864) '<unknown>'
    [3888, 3896) '<unknown>'
    [3920, 3928) '<unknown>'
    [3952, 3960) '<unknown>'
    [3984, 3992) '__middle'
    [4016, 4024) '<unknown>'
    [4048, 4056) '<unknown>'
    [4080, 4088) '<unknown>'
    [4112, 4120) '<unknown>'
    [4144, 4152) '<unknown>'
    [4176, 4184) '<unknown>'
    [4208, 4216) '<unknown>'
    [4240, 4252) 'tdcForToAOnset' (line 550) <== Memory access at offset 4252 overflows this variable
    [4272, 4284) '__val'
    [4304, 4320) '<unknown>'
    [4336, 4352) '<unknown>'
    [4368, 4384) '<unknown>'
    [4400, 4416) '__node'
    [4432, 4448) '<unknown>'
    [4464, 4480) '<unknown>'
    [4496, 4512) '<unknown>'
    [4528, 4552) 'hitRefs' (line 556)
    [4592, 4728) '<unknown>'

smuzaffar avatar Jun 04 '25 14:06 smuzaffar

@smuzaffar

"I am not sure if the error above is due to this stack overflow issue"

The problem seen here is due to a geometry inconsistency: some code expects only 3 thicknesses, while a recent change added a 4th thickness. The stack-buffer overflow could be causing other problems, however.
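The numbers in the ASAN report fit that picture: the overflowed variable spans 12 bytes, the faulting read is 4 bytes wide, and it lands exactly at the end of the variable. A minimal sketch of the arithmetic, assuming the buffer holds one 4-byte float per thickness; the type alias and helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a buffer sized for three thicknesses.
constexpr std::size_t kNumThicknesses = 3;
using PerThickness = std::array<float, kNumThicknesses>;

// Three 4-byte floats occupy 12 bytes, matching the [4240, 4252) range in the report.
static_assert(sizeof(PerThickness) == 12, "sketch assumes 4-byte float and no padding");

// Byte offset of element `index` from the start of the buffer.
constexpr std::size_t byteOffset(std::size_t index) { return index * sizeof(float); }

// True when `index` addresses an element that actually exists.
constexpr bool inRange(std::size_t index) { return index < kNumThicknesses; }

// Checked read: aborts in debug builds instead of reading past the buffer.
float readChecked(const PerThickness& values, std::size_t index) {
  assert(inRange(index));
  return values[index];
}
```

A zero-based index of 3 (a 4th thickness) gives byteOffset(3) == 12, which is one element past the end of the buffer, the same relative position ASAN reports for the access at offset 4252.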

Dr15Jones avatar Jun 04 '25 14:06 Dr15Jones

@cms-sw/geometry-l2 This crash is still occurring

makortel avatar Jul 22 '25 16:07 makortel

@cms-sw/geometry-l2 any updates? We still observe this RelVal failure.

iarspider avatar Aug 14 '25 09:08 iarspider

@cms-sw/geometry-l2 The assertion is no longer hit (at least in CMSSW_15_1_X_2025-08-18-2300), but we observe a new failure, so far only in the el9 MULTIARCHSV4_X IB:

Thread 6 (Thread 0x14a8734b6640 (LWP 3962932) "cmsRun"):
#0  0x000014a8c86f9fdf in poll () from /lib64/libc.so.6
#1  0x000014a8c47b4297 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x000014a8c47b4494 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014a7c68ea691 in std::vector<unsigned int, std::allocator<unsigned int> >::push_back(unsigned int const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/pluginRecoHGCalTICLPlugins.so
#5  0x000014a7c6963c7b in TICLLayerTileProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/pluginRecoHGCalTICLPlugins.so
#6  0x000014a8c9855d65 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#7  0x000014a8c983a58c in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#8  0x000014a8c97c0d29 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#9  0x000014a8c97c1224 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#10 0x000014a8c9941388 in tbb::detail::d2::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02903/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_MULTIARCHSV4_X_2025-08-18-2300/lib/el9_amd64_gcc12/libFWCoreConcurrency.so
#11 0x000014a8c94fa5da in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=<optimized out>, waiter=..., this=0x14a8c7781f00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-2c8b19d7f71a88d9ed1f550c4776837f/tbb-v2022.0.0/src/tbb/task_dispatcher.h:334
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x14a8c7781f00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-2c8b19d7f71a88d9ed1f550c4776837f/tbb-v2022.0.0/src/tbb/task_dispatcher.h:470
#13 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-2c8b19d7f71a88d9ed1f550c4776837f/tbb-v2022.0.0/src/tbb/arena.cpp:215
#14 tbb::detail::r1::thread_dispatcher_client::process (td=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-2c8b19d7f71a88d9ed1f550c4776837f/tbb-v2022.0.0/src/tbb/thread_dispatcher_client.h:41
#15 tbb::detail::r1::thread_dispatcher::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-2c8b19d7f71a88d9ed1f550c4776837f/tbb-v2022.0.0/src/tbb/thread_dispatcher.cpp:195
#16 0x000014a8c94f2688 in tbb::detail::r1::rml::private_worker::run (this=0x14a8c4f69100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-2c8b19d7f71a88d9ed1f550c4776837f/tbb-v2022.0.0/src/tbb/private_server.cpp:271
#17 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14a8c4f69100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-2c8b19d7f71a88d9ed1f550c4776837f/tbb-v2022.0.0/src/tbb/private_server.cpp:221
#18 0x000014a8c868219a in start_thread () from /lib64/libc.so.6
#19 0x000014a8c8707210 in clone3 () from /lib64/libc.so.6

iarspider avatar Aug 19 '25 10:08 iarspider

(does not reproduce in local build though)

iarspider avatar Aug 19 '25 10:08 iarspider