Offline crashes in `{HLT,L1}TriggerJSONMonitoring` in `CMSSW_14_0_6_MULTIARCHS`

Open mmusich opened this issue 9 months ago • 22 comments

@silviodonato reported a crash in CMSSW_14_0_6_MULTIARCHS when running:

ssh lxplus8.cern.ch
export SCRAM_ARCH=el8_amd64_gcc12
cmsrel CMSSW_14_0_6_MULTIARCHS
cd CMSSW_14_0_6_MULTIARCHS/src
cmsenv
hltGetConfiguration run:380647 --globaltag  140X_dataRun3_HLT_v3  --input file:/eos/cms/tier0/store/data/Run2024D/EphemeralHLTPhysics0/RAW/v1/000/380/647/00000/a8bb2f4f-008c-454b-8a8c-f77ff51e8fcf.root

The crash concerns `L1TriggerJSONMonitoring`, with the following stack trace:

Thread 1 (Thread 0x7fe7ae29d640 (LWP 1513991) "cmsRun"):
#0  0x00007fe7aee6a301 in poll () from /lib64/libc.so.6
#1  0x00007fe7a26f62ff in full_read.constprop () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#2  0x00007fe7a26a9afc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#3  0x00007fe7a26aa460 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fe7aee14e41 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#6  0x00007fe7af8117ab in std::char_traits<char>::copy (__n=49, __s2=<optimized out>, __s1=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/char_traits.h:435
#7  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy (__n=49, __s=<optimized out>, __d=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/basic_string.h:431
#8  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy (__n=49, __s=<optimized out>, __d=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/basic_string.h:426
#9  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign (this=0x7fffc5118a40, __str=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:291
#10 0x00007fe711d1caf6 in L1TriggerJSONMonitoring::globalEndLuminosityBlockSummary(edm::LuminosityBlock const&, edm::EventSetup const&, L1TriggerJSONMonitoringData::lumisection*) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginHLTriggerJSONMonitoringPlugins.so
#11 0x00007fe711d1d8c8 in virtual thunk to edm::global::impl::LuminosityBlockSummaryCacheHolder<edm::global::EDAnalyzerBase, L1TriggerJSONMonitoringData::lumisection>::doEndLuminosityBlockSummary_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginHLTriggerJSONMonitoringPlugins.so
#12 0x00007fe7b18c1ff5 in edm::global::EDAnalyzerBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#13 0x00007fe7b18b9da0 in edm::WorkerT<edm::global::EDAnalyzerBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#14 0x00007fe7b1807a7f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#15 0x00007fe7b17f5ef8 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#16 0x00007fe7b17b8bae in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#17 0x00007fe7afff3281 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7fe7acc99380) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#18 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7fe7acc99380) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#19 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#20 0x00007fe7b17c941b in edm::FinalWaitingTask::wait() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#21 0x00007fe7b17d324d in edm::EventProcessor::processRuns() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#22 0x00007fe7b17d37b1 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#23 0x00000000004074ef in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#24 0x00007fe7affdf9ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/arena.cpp:688
#25 0x0000000000408ed2 in main::{lambda()#1}::operator()() const ()
#26 0x000000000040517c in main ()

Current Modules:

Module: L1TriggerJSONMonitoring:hltL1TriggerJSONMonitoring (crashed)
Segmentation fault (core dumped)

Trying to reproduce with a slightly different setup (e.g. the script below):

#!/bin/bash -ex

# CMSSW_14_0_6_MULTIARCHS

hltGetConfiguration run:380647 \
            --globaltag 140X_dataRun3_HLT_v3 \
            --input file:/eos/cms/tier0/store/data/Run2024D/EphemeralHLTPhysics0/RAW/v1/000/380/647/00000/a8bb2f4f-008c-454b-8a8c-f77ff51e8fcf.root > hlt_run380647.py

cat <<@EOF >> hlt_run380647.py
process.options.wantSummary = False
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0
@EOF

cmsRun hlt_run380647.py &> hlt.log

I get a different crash (also on CPU-only), this time involving `HLTriggerJSONMonitoring`:

Thread 1 (Thread 0x7fed272ac640 (LWP 2328682) "cmsRun"):
#0  0x00007fed27e79301 in poll () from /lib64/libc.so.6
#1  0x00007fed1b72f2ff in full_read.constprop () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#2  0x00007fed1b6e2afc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#3  0x00007fed1b6e3460 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fed27e23e37 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#6  0x00007fed27de7009 in __GI__IO_file_xsputn () from /lib64/libc.so.6
#7  0x00007fed27ddc19c in fwrite () from /lib64/libc.so.6
#8  0x00007fed2881127d in std::basic_streambuf<char, std::char_traits<char> >::sputn (__n=50, __s=0x0, this=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/streambuf:455
#9  std::__ostream_write<char, std::char_traits<char> > (__n=50, __s=0x0, __out=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/ostream_insert.h:51
#10 std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=0x0, __n=50) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/ostream_insert.h:102
#11 0x00007fec9955a16b in HLTriggerJSONMonitoring::globalEndLuminosityBlockSummary(edm::LuminosityBlock const&, edm::EventSetup const&, HLTriggerJSONMonitoringData::lumisection*) const () from /tmp/musich/hltL1TriggerJSONMonitoring/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginHLTriggerJSONMonitoringPlugins.so
#12 0x00007fec9955f0c8 in virtual thunk to edm::global::impl::LuminosityBlockSummaryCacheHolder<edm::global::EDAnalyzerBase, HLTriggerJSONMonitoringData::lumisection>::doEndLuminosityBlockSummary_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /tmp/musich/hltL1TriggerJSONMonitoring/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginHLTriggerJSONMonitoringPlugins.so
#13 0x00007fed2a8d0ff5 in edm::global::EDAnalyzerBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#14 0x00007fed2a8c8da0 in edm::WorkerT<edm::global::EDAnalyzerBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#15 0x00007fed2a816a7f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#16 0x00007fed2a804ef8 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#17 0x00007fed2a7c7bae in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#18 0x00007fed29002281 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7fed25c83e00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#19 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7fed25c83e00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#20 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#21 0x00007fed2a7d841b in edm::FinalWaitingTask::wait() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#22 0x00007fed2a7e224d in edm::EventProcessor::processRuns() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#23 0x00007fed2a7e27b1 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#24 0x00000000004074ef in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#25 0x00007fed28fee9ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/arena.cpp:688
#26 0x0000000000408ed2 in main::{lambda()#1}::operator()() const ()
#27 0x000000000040517c in main ()

Current Modules:

Module: HLTriggerJSONMonitoring:hltHLTriggerJSONMonitoring (crashed)
Module: none

A fatal system signal has occurred: segmentation violation
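
To compare runs quickly, the crashed module can be pulled out of a log with a small helper. This is only a sketch: the `Module: <type>:<label> (crashed)` pattern is inferred from the two log excerpts above, and the function name `crashed_modules` is hypothetical.

```python
import re

# Matches lines such as
#   Module: HLTriggerJSONMonitoring:hltHLTriggerJSONMonitoring (crashed)
# The pattern is an assumption based only on the excerpts in this issue.
CRASHED_MODULE = re.compile(r"Module:\s*(\w+):(\w+)\s*\(crashed\)")


def crashed_modules(log_text):
    """Return (module type, module label) pairs flagged as crashed in the log text."""
    return CRASHED_MODULE.findall(log_text)
```

For example, `crashed_modules(open("hlt.log", errors="replace").read())` could be applied to the `hlt.log` written by the script above; lines such as `Module: none` are ignored.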

As additional information, the crash appears to depend on the output configuration. With any of:

  • --output full [*] (see the caveat at https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideGlobalHLT#General_Usage)
  • --output minimal
  • --output none

the job runs without problems, whereas with:

  • --output all

it crashes as reported above.
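
A possible way to scan the four output modes systematically is to generate one configuration and one log per mode. The sketch below only builds the shell command lines and does not run them. The run number, global tag, input file and `--output` values are taken from this issue; the per-mode file names and the helper names `config_command` and `scan_script` are hypothetical. Unlike the script above, it does not append the `process.options` overrides.

```python
import shlex

RUN = "run:380647"
GLOBALTAG = "140X_dataRun3_HLT_v3"
INPUT = ("file:/eos/cms/tier0/store/data/Run2024D/EphemeralHLTPhysics0/RAW/v1/"
         "000/380/647/00000/a8bb2f4f-008c-454b-8a8c-f77ff51e8fcf.root")
OUTPUT_MODES = ("all", "full", "minimal", "none")


def config_command(output_mode):
    """Argument list for hltGetConfiguration with the given --output mode."""
    return ["hltGetConfiguration", RUN,
            "--globaltag", GLOBALTAG,
            "--input", INPUT,
            "--output", output_mode]


def scan_script():
    """Shell lines that dump and run one configuration per output mode."""
    lines = []
    for mode in OUTPUT_MODES:
        cfg = f"hlt_run380647_{mode}.py"  # hypothetical file name
        lines.append(f"{shlex.join(config_command(mode))} > {cfg}")
        lines.append(f"cmsRun {cfg} &> hlt_{mode}.log")
    return lines
```

Printing `"\n".join(scan_script())` gives a script body that can be pasted into a `cmsenv` shell; the resulting `hlt_<mode>.log` files can then be compared mode by mode.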

FYI @missirol @fwyzard @cms-sw/hlt-l2

mmusich, May 15 '24 12:05