Isolated HLT crash in 2025 PbPb run related to `FastjetJetProducer`

Open mmusich opened this issue 2 weeks ago • 11 comments

On Sunday, 7 December 2025 (PbPb collisions, Era HI2025A, online release CMSSW_15_1_0_patch3), during run 400391 we got a single isolated HLT crash (e-log) involving the following exception:

----- Begin Fatal Exception 09-Dec-2025 10:30:05 CET-----------------------
An exception of category 'Unknown' occurred while
   [0] Processing  Event run: 400391 lumi: 344 event: 848398240 stream: 0
   [1] Running path 'HLT_HICentrality50100MinimumBiasHF1AND_Beamspot_v3'
   [2] Calling method for module FastjetJetProducer/'hltKT4PFJetsForRho'
Exception Message:
An exception of unknown type was thrown.
----- End Fatal Exception -------------------------------------------------

The full log from F3 Mon is attached to the thread:

f3mon_logtable_2025-12-09T09_46_35.647Z.txt

The issue can be successfully reproduced with:

#!/bin/bash -ex

RUN_NUMBER=400391

# Define the directory with the given run number
DIR="/store/group/tsg/FOG/error_stream_root/run${RUN_NUMBER}/"

# Generate a comma-separated list of the full file paths
file_list=$(ls "/eos/cms$DIR" | awk -v dir="$DIR" '{print dir $0}' | paste -sd "," -)

# Print the result
echo "$file_list"

hltGetConfiguration run:$RUN_NUMBER \
    --globaltag 150X_dataRun3_HLT_v1 \
    --data \
    --no-prescale \
    --no-output \
    --max-events -1 \
    --input "$file_list" > hlt_${RUN_NUMBER}.py

cat <<@EOF >> hlt_${RUN_NUMBER}.py
process.options.wantSummary = True
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0
@EOF

# Run cmsRun with the generated configuration
cmsRun hlt_${RUN_NUMBER}.py &> hlt_${RUN_NUMBER}.log

mmusich avatar Dec 09 '25 09:12 mmusich

cms-bot internal usage

cmsbuild avatar Dec 09 '25 09:12 cmsbuild

A new Issue was created by @mmusich.

@Dr15Jones, @ftenchini, @makortel, @mandrenguyen, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Dec 09 '25 09:12 cmsbuild

assign RecoJets/JetProducers

makortel avatar Dec 09 '25 15:12 makortel

New categories assigned: reconstruction

@jfernan2,@mandrenguyen,@srimanob you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Dec 09 '25 15:12 cmsbuild

This message

An exception of unknown type was thrown.

means that the thrown object was neither of a type deriving from `std::exception`, nor a `std::string`, nor a `char const*`.
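
To make the fallback concrete, here is a minimal sketch of a handler chain over the three kinds of thrown object named above, followed by a catch-all. It is an illustration only, not the framework's actual code: `classifyThrown` and `StandaloneError` are made-up names, and `StandaloneError` stands in for any library error class that does not derive from `std::exception`.

```cpp
#include <exception>
#include <stdexcept>  // std::runtime_error, used in the usage note below
#include <string>

// Hypothetical stand-in for a library error class that does NOT derive
// from std::exception (an assumption made for this illustration).
struct StandaloneError {
  std::string message;
};

// Run `thrower` and report which handler caught what it threw.
// The handler order follows the types listed in the comment above;
// this is a sketch, not the framework's implementation.
template <typename F>
std::string classifyThrown(F&& thrower) {
  try {
    thrower();
  } catch (std::exception const& e) {
    return std::string("std::exception: ") + e.what();
  } catch (std::string const& s) {
    return "std::string: " + s;
  } catch (char const* c) {
    return std::string("char const*: ") + c;
  } catch (...) {
    // Any other type lands here; whatever message it carried is not
    // accessible from this handler.
    return "An exception of unknown type was thrown.";
  }
  return "nothing was thrown";
}
```

For instance, `classifyThrown([] { throw std::runtime_error("boom"); })` goes through the first handler, while `classifyThrown([] { throw StandaloneError{"lost"}; })` falls through to `catch (...)`, and the text stored in `message` is never seen.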

makortel avatar Dec 09 '25 15:12 makortel

I'd suggest checking the origin either via gdb (`catch throw`) or the `cmsTraceExceptions` script (which wraps gdb).

makortel avatar Dec 09 '25 15:12 makortel

type jetmet

jfernan2 avatar Dec 09 '25 16:12 jfernan2

@lnurfikri89 as JetMet RECO contact, could you please have a look? Thanks

jfernan2 avatar Dec 09 '25 16:12 jfernan2

@nurfikri89 (assuming this was the intention of https://github.com/cms-sw/cmssw/issues/49576#issuecomment-3633166640)

mmusich avatar Dec 09 '25 17:12 mmusich

If it helps,

# follow recipe at https://github.com/cms-sw/cmssw/issues/49576#issue-3710101962 to generate hlt_400391.py
cmsTraceExceptions cmsRun hlt_400391.py &> log.log

produces the log file I have posted here. I see in it:


Thread 1 "cmsRun" hit Catchpoint 1 (exception thrown), 0x00007ffff70f02f1 in __cxxabiv1::__cxa_throw (obj=0x7ffeb9906bc0, 
    tinfo=0x7fff4d6ae880 <typeinfo for fastjet::Error>, 
    dest=0x7fff4d46be10 <fastjet::Error::~Error()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
warning: 81	../../../../libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory
#0  0x00007ffff70f02f1 in __cxxabiv1::__cxa_throw (obj=0x7ffeb9906bc0, 
    tinfo=0x7fff4d6ae880 <typeinfo for fastjet::Error>, 
    dest=0x7fff4d46be10 <fastjet::Error::~Error()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007fff4d38ea9c in fastjet::ClusterSequenceActiveArea::_throw_unless_jets_have_same_perp_or_E (this=this@entry=0x7ffeb81caec0, jet=..., refjet=..., 
    tolerance=tolerance@entry=9.9999999999999994e-12, jets_ghosted_seq=...)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/new_allocator.h:90
#2  0x00007fff4d3c43de in fastjet::ClusterSequenceActiveArea::_transfer_areas (
    this=this@entry=0x7ffeb81caec0, unique_hist_order=..., ghosted_seq=...)
    at ClusterSequenceActiveArea.cc:634
#3  0x00007fff4d3c62e0 in fastjet::ClusterSequenceActiveArea::_run_AA (
    this=this@entry=0x7ffeb81caec0, ghost_spec=...)
    at ClusterSequenceActiveArea.cc:145
#4  0x00007fff4d3c6753 in fastjet::ClusterSequenceActiveArea::_initialise_and_run_AA (this=0x7ffeb81caec0, jet_def_in=..., ghost_spec=..., 
    writeout_combinations=<optimized out>) at ClusterSequenceActiveArea.cc:61
#5  0x00007fff4d5e3776 in void fastjet::ClusterSequenceArea::initialize_and_run_cswa<fastjet::PseudoJet>(std::vector<fastjet::PseudoJet, std::allocator<fastjet::PseudoJet> > const&, fastjet::JetDefinition const&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_0_patch3/lib/el8_amd64_gcc12/pluginRecoJetsJetProducers_plugins.so
#6  0x00007fff4d5e4236 in fastjet::ClusterSequenceArea::ClusterSequenceArea<fastjet::PseudoJet>(std::vector<fastjet::PseudoJet, std::allocator<fastjet::PseudoJet> > const&, fastjet::JetDefinition const&, fastjet::AreaDefinition const&) [clone .lto_priv.0] ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_0_patch3/lib/el8_amd64_gcc12/pluginRecoJetsJetProducers_plugins.so
#7  0x00007fff4d5fed60 in FastjetJetProducer::runAlgorithm(edm::Event&, edm::EventSetup const&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_0_patch3/lib/el8_amd64_gcc12/pluginRecoJetsJetProducers_plugins.so
#8  0x00007fff4d64014d in VirtualJetProducer::produce(edm::Event&, edm::EventSetup const&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_0_patch3/lib/el8_amd64_gcc12/pluginRecoJetsJetProducers_plugins.so
#9  0x00007fff4d5f998d in FastjetJetProducer::produce(edm::Event&, edm::EventSetup const&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_0_patch3/lib/el8_amd64_gcc12/pluginRecoJetsJetProducers_plugins.so
#10 0x00007ffff7cf3775 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreFramework.so
#11 0x00007ffff7cd7e3c in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x00007ffff7c5e389 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreFramework.so
#13 0x00007ffff7c5e891 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x00007ffff7ecb388 in tbb::detail::d2::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#15 0x00007ffff728d87b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7fff349bc200, 
    this=<optimized out>)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-0fa9c7b61a60a427cb7121233c828101/tbb-v2022.0.0/src/tbb/task_dispatcher.h:334
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=<optimized out>)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-0fa9c7b61a60a427cb7121233c828101/tbb-v2022.0.0/src/tbb/task_dispatcher.h:470
#17 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, 
    wait_ctx=..., w_ctx=...)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-0fa9c7b61a60a427cb7121233c828101/tbb-v2022.0.0/src/tbb/task_dispatcher.cpp:168
#18 0x00007ffff7be087f in edm::FinalWaitingTask::wait() ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x00007ffff7bf12f1 in edm::EventProcessor::processRuns() ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x00007ffff7be9981 in edm::EventProcessor::runToCompletion() ()
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x0000000000408556 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#22 0x00007ffff727bf71 in tbb::detail::r1::task_arena_impl::execute (ta=..., 
    d=...)
    at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-0fa9c7b61a60a427cb7121233c828101/tbb-v2022.0.0/src/tbb/arena.cpp:821
#23 0x000000000040a283 in main::{lambda()#1}::operator()() const ()
#24 0x00000000004051b8 in main ()
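
Frame #1 shows the throw coming from `_throw_unless_jets_have_same_perp_or_E`, called with `tolerance` of about 1e-11. Going only by that function name and argument, the check presumably compares a jet against a reference jet and throws when neither the transverse momentum nor the energy agrees within a relative tolerance. The sketch below is a guess at that shape, not FastJet's source: `JetKinematics`, `agreeWithin` and `throwUnlessSamePerpOrE` are hypothetical names, the exact comparison used by FastJet may differ, and `std::runtime_error` replaces `fastjet::Error` to keep the example self-contained.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>

// Hypothetical minimal jet record; the real code works on fastjet::PseudoJet.
struct JetKinematics {
  double perp;    // transverse momentum
  double energy;  // energy
};

// True when a and b agree to within a relative tolerance.
inline bool agreeWithin(double a, double b, double tolerance) {
  return std::abs(a - b) <= tolerance * std::max(std::abs(a), std::abs(b));
}

// Assumed reading of "..._have_same_perp_or_E": pass if EITHER quantity
// matches, throw only when both differ by more than the tolerance.
inline void throwUnlessSamePerpOrE(JetKinematics const& jet,
                                   JetKinematics const& ref,
                                   double tolerance) {
  if (!agreeWithin(jet.perp, ref.perp, tolerance) &&
      !agreeWithin(jet.energy, ref.energy, tolerance)) {
    throw std::runtime_error("jet and reference jet differ in both perp and E");
  }
}
```

With a tolerance of 1e-11, a relative difference of 1e-9 in both quantities would already be enough to trigger the throw in this sketch, whereas a mismatch in only one of them would pass.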

mmusich avatar Dec 09 '25 17:12 mmusich

@jfernan2 @mmusich thanks for the ping and tip. I will have a look at it.

nurfikri89 avatar Dec 09 '25 18:12 nurfikri89