cmssw
Crash in prompt reco because of read failure
A crash was observed in prompt reco (https://cms-talk.web.cern.ch/t/promptreco-crash-in-run-356719/13691/1). The crash is fully reproducible:
R__unzipLZMA: error 9 in lzma_code
----- Begin Fatal Exception 07-Aug-2022 15:19:46 CEST-----------------------
An exception of category 'FileReadError' occurred while
[0] Processing Event run: 356719 lumi: 241 event: 297977786 stream: 0
[1] Running path 'dqmoffline_step'
[2] Prefetching for module DQMMessageLogger/'DQMMessageLogger'
[3] Prefetching for module LogErrorHarvester/'logErrorHarvester'
[4] Prefetching for module CSCSegmentProducer/'cscSegments'
[5] Prefetching for module CSCRecHitDProducer/'csc2DRecHits'
[6] Prefetching for module CSCDCCUnpacker/'muonCSCDigis'
[7] While reading from source FEDRawDataCollection rawDataCollector '' LHC
[8] Reading branch FEDRawDataCollection_rawDataCollector__LHC.
Additional Info:
[a] Fatal Root Error: @SUB=TBasket::ReadBasketBuffers
fNbytes = 3131524, fKeylen = 115, fObjlen = 5516120, noutot = 0, nout=0, nin=3131409, nbuf=5516120
----- End Fatal Exception -------------------------------------------------
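The numbers in the TBasket::ReadBasketBuffers error can be cross-checked with a little arithmetic. This is a minimal sketch, assuming ROOT's usual basket layout: fNbytes counts the key plus the compressed payload, nin is the compressed size read from ROOT's small compression header in front of the payload, nbuf is the uncompressed size that header declares, and noutot/nout are the bytes actually produced by decompression. Under those assumptions, fNbytes - fKeylen = 3131524 - 115 = 3131409 = nin and nbuf = fObjlen, so the sizes look self-consistent, while zero bytes came out of the decompressor. That pattern points at damage inside the compressed data rather than at a wrong offset or size, but it is a reading of the numbers, not a diagnosis. The helpers `parse_basket_error` and `diagnose` are hypothetical names used only for illustration.

```python
import re

# Error line copied verbatim from the fatal exception (Additional Info [a]).
LOG_LINE = ("fNbytes = 3131524, fKeylen = 115, fObjlen = 5516120, "
            "noutot = 0, nout=0, nin=3131409, nbuf=5516120")


def parse_basket_error(line):
    """Turn the 'name = value' pairs of the TBasket error into a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+)\s*=\s*(\d+)", line)}


def diagnose(fields):
    """Cross-check the basket bookkeeping (assumed meanings, see the lead-in)."""
    payload = fields["fNbytes"] - fields["fKeylen"]  # compressed bytes after the key
    return {
        "compressed_payload": payload,
        # Compressed size from the compression header agrees with the key.
        "header_matches_key": payload == fields["nin"],
        # Declared uncompressed size agrees with the object length.
        "uncompressed_size_matches": fields["nbuf"] == fields["fObjlen"],
        # Yet decompression produced no output at all.
        "nothing_decompressed": fields["noutot"] == 0 and fields["nout"] == 0,
    }


if __name__ == "__main__":
    for check, result in diagnose(parse_basket_error(LOG_LINE)).items():
        print(f"{check}: {result}")
```

Run as a script, it prints compressed_payload: 3131409 followed by True for each of the three consistency checks.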
assign core
New categories assigned: core
@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
@pcanal Could you take a look? Here is a recipe to reproduce it on cmsdev32:
cmsrel CMSSW_12_4_5
cd CMSSW_12_4_5/src
cp /build/mkortela/debug/CMSSW_12_4_5/src/PSet.p* .
cmsRun PSet.py
It starts processing directly at the problematic event.
@germanfgv re-ran the REPACK for this job in a replay, and I was able to run the PromptReco job successfully with that input. This test suggests that the problem in the REPACK might not be reproducible.
R__unzipLZMA: error 9 in lzma_code
As far as I can tell, this means that there is a read error within the LZMA decoding (error 9 is LZMA_DATA_ERROR in liblzma, i.e. the input data is corrupt), so the buffer is most likely corrupted. The corruption could have happened during writing (a bug in the writing code) or after writing (a hardware or network failure). To track it down, we would need to reproduce the writing failure.
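To illustrate why a damaged buffer surfaces as a decoder error, here is a minimal sketch using Python's standard lzma module, which wraps the same liblzma library; a corrupt input is reported by liblzma as a data error and raised in Python as lzma.LZMAError. Assumptions: ROOT's LZMA codec is taken to store an xz-format stream behind its own small header, and the payload and the single flipped byte are synthetic and hypothetical, so this does not reproduce the actual basket. It also cannot tell the two hypotheses apart: a writing bug and later hardware or network damage both end up as bytes the decoder rejects. The helpers `corrupt_one_byte` and `try_decompress` are illustrative names.

```python
import lzma


def corrupt_one_byte(blob: bytes, offset: int) -> bytes:
    """Return a copy of blob with all bits of one byte inverted (simulated damage)."""
    damaged = bytearray(blob)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


def try_decompress(blob: bytes) -> str:
    """Decompress an xz stream and report 'ok' or the liblzma error text."""
    try:
        lzma.decompress(blob, format=lzma.FORMAT_XZ)
        return "ok"
    except lzma.LZMAError as err:
        return f"LZMAError: {err}"


# Synthetic stand-in payload (1 MiB); a real basket holds serialized event data.
payload = bytes(range(256)) * 4096
# The xz container carries an integrity check, so damage is detected, not silently decoded.
compressed = lzma.compress(payload, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)

if __name__ == "__main__":
    print("intact:   ", try_decompress(compressed))
    print("corrupted:", try_decompress(corrupt_one_byte(compressed, len(compressed) // 2)))
```

The intact stream decompresses cleanly, while the copy with one byte flipped in the middle is rejected with an LZMAError, the Python-side counterpart of the "error 9 in lzma_code" message.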
+core
Nothing to add.
This issue is fully signed and ready to be closed.