
Crash in prompt reco because of read failure

Open makortel opened this issue 2 years ago • 6 comments

A crash

R__unzipLZMA: error 9 in lzma_code
----- Begin Fatal Exception 07-Aug-2022 15:19:46 CEST-----------------------
An exception of category ‘FileReadError’ occurred while
[0] Processing Event run: 356719 lumi: 241 event: 297977786 stream: 0
[1] Running path ‘dqmoffline_step’
[2] Prefetching for module DQMMessageLogger/‘DQMMessageLogger’
[3] Prefetching for module LogErrorHarvester/‘logErrorHarvester’
[4] Prefetching for module CSCSegmentProducer/‘cscSegments’
[5] Prefetching for module CSCRecHitDProducer/‘csc2DRecHits’
[6] Prefetching for module CSCDCCUnpacker/‘muonCSCDigis’
[7] While reading from source FEDRawDataCollection rawDataCollector ‘’ LHC
[8] Reading branch FEDRawDataCollection_rawDataCollector__LHC.
Additional Info:
[a] Fatal Root Error: @SUB=TBasket::ReadBasketBuffers
fNbytes = 3131524, fKeylen = 115, fObjlen = 5516120, noutot = 0, nout=0, nin=3131409, nbuf=5516120

----- End Fatal Exception -------------------------------------------------

was observed in prompt reco https://cms-talk.web.cern.ch/t/promptreco-crash-in-run-356719/13691/1. The crash is fully reproducible.
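The numbers in the `TBasket::ReadBasketBuffers` line can be cross-checked with a little arithmetic. The sketch below assumes the usual ROOT `TKey` meaning of the fields (`fNbytes` is the on-file size including the key, `fKeylen` is the key header size, `fObjlen` is the uncompressed size). The helper name `summarize` is hypothetical and not part of ROOT or CMSSW.

```python
# Values copied from the exception message above.
basket = {
    "fNbytes": 3131524,   # bytes on file, key header included (assumed meaning)
    "fKeylen": 115,       # bytes used by the key header (assumed meaning)
    "fObjlen": 5516120,   # expected uncompressed size (assumed meaning)
    "noutot": 0,          # bytes decompressed so far
    "nin": 3131409,       # compressed bytes handed to the decompressor
    "nbuf": 5516120,      # size of the output buffer
}


def summarize(b):
    """Hypothetical helper: derive a few consistency checks from the log fields."""
    compressed_payload = b["fNbytes"] - b["fKeylen"]
    return {
        "compressed_payload": compressed_payload,
        "matches_nin": compressed_payload == b["nin"],
        "output_buffer_matches_objlen": b["nbuf"] == b["fObjlen"],
        "bytes_recovered": b["noutot"],
        "compression_ratio": b["fObjlen"] / compressed_payload,
    }


summary = summarize(basket)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under that reading the sizes are self-consistent (`nin` equals `fNbytes - fKeylen`, and `nbuf` equals `fObjlen`), while `noutot = 0` says that no bytes were decompressed before the failure. That points at the compressed payload itself rather than at a wrong basket length.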

makortel avatar Aug 08 '22 13:08 makortel

assign core

makortel avatar Aug 08 '22 13:08 makortel

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Aug 08 '22 13:08 cmsbuild

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Aug 08 '22 13:08 cmsbuild

@pcanal Could you take a look? Here is a recipe to reproduce on cmsdev32

cmsrel CMSSW_12_4_5
cd CMSSW_12_4_5/src
cp /build/mkortela/debug/CMSSW_12_4_5/src/PSet.p* .
cmsRun PSet.py

The job starts processing directly at the problematic event.

makortel avatar Aug 08 '22 13:08 makortel

@germanfgv re-ran the REPACK for this job in a replay, and I was able to run the PromptReco job successfully with that input. This test suggests that the problem in the REPACK might not be reproducible.

makortel avatar Aug 08 '22 16:08 makortel

R__unzipLZMA: error 9 in lzma_code

As far as I can tell this means that there is a read error within the lzma decoding, i.e. the buffer is most likely corrupted. The corruption could have happened during writing (a bug in the writing code) or after writing (a hardware or network failure). To track it down we would need to reproduce the writing failure.
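For illustration, in liblzma's `lzma_ret` enum the value 9 is `LZMA_DATA_ERROR`, which the decoder returns when the compressed data is corrupt. A minimal sketch using Python's standard `lzma` module (which wraps liblzma) shows the same class of failure on a deliberately damaged buffer. This is not the ROOT code path: the payload, the flipped byte position and the helper `try_decompress` are made up for the example.

```python
import lzma


def try_decompress(buf: bytes):
    """Return (True, data) on success or (False, error message) on failure."""
    try:
        return True, lzma.decompress(buf)
    except lzma.LZMAError as exc:
        return False, str(exc)


# Made-up payload standing in for a basket buffer.
payload = bytes(range(256)) * 4096
good = lzma.compress(payload)

# Simulate corruption after writing: flip every bit of one byte in the middle.
damaged = bytearray(good)
damaged[len(damaged) // 2] ^= 0xFF
damaged = bytes(damaged)

ok_clean, clean_out = try_decompress(good)
ok_damaged, damaged_msg = try_decompress(damaged)

print("clean buffer decoded:", ok_clean and clean_out == payload)
print("damaged buffer decoded:", ok_damaged)
print("decoder message:", damaged_msg)
```

A single damaged byte is enough to make the decoder give up, which is consistent with corruption in the stored buffer. It does not tell us whether the damage happened at write time or afterwards.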

pcanal avatar Aug 08 '22 21:08 pcanal

+core

Nothing to add.

makortel avatar Apr 13 '23 14:04 makortel

This issue is fully signed and ready to be closed.

cmsbuild avatar Apr 13 '23 14:04 cmsbuild