Exception from HGCalBackendLayer2Producer

Open makortel opened this issue 2 years ago • 21 comments

The workflow 25234.911 step2 occasionally fails with an exception

----- Begin Fatal Exception 27-Apr-2023 17:14:00 CEST-----------------------
An exception of category 'OutOfBound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 3 stream: 1
   [1] Running path 'FEVTDEBUGHLToutput_step'
   [2] Prefetching for module PoolOutputModule/'FEVTDEBUGHLToutput'
   [3] Prefetching for module L1TEGMultiMerger/'l1tLayer1EG'
   [4] Prefetching for module L1TCorrelatorLayer1Producer/'l1tLayer1HGCal'
   [5] Prefetching for module PFClusterProducerFromHGC3DClusters/'l1tPFClustersFromHGC3DClusters'
   [6] Calling method for module HGCalBackendLayer2Producer/'l1tHGCalBackEndLayer2Producer'
Exception Message:
TC X1 = 2.77857e+07 out of the seeding histogram bounds 0.076 - 0.58
----- End Fatal Exception -------------------------------------------------

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc11/CMSSW_13_1_NONLTO_X_2023-04-27-1100/pyRelValMatrixLogs/run/25234.911_TTbar_14TeV+2026D99_DD4hep/step2_TTbar_14TeV+2026D99_DD4hep.log#/

makortel avatar Apr 27 '23 19:04 makortel

assign upgrade, l1, geometry

makortel avatar Apr 27 '23 20:04 makortel

New categories assigned: geometry,upgrade,l1

@mdhildreth,@epalencia,@AdrianoDee,@Dr15Jones,@srimanob,@aloeliger,@makortel,@bsunanda,@cecilecaillol,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Apr 27 '23 20:04 cmsbuild

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Apr 27 '23 20:04 cmsbuild

Earlier occurrences have been reported in https://github.com/cms-sw/cmssw/issues/41376#issuecomment-1514991604 and https://github.com/cms-sw/cmssw/pull/40404#issuecomment-1364061735

makortel avatar Apr 27 '23 20:04 makortel

Occurred again in CMSSW_13_2_X_2023-05-22-2300 on el9_amd64_gcc11 https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el9_amd64_gcc11/CMSSW_13_2_X_2023-05-22-2300/pyRelValMatrixLogs/run/25234.911_TTbar_14TeV+2026D99_DD4hep/step2_TTbar_14TeV+2026D99_DD4hep.log#/40-40

makortel avatar May 23 '23 13:05 makortel

Let's tag @cms-sw/hgcal-dpg-l2 as well

makortel avatar May 23 '23 13:05 makortel

New occurrence of this exception in CLANG IBs (CMSSW_13_3_CLANG_X_2023-08-04-2300):

An exception of category 'OutOfBound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 4 stream: 1
   [1] Running path 'HLTriggerFinalPath'
   [2] Prefetching for module TriggerSummaryProducerAOD/'hltTriggerSummaryAOD'
   [3] Prefetching for module EgammaHLTRecoEcalCandidateProducers/'hltEgammaCandidatesL1Seeded'
   [4] Prefetching for module PFECALSuperClusterProducer/'particleFlowSuperClusterHGCalFromTICLL1Seeded'
   [5] Prefetching for module PFClusterProducer/'particleFlowClusterHGCalFromTICLL1Seeded'
   [6] Prefetching for module PFRecHitProducer/'particleFlowRecHitHGCL1Seeded'
   [7] Prefetching for module HGCalRecHitProducer/'HGCalRecHitL1Seeded'
   [8] Prefetching for module HGCalUncalibRecHitProducer/'HGCalUncalibRecHitL1Seeded'
   [9] Prefetching for module HLTHGCalDigisInRegionsProducer/'hgcalDigisL1Seeded'
   [10] Prefetching for module L1TEGammaFilteredCollectionProducer/'hltL1TEGammaHGCFilteredCollectionProducer'
   [11] Prefetching for module L1TEGMultiMerger/'l1tLayer1EG'
   [12] Prefetching for module L1TCorrelatorLayer1Producer/'l1tLayer1HGCal'
   [13] Prefetching for module PFClusterProducerFromHGC3DClusters/'l1tPFClustersFromHGC3DClusters'
   [14] Calling method for module HGCalBackendLayer2Producer/'l1tHGCalBackEndLayer2Producer'
Exception Message:
TC X1 = inf out of the seeding histogram bounds 0.076 - 0.58

aandvalenzuela avatar Aug 07 '23 13:08 aandvalenzuela

Another occurrence of this exception, in CMSSW_13_3_X_2023-08-21-2300:

----- Begin Fatal Exception 22-Aug-2023 08:29:10 CEST-----------------------
An exception of category 'OutOfBound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 2 stream: 3
   [1] Running path 'HLTriggerFinalPath'
   [2] Prefetching for module TriggerSummaryProducerAOD/'hltTriggerSummaryAOD'
   [3] Prefetching for module EgammaHLTRecoEcalCandidateProducers/'hltEgammaCandidatesL1Seeded'
   [4] Prefetching for module PFECALSuperClusterProducer/'particleFlowSuperClusterHGCalFromTICLL1Seeded'
   [5] Prefetching for module PFClusterProducer/'particleFlowClusterHGCalFromTICLL1Seeded'
   [6] Prefetching for module PFRecHitProducer/'particleFlowRecHitHGCL1Seeded'
   [7] Prefetching for module HGCalRecHitProducer/'HGCalRecHitL1Seeded'
   [8] Prefetching for module HGCalUncalibRecHitProducer/'HGCalUncalibRecHitL1Seeded'
   [9] Prefetching for module HLTHGCalDigisInRegionsProducer/'hgcalDigisL1Seeded'
   [10] Prefetching for module L1TEGammaFilteredCollectionProducer/'hltL1TEGammaHGCFilteredCollectionProducer'
   [11] Prefetching for module L1TEGMultiMerger/'l1tLayer1EG'
   [12] Prefetching for module L1TCorrelatorLayer1Producer/'l1tLayer1HGCal'
   [13] Prefetching for module PFClusterProducerFromHGC3DClusters/'l1tPFClustersFromHGC3DClusters'
   [14] Calling method for module HGCalBackendLayer2Producer/'l1tHGCalBackEndLayer2Producer'
Exception Message:
TC X1 = inf out of the seeding histogram bounds 0.076 - 0.58
----- End Fatal Exception -------------------------------------------------

iarspider avatar Aug 22 '23 11:08 iarspider

@cms-sw/hgcal-dpg-l2, instead of throwing at https://github.com/cms-sw/cmssw/blob/master/L1Trigger/L1THGCal/src/backend/HGCalHistoSeedingImpl.cc#L91-L92 and https://github.com/cms-sw/cmssw/blob/master/L1Trigger/L1THGCal/src/backend/HGCalHistoSeedingImpl.cc#L95-L96, couldn't you just issue a warning and move on to the next cluster (i.e. continue)?
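
For illustration only, here is a minimal standalone sketch of the "warn and continue" pattern. It is not the code in HGCalHistoSeedingImpl.cc: the Cell struct, the function name, and the 1D binning are hypothetical stand-ins, and a real module would presumably use the framework's message logger rather than std::cerr. Skipping would only hide a value such as inf or 2.77857e+07; it does not explain where that value comes from.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for a trigger cell (not the CMSSW data format).
struct Cell {
  double x1;      // coordinate used to choose the histogram bin
  double energy;  // weight added to that bin
};

// Fill a 1D histogram over [lo, hi). A cell whose x1 is non-finite or
// outside the range is reported and skipped instead of raising an exception.
// Returns the number of skipped cells.
std::size_t fillSkippingOutOfBounds(const std::vector<Cell>& cells,
                                    double lo,
                                    double hi,
                                    std::vector<double>& bins) {
  assert(lo < hi && !bins.empty());
  const double width = (hi - lo) / static_cast<double>(bins.size());
  std::size_t skipped = 0;
  for (const Cell& c : cells) {
    // std::isfinite rejects both inf and NaN before the range comparison.
    if (!std::isfinite(c.x1) || c.x1 < lo || c.x1 >= hi) {
      std::cerr << "warning: x1 = " << c.x1 << " outside [" << lo << ", "
                << hi << "), skipping cell\n";
      ++skipped;
      continue;
    }
    std::size_t bin = static_cast<std::size_t>((c.x1 - lo) / width);
    if (bin >= bins.size()) {
      bin = bins.size() - 1;  // guard against rounding at the upper edge
    }
    bins[bin] += c.energy;
  }
  return skipped;
}
```

Called with the bounds from the exception message (0.076 and 0.58), cells carrying the values reported in this thread (2.77857e+07, inf, 1.79347) would be counted as skipped while in-range cells are still accumulated.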

perrotta avatar Aug 22 '23 12:08 perrotta

Happened again in CMSSW_13_3_DBG_X_2023-10-26-2300. @cms-sw/hgcal-dpg-l2 could you please reply to @perrotta 's suggestion above?

iarspider avatar Oct 27 '23 09:10 iarspider

@bsunanda, could this issue somehow be connected with the D99 geometry?

civanch avatar Oct 27 '23 10:10 civanch

Happened again in CMSSW_14_0_X_2023-11-23-2300:

----- Begin Fatal Exception 24-Nov-2023 07:18:03 CET-----------------------
An exception of category 'OutOfBound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 5 stream: 3
   [1] Running path 'HLTriggerFinalPath'
   [2] Prefetching for module TriggerSummaryProducerAOD/'hltTriggerSummaryAOD'
   [3] Prefetching for module EgammaHLTRecoEcalCandidateProducers/'hltEgammaCandidatesL1Seeded'
   [4] Prefetching for module PFECALSuperClusterProducer/'particleFlowSuperClusterHGCalFromTICLL1Seeded'
   [5] Prefetching for module PFClusterProducer/'particleFlowClusterHGCalFromTICLL1Seeded'
   [6] Prefetching for module PFRecHitProducer/'particleFlowRecHitHGCL1Seeded'
   [7] Prefetching for module HGCalRecHitProducer/'HGCalRecHitL1Seeded'
   [8] Prefetching for module HGCalUncalibRecHitProducer/'HGCalUncalibRecHitL1Seeded'
   [9] Prefetching for module HLTHGCalDigisInRegionsProducer/'hgcalDigisL1Seeded'
   [10] Prefetching for module L1TEGammaFilteredCollectionProducer/'hltL1TEGammaHGCFilteredCollectionProducer'
   [11] Prefetching for module L1TEGMultiMerger/'l1tLayer1EG'
   [12] Prefetching for module L1TCorrelatorLayer1Producer/'l1tLayer1HGCal'
   [13] Prefetching for module PFClusterProducerFromHGC3DClusters/'l1tPFClustersFromHGC3DClusters'
   [14] Calling method for module HGCalBackendLayer2Producer/'l1tHGCalBackEndLayer2Producer'
Exception Message:
TC X1 = inf out of the seeding histogram bounds 0.076 - 0.58
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 24-Nov-2023 07:18:03 CET-----------------------
An exception of category 'OutOfBound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 2 stream: 2
   [1] Running path 'HLTriggerFinalPath'
   [2] Prefetching for module TriggerSummaryProducerAOD/'hltTriggerSummaryAOD'
   [3] Prefetching for module EgammaHLTRecoEcalCandidateProducers/'hltEgammaCandidatesL1Seeded'
   [4] Prefetching for module PFECALSuperClusterProducer/'particleFlowSuperClusterHGCalFromTICLL1Seeded'
   [5] Prefetching for module PFClusterProducer/'particleFlowClusterHGCalFromTICLL1Seeded'
   [6] Prefetching for module PFRecHitProducer/'particleFlowRecHitHGCL1Seeded'
   [7] Prefetching for module HGCalRecHitProducer/'HGCalRecHitL1Seeded'
   [8] Prefetching for module HGCalUncalibRecHitProducer/'HGCalUncalibRecHitL1Seeded'
   [9] Prefetching for module HLTHGCalDigisInRegionsProducer/'hgcalDigisL1Seeded'
   [10] Prefetching for module L1TEGammaFilteredCollectionProducer/'hltL1TEGammaHGCFilteredCollectionProducer'
   [11] Prefetching for module L1TEGMultiMerger/'l1tLayer1EG'
   [12] Prefetching for module L1TCorrelatorLayer1Producer/'l1tLayer1HGCal'
   [13] Prefetching for module PFClusterProducerFromHGC3DClusters/'l1tPFClustersFromHGC3DClusters'
   [14] Calling method for module HGCalBackendLayer2Producer/'l1tHGCalBackEndLayer2Producer'
Exception Message:
TC X1 = 1.79347 out of the seeding histogram bounds 0.076 - 0.58
----- End Fatal Exception -------------------------------------------------

iarspider avatar Nov 24 '23 09:11 iarspider

@bsunanda @cms-sw/hgcal-dpg-l2 gentle ping.

iarspider avatar Nov 24 '23 09:11 iarspider

Can this topic be closed?

srimanob avatar Jan 17 '24 16:01 srimanob

Has the problem been fixed?

makortel avatar Jan 17 '24 17:01 makortel

Ah, the PR I was looking at is not a fix for this issue, so this is still an open issue. Sorry for the noise.

srimanob avatar Jan 17 '24 17:01 srimanob

Could you let me know how to reproduce this (which workflow, which type of build, etc.)? Regards, Sunanda


bsunanda avatar Jan 18 '24 02:01 bsunanda

Workflow 25234.911 (step2 of it), any build. Likely requires multiple threads (at least I don't recall seeing the exception in PR tests). The exception occurs randomly, likely requires multiple attempts to reproduce, possibly on a loaded machine.

makortel avatar Jan 18 '24 14:01 makortel

We are still seeing this exception, and there's some circumstantial evidence that this issue and #42025 could be related.

dan131riley avatar Apr 09 '24 12:04 dan131riley

New occurrence of this failure in RelVal 25234.911 for el8_amd64_gcc12 in CMSSW_14_1_X_2024-05-19-2300 IBs:

----- Begin Fatal Exception 20-May-2024 01:32:28 CEST-----------------------
An exception of category 'OutOfBound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 3
   [1] Running path 'HLTriggerFinalPath'
   [2] Prefetching for module TriggerSummaryProducerAOD/'hltTriggerSummaryAOD'
   [3] Prefetching for module L1HPSPFTauProducer/'l1tHPSPFTauProducer'
   [4] Prefetching for module L1TPFCandMultiMerger/'l1tLayer1'
   [5] Prefetching for module L1TCorrelatorLayer1Producer/'l1tLayer1HGCal'
   [6] Prefetching for module PFClusterProducerFromHGC3DClusters/'l1tPFClustersFromHGC3DClusters'
   [7] Calling method for module HGCalBackendLayer2Producer/'l1tHGCalBackEndLayer2Producer'
Exception Message:
TC X1 = inf out of the seeding histogram bounds 0.076 - 0.58
----- End Fatal Exception -------------------------------------------------

aandvalenzuela avatar May 21 '24 08:05 aandvalenzuela

The same issue appears in D110 DD4hep, https://github.com/cms-sw/cmssw/pull/45175#issuecomment-2183429334

srimanob avatar Jun 22 '24 00:06 srimanob

A strange way to reproduce the issue: https://github.com/cms-sw/cmssw/issues/41927#issuecomment-2211842526

srimanob avatar Jul 06 '24 18:07 srimanob

https://github.com/cms-sw/cmssw/issues/41927#issuecomment-2227115997

srimanob avatar Jul 13 '24 22:07 srimanob

This one got fixed with https://github.com/cms-sw/cmssw/pull/45442 as well.

makortel avatar Jul 19 '24 21:07 makortel