[DBG_X] RelVals 2xx34.x failing with SIGSEGV in HGCalDDDConstants::assignCellTrap

iarspider opened this issue.

Full logs:

Example stack trace:

Thread 3 (Thread 0x154f2e57f700 (LWP 510986) "cmsRun"):
#0  0x0000154f6e75bba1 in poll () from /lib64/libc.so.6
#1  0x0000154f69b3ff67 in (anonymous namespace)::full_read (len=1, timeout_s=300, inbuf=0x154f2e578746 "[", fd=7) at src/FWCore/Services/plugins/InitRootHandlers.cc:380
#2  edm::service::InitRootHandlers::stacktraceFromThread () at src/FWCore/Services/plugins/InitRootHandlers.cc:700
#3  0x0000154f69b40164 in (anonymous namespace)::sig_dostack_then_abort (sig=11) at src/FWCore/Services/plugins/InitRootHandlers.cc:546
#4  <signal handler called>
#5  0x0000154f43e3d88d in HGCalDDDConstants::assignCellTrap (this=0x154f333c0800, x=<optimized out>, y=<optimized out>, z=<optimized out>, layer=<optimized out>, layer@entry=12, reco=<optimized out>, reco@entry=false) at src/Geometry/HGCalCommonData/src/HGCalDDDConstants.cc:290
#6  0x0000154f31792514 in HGCalNumberingScheme::getUnitID (this=0x154f2d48df80, layer=<optimized out>, layer@entry=12, module=<optimized out>, module@entry=-1, cell=<optimized out>, cell@entry=-1, iz=<optimized out>, iz@entry=-1, pos=..., wt=@0x154f2d4cd520: 1) at src/SimG4CMS/Calo/src/HGCalNumberingScheme.cc:159
#7  0x0000154f31793160 in HGCScintSD::setDetUnitId (this=this@entry=0x154f2d4cd000, layer=layer@entry=12, module=module@entry=-1, cell=cell@entry=-1, iz=iz@entry=-1, pos=...) at src/SimG4CMS/Calo/src/HGCScintSD.cc:265
#8  0x0000154f317940fc in HGCScintSD::setDetUnitId (this=0x154f2d4cd000, aStep=0x154f2d52fe40) at src/SimG4CMS/Calo/src/HGCScintSD.cc:199
#9  0x0000154f31738836 in CaloSD::ProcessHits (this=0x154f2d4cd000, aStep=0x154f2d52fe40) at src/SimG4CMS/Calo/src/CaloSD.cc:236
#10 0x0000154f319438e3 in G4VSensitiveDetector::Hit (this=0x154f2d4cd000, aStep=0x154f2d52fe40) at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/geant4/11.2.2-6e3629ff84ce7decb627e4aed88b5a3f/include/Geant4/G4VSensitiveDetector.hh:92
#11 0x0000154f3244b508 in G4HepEmTrackingManager::TrackElectron (this=0x154f2e579600, aTrack=0x154efe3c8910) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/g4hepem/20250220-42b00cccd3462627716bc308fcd81250/g4hepem.20250220/G4HepEm/G4HepEm/src/G4HepEmTrackingManager.cc:388
#12 0x0000154f32454d8b in G4HepEmTrackingManager::HandOverOneTrack (this=0x154f2d7e7900, aTrack=0x154efe3c8910) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/g4hepem/20250220-42b00cccd3462627716bc308fcd81250/g4hepem.20250220/G4HepEm/G4HepEm/src/G4HepEmTrackingManager.cc:1213
#13 0x0000154f3196db2f in RunManagerMTWorker::produce (this=0x154f2d629000, inpevt=..., es=..., runManagerMaster=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/geant4/11.2.2-6e3629ff84ce7decb627e4aed88b5a3f/geant4.11.2.2/source/event/src/G4EventManager.cc:192

iarspider, Jun 17 '25

assign geometry

iarspider, Jun 17 '25

New categories assigned: geometry

@bsunanda,@civanch,@Dr15Jones,@kpedro88,@makortel,@mdhildreth you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild, Jun 17 '25

A new Issue was created by @iarspider.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild, Jun 17 '25

In which IB did all these failures start occurring?

bsunanda, Jun 17 '25

CMSSW_15_1_DBG_X_2025-06-02-2300

iarspider, Jun 17 '25

Possibly as early as CMSSW_15_1_DBG_X_2025-05-26-2300.

iarspider, Jun 17 '25

@bsunanda @cms-sw/geometry-l2 gentle ping

iarspider, Jun 24 '25

RelVals are still failing.

akritkbehera, Jul 08 '25

The RelVals mentioned above are still failing in the CMSSW_15_1_DBG_X_2025-08-11-2300 IB. Failing RelVals

akritkbehera, Aug 12 '25

From OpenSearch results, workflow 24834.911 first failed on July 5 [a] with error [b]. The changes between the CMSSW_15_1_DBG_X_2025-07-02-2300 and CMSSW_15_1_DBG_X_2025-07-04-2300 IBs should be https://github.com/cms-sw/cmssw/compare/936a77bd53cdc9a63de9d54efb1aa4239ec1dd13...7728310c2d3a472ea8eb3fce0d1c504617e6030d . There were no changes in externals between these IBs.

@bsunanda, do any of these CMSSW changes show an obvious reason why RelVals are failing in the DBG IBs?

[a]

IB                                Workflow   Step   Exit code
CMSSW_15_1_DBG_X_2025-07-04-2300  24834.911  step1  62720
CMSSW_15_1_DBG_X_2025-07-02-2300  24834.911  step1  0
CMSSW_15_1_DBG_X_2025-06-30-2300  24834.911  step1  0

[b]

#4  <signal handler called>
#5  0x00001526fb5c38cd in HGCalDDDConstants::assignCellTrap (this=0x1526f7c2b180, x=<optimized out>, y=<optimized out>, z=<optimized out>, layer=<optimized out>, layer@entry=17, reco=<optimized out>, reco@entry=false) at src/Geometry/HGCalCommonData/src/HGCalDDDConstants.cc:290
#6  0x00001526e9784cc4 in HGCalNumberingScheme::getUnitID (this=0x1526b4e50d80, layer=<optimized out>, layer@entry=17, module=<optimized out>, module@entry=-1, cell=<optimized out>, cell@entry=-1, iz=<optimized out>, iz@entry=-1, pos=..., wt=@0x1526e5221520: 1) at src/SimG4CMS/Calo/src/HGCalNumberingScheme.cc:159
#7  0x00001526e9785910 in HGCScintSD::setDetUnitId (this=this@entry=0x1526e5221000, layer=layer@entry=17, module=module@entry=-1, cell=cell@entry=-1, iz=iz@entry=-1, pos=...) at src/SimG4CMS/Calo/src/HGCScintSD.cc:265
#8  0x00001526e97867fc in HGCScintSD::setDetUnitId (this=0x1526e5221000, aStep=0x1526e52478a0) at src/SimG4CMS/Calo/src/HGCScintSD.cc:199
#9  0x00001526e9730496 in CaloSD::ProcessHits (this=0x1526e5221000, aStep=0x1526e52478a0) at src/SimG4CMS/Calo/src/CaloSD.cc:236
#10 0x00001526e992cec3 in G4VSensitiveDetector::Hit (this=0x1526e5221000, aStep=0x1526e52478a0) at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/geant4/11.2.2-617017a30fb4a3cd9034a3489a6e2d77/include/Geant4/G4VSensitiveDetector.hh:92

smuzaffar, Aug 13 '25

@cms-sw/geometry-l2 any progress with this failure?

iarspider, Sep 23 '25

The failure happens on this line:

https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalDDDConstants.cc#L290

Here, indx.first is 12, but the hgpar_->iradMaxBHFine_ vector is empty, so the access is out of bounds. As far as I can see, this vector is filled here:

https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2438

using the contents of the tileRingFineRange_ vector, which is also empty, and here:

https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2532

(that code is skipped if this condition is false: https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2394-L2395)

tileRingFineRange_ is filled here: https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2005

tileRing{Min,Max}Fine are filled here: https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L1683-L1684

which is never reached, so tileRingFineRange_ and, in turn, iradMaxBHFine_ stay empty.

iarspider, Sep 23 '25

@cms-sw/geometry-l2 any progress on this failure? The RelVals mentioned above are still failing in CMSSW_16_0_DBG_X_2025-09-29-2300.

akritkbehera, Sep 30 '25

@cms-sw/geometry-l2 ping!

iarspider, Oct 07 '25

@cms-sw/geometry-l2 ping!!

iarspider, Oct 21 '25

Please try #49275 and check whether it fixes the SIGSEGV failures.

bsunanda, Oct 31 '25

Can someone rerun the tests to check whether the failures persist?

bsunanda, Nov 04 '25

No failures seen in CMSSW_16_0_DBG_X_2025-11-03-2300, closing the issue.

iarspider, Nov 04 '25