[DBG_X] RelVals 2xx34.x failing with SIGSEGV in HGCalDDDConstants::assignCellTrap
Full logs:
- Relval 25234.0 step 1
- Relval 26834.0 step 1
- Relval 28434.0 step 1
- Relval 29234.0 step 1
- Relval 29634.911 step 1
Example stack trace:
Thread 3 (Thread 0x154f2e57f700 (LWP 510986) "cmsRun"):
#0 0x0000154f6e75bba1 in poll () from /lib64/libc.so.6
#1 0x0000154f69b3ff67 in (anonymous namespace)::full_read (len=1, timeout_s=300, inbuf=0x154f2e578746 "[", fd=7) at src/FWCore/Services/plugins/InitRootHandlers.cc:380
#2 edm::service::InitRootHandlers::stacktraceFromThread () at src/FWCore/Services/plugins/InitRootHandlers.cc:700
#3 0x0000154f69b40164 in (anonymous namespace)::sig_dostack_then_abort (sig=11) at src/FWCore/Services/plugins/InitRootHandlers.cc:546
#4 <signal handler called>
#5 0x0000154f43e3d88d in HGCalDDDConstants::assignCellTrap (this=0x154f333c0800, x=<optimized out>, y=<optimized out>, z=<optimized out>, layer=<optimized out>, layer@entry=12, reco=<optimized out>, reco@entry=false) at src/Geometry/HGCalCommonData/src/HGCalDDDConstants.cc:290
#6 0x0000154f31792514 in HGCalNumberingScheme::getUnitID (this=0x154f2d48df80, layer=<optimized out>, layer@entry=12, module=<optimized out>, module@entry=-1, cell=<optimized out>, cell@entry=-1, iz=<optimized out>, iz@entry=-1, pos=..., wt=@0x154f2d4cd520: 1) at src/SimG4CMS/Calo/src/HGCalNumberingScheme.cc:159
#7 0x0000154f31793160 in HGCScintSD::setDetUnitId (this=this@entry=0x154f2d4cd000, layer=layer@entry=12, module=module@entry=-1, cell=cell@entry=-1, iz=iz@entry=-1, pos=...) at src/SimG4CMS/Calo/src/HGCScintSD.cc:265
#8 0x0000154f317940fc in HGCScintSD::setDetUnitId (this=0x154f2d4cd000, aStep=0x154f2d52fe40) at src/SimG4CMS/Calo/src/HGCScintSD.cc:199
#9 0x0000154f31738836 in CaloSD::ProcessHits (this=0x154f2d4cd000, aStep=0x154f2d52fe40) at src/SimG4CMS/Calo/src/CaloSD.cc:236
#10 0x0000154f319438e3 in G4VSensitiveDetector::Hit (this=0x154f2d4cd000, aStep=0x154f2d52fe40) at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/geant4/11.2.2-6e3629ff84ce7decb627e4aed88b5a3f/include/Geant4/G4VSensitiveDetector.hh:92
#11 0x0000154f3244b508 in G4HepEmTrackingManager::TrackElectron (this=0x154f2e579600, aTrack=0x154efe3c8910) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/g4hepem/20250220-42b00cccd3462627716bc308fcd81250/g4hepem.20250220/G4HepEm/G4HepEm/src/G4HepEmTrackingManager.cc:388
#12 0x0000154f32454d8b in G4HepEmTrackingManager::HandOverOneTrack (this=0x154f2d7e7900, aTrack=0x154efe3c8910) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/g4hepem/20250220-42b00cccd3462627716bc308fcd81250/g4hepem.20250220/G4HepEm/G4HepEm/src/G4HepEmTrackingManager.cc:1213
#13 0x0000154f3196db2f in RunManagerMTWorker::produce (this=0x154f2d629000, inpevt=..., es=..., runManagerMaster=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/geant4/11.2.2-6e3629ff84ce7decb627e4aed88b5a3f/geant4.11.2.2/source/event/src/G4EventManager.cc:192
assign geometry
New categories assigned: geometry
@bsunanda,@civanch,@Dr15Jones,@kpedro88,@makortel,@mdhildreth you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @iarspider.
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
Since which IB did all these failures start occurring?
CMSSW_15_1_DBG_X_2025-06-02-2300
Maybe even CMSSW_15_1_DBG_X_2025-05-26-2300
@bsunanda @cms-sw/geometry-l2 gentle ping
RelVals are still failing.
It looks like the mentioned RelVals are still encountering failures in the IBs for CMSSW_15_1_DBG_X_2025-08-11-2300. Failing RelVals
From the OpenSearch results, I see that workflow 24834.911 first failed on July 5 [a] with error [b]. The changes between the CMSSW_15_1_DBG_X_2025-07-02-2300 and CMSSW_15_1_DBG_X_2025-07-04-2300 IBs should be https://github.com/cms-sw/cmssw/compare/936a77bd53cdc9a63de9d54efb1aa4239ec1dd13...7728310c2d3a472ea8eb3fce0d1c504617e6030d . There were no changes in externals for these IBs.
@bsunanda, do these CMSSW changes show any obvious reason why the RelVals are failing for the DBG_X IBs?
[a]
| IB | Workflow | step | exit code |
|---|---|---|---|
| CMSSW_15_1_DBG_X_2025-07-04-2300 | 24834.911 | step1 | 62,720 |
| CMSSW_15_1_DBG_X_2025-07-02-2300 | 24834.911 | step1 | 0 |
| CMSSW_15_1_DBG_X_2025-06-30-2300 | 24834.911 | step1 | 0 |
[b]
#4 <signal handler called>
#5 0x00001526fb5c38cd in HGCalDDDConstants::assignCellTrap (this=0x1526f7c2b180, x=<optimized out>, y=<optimized out>, z=<optimized out>, layer=<optimized out>, layer@entry=17, reco=<optimized out>, reco@entry=false) at src/Geometry/HGCalCommonData/src/HGCalDDDConstants.cc:290
#6 0x00001526e9784cc4 in HGCalNumberingScheme::getUnitID (this=0x1526b4e50d80, layer=<optimized out>, layer@entry=17, module=<optimized out>, module@entry=-1, cell=<optimized out>, cell@entry=-1, iz=<optimized out>, iz@entry=-1, pos=..., wt=@0x1526e5221520: 1) at src/SimG4CMS/Calo/src/HGCalNumberingScheme.cc:159
#7 0x00001526e9785910 in HGCScintSD::setDetUnitId (this=this@entry=0x1526e5221000, layer=layer@entry=17, module=module@entry=-1, cell=cell@entry=-1, iz=iz@entry=-1, pos=...) at src/SimG4CMS/Calo/src/HGCScintSD.cc:265
#8 0x00001526e97867fc in HGCScintSD::setDetUnitId (this=0x1526e5221000, aStep=0x1526e52478a0) at src/SimG4CMS/Calo/src/HGCScintSD.cc:199
#9 0x00001526e9730496 in CaloSD::ProcessHits (this=0x1526e5221000, aStep=0x1526e52478a0) at src/SimG4CMS/Calo/src/CaloSD.cc:236
#10 0x00001526e992cec3 in G4VSensitiveDetector::Hit (this=0x1526e5221000, aStep=0x1526e52478a0) at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/geant4/11.2.2-617017a30fb4a3cd9034a3489a6e2d77/include/Geant4/G4VSensitiveDetector.hh:92
@cms-sw/geometry-l2 any progress with this failure?
The failure happens on this line:
https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalDDDConstants.cc#L290
Here, indx.first is 12, but the hgpar_->iradMaxBHFine_ vector is empty, so indexing it reads out of bounds. As far as I can see, this vector is filled in two places. The first is here:
https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2438
using the contents of the tileRingFineRange_ vector, which is also empty. The second is here:
https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2532
(that code is skipped if this condition is false: https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2394-L2395)
tileRingFineRange_ is filled here: https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L2005
tileRing{Min,Max}Fine are filled here: https://github.com/cms-sw/cmssw/blob/55976f2b6ecff711c60b82dac28d1d4951ad0494/Geometry/HGCalCommonData/src/HGCalGeomParameters.cc#L1683-L1684
and that code is never reached.
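To illustrate the failure mode described above, here is a minimal sketch of a bounds-checked lookup. It is not the CMSSW code and not the fix that was eventually applied: `FakeParams` and `safeIradMaxFine` are hypothetical names, and the element type of the vector is an assumption. Only the member name `iradMaxBHFine_` is taken from the analysis.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the geometry parameter object. Only the vector
// named in the analysis is modelled; its element type (int) is an assumption.
struct FakeParams {
  std::vector<int> iradMaxBHFine_;
};

// Returns the entry for layerIndex, or std::nullopt when the index is
// negative or beyond the end of the vector (which includes the empty case).
// Unchecked operator[] on an empty vector is undefined behaviour and can
// surface as a SIGSEGV, as in the stack traces in this issue.
std::optional<int> safeIradMaxFine(const FakeParams& par, int layerIndex) {
  if (layerIndex < 0) {
    return std::nullopt;
  }
  const auto idx = static_cast<std::size_t>(layerIndex);
  if (idx >= par.iradMaxBHFine_.size()) {
    return std::nullopt;
  }
  return par.iradMaxBHFine_[idx];
}
```

With an empty vector and index 12, as in the crash, this returns `std::nullopt` instead of reading invalid memory. The caller still has to decide what an absent value means for the cell assignment.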
@cms-sw/geometry-l2 any progress with this failure? The mentioned RelVals are still failing in CMSSW_16_0_DBG_X_2025-09-29-2300.
@cms-sw/geometry-l2 ping!
@cms-sw/geometry-l2 ping!!
Please try #49275 and see whether it cures the SIGSEGV issues.
Could someone repeat the tests to see whether the failures persist?
No failures seen in CMSSW_16_0_DBG_X_2025-11-03-2300, closing the issue.