[PY312/NVIDIA T4] Multiple RelVals failed with cudaErrorLaunchOutOfResources
The following RelVals failed in CMSSW_16_0_PY312_X_2025-12-08-2300 on an NVIDIA T4: 160.03502, 17034.402, 17034.403, 17034.406, 17034.412, 17034.422, 17034.423, 18434.402, 18434.403, 18434.404, 18434.406, 18434.407, 18434.408, 18434.412, 18434.413, 18434.422, 18434.423, 18434.424, 18450.402, 18450.403, 18450.404, 18450.406, 18450.407, 18450.408, 18461.402, 18634.402, 18634.403, 18634.404, 18634.406, 18634.407, 18634.408, 18634.412, 18634.413, 18634.422, 18634.423, 18634.424, 18650.402, 18650.403, 18650.404, 18650.406, 18650.407, 18650.408, 18661.402
Example stack traces:
- RelVal 160.03502
----- Begin Fatal Exception 09-Dec-2025 07:56:51 CET-----------------------
An exception of category 'StdException' occurred while
[0] Processing Event run: 1 lumi: 1 event: 1 stream: 3
[1] Running path 'dqmofflineOnPAT_1_step'
[2] Prefetching for module SingleTopTChannelLeptonDQM_miniAOD/'singleTopElectronMediumDQM_miniAOD'
[3] Prefetching for module PATMuonSlimmer/'slimmedMuons'
[4] Prefetching for module PATMuonSelector/'selectedPatMuons'
[5] Prefetching for module PATMuonProducer/'patMuons'
[6] Prefetching for module MuonProducer/'muons'
[7] Prefetching for module PFProducer/'particleFlowTmp'
[8] Prefetching for module PFBlockProducer/'particleFlowBlock'
[9] Prefetching for module PFElecTkProducer/'pfTrackElec'
[10] Prefetching for module GsfTrackProducer/'electronGsfTracks'
[11] Prefetching for module CkfTrackCandidateMaker/'electronCkfTrackCandidates'
[12] Prefetching for module ElectronSeedMerger/'electronMergedSeeds'
[13] Prefetching for module GoodSeedProducer/'trackerDrivenElectronSeeds'
[14] Prefetching for module PFMultiDepthClusterProducer/'particleFlowClusterHCAL'
[15] Prefetching for module LegacyPFClusterProducer/'legacyPFClusterProducer'
[16] Prefetching for module PFClusterSoAProducer@alpaka/'pfClusterSoAProducer'
[17] Calling method for module PFClusterSoAProducer@alpaka/'pfClusterSoAProducer'
Exception Message:
A std::exception was thrown.
/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc13/external/alpaka/2.0.0-8493f1d11d0378dc14d6ea6ecfc69ac5/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp(275) 'TApi::setDevice(queue.m_spQueueImpl->m_dev.getNativeHandle())' A previous API call (not this one) set the error : 'cudaErrorLaunchOutOfResources': 'too many resources requested for launch'!
----- End Fatal Exception -------------------------------------------------
- Other RelVals:
----- Begin Fatal Exception 09-Dec-2025 07:51:55 CET-----------------------
An exception of category 'StdException' occurred while
[0] Processing Event run: 1 lumi: 4 event: 304 stream: 0
[1] Running path 'DQM_HcalReconstruction_v11'
[2] Calling method for module PFClusterSoAProducer@alpaka/'hltParticleFlowClusterHBHESoA'
Exception Message:
A std::exception was thrown.
/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc13/external/alpaka/2.0.0-8493f1d11d0378dc14d6ea6ecfc69ac5/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp(275) 'TApi::setDevice(queue.m_spQueueImpl->m_dev.getNativeHandle())' A previous API call (not this one) set the error : 'cudaErrorLaunchOutOfResources': 'too many resources requested for launch'!
----- End Fatal Exception -------------------------------------------------
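To check whether every failing workflow dies in the same module with the same CUDA error, the exception blocks in the job logs can be summarized automatically. The following is a minimal sketch, not part of cms-bot or CMSSW: the function name `summarize_fatal_exceptions` and the `SAMPLE_LOG` excerpt (abbreviated from the second trace above, with the alpaka file path dropped) are illustrative assumptions, and the regular expressions assume only the log layout seen in the two traces.

```python
import re
from collections import Counter

# One "Fatal Exception" block, from its Begin line to its End line.
BLOCK_RE = re.compile(r"-+ Begin Fatal Exception.*?-+ End Fatal Exception", re.DOTALL)
# Innermost context frame: the module whose method was running when the exception was thrown.
CALLING_RE = re.compile(r"\[\d+\] Calling method for module (\S+?)/'([^']+)'")
# CUDA error name as quoted in the alpaka error message.
CUDA_ERROR_RE = re.compile(r"set the error : '(cuda\w+)'")

# Abbreviated excerpt of a trace from this issue (hypothetical test input).
SAMPLE_LOG = """\
----- Begin Fatal Exception 09-Dec-2025 07:51:55 CET-----------------------
An exception of category 'StdException' occurred while
[0] Processing Event run: 1 lumi: 4 event: 304 stream: 0
[1] Running path 'DQM_HcalReconstruction_v11'
[2] Calling method for module PFClusterSoAProducer@alpaka/'hltParticleFlowClusterHBHESoA'
Exception Message:
A std::exception was thrown.
A previous API call (not this one) set the error : 'cudaErrorLaunchOutOfResources': 'too many resources requested for launch'!
----- End Fatal Exception -------------------------------------------------
"""


def summarize_fatal_exceptions(log_text):
    """Return one (module_type, module_label, cuda_error) tuple per fatal exception block.

    Fields that cannot be found in a block are reported as None.
    """
    results = []
    for block in BLOCK_RE.findall(log_text):
        calling = CALLING_RE.search(block)
        error = CUDA_ERROR_RE.search(block)
        results.append((
            calling.group(1) if calling else None,
            calling.group(2) if calling else None,
            error.group(1) if error else None,
        ))
    return results


if __name__ == "__main__":
    # Count how often each (module type, label, error) combination occurs.
    for key, count in Counter(summarize_fatal_exceptions(SAMPLE_LOG)).items():
        print(count, key)
```

Running this over the concatenated logs of all failing workflows would show at a glance whether anything other than `PFClusterSoAProducer@alpaka` is involved.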
A new Issue was created by @iarspider.
@Dr15Jones, @ftenchini, @makortel, @mandrenguyen, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign heterogeneous
New categories assigned: heterogeneous
@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
What is different in this build compared to the regular ones?
Python version (3.12)
How does that affect the C++ and CUDA code?