cmssw
New Heterogeneous Memory Pool
This PR replaces the old "notcub" cache allocator with a memory pool featuring:
- lock-free operations
- a backend-agnostic implementation
The data interface is based on a simple Buffer that is completely backend agnostic.
The allocation interface (makeBuffer) currently depends on cudaStream_t, which can easily be hidden behind a void * or a light opaque struct.
A new feature is a "Bundle deleter": buffers can be bundled together and then freed in a single operation, which reduces the number of CUDA calls.
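To illustrate the "Bundle deleter" idea, here is a minimal host-only sketch in plain C++. It is not the code in this PR: `ToyPool`, `BundleDelete`, `BundleDeleter`, `makeBundledBuffer` and `freeCallsForThreeBuffers` are hypothetical names, and the pool simply counts release operations instead of issuing CUDA calls. Each buffer's deleter only keeps a shared bundle alive; the bundle returns all its slots to the pool in one call when the last buffer goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy pool: owns host storage and counts release operations.
class ToyPool {
public:
  // Allocate storage for a new slot and return its index.
  int alloc(std::size_t bytes) {
    slots_.push_back(std::make_unique<std::byte[]>(bytes));
    return static_cast<int>(slots_.size()) - 1;
  }
  void* pointer(int slot) { return slots_[slot].get(); }
  // Release many slots in a single operation.
  void freeMany(std::vector<int> const& slots) {
    ++freeCalls_;
    for (int s : slots) {
      assert(s >= 0 && s < static_cast<int>(slots_.size()));
      slots_[s].reset();
    }
  }
  int freeCalls() const { return freeCalls_; }

private:
  std::vector<std::unique_ptr<std::byte[]>> slots_;
  int freeCalls_ = 0;
};

// Collects slots and frees them together when it is destroyed.
class BundleDelete {
public:
  explicit BundleDelete(ToyPool& pool) : pool_(&pool) {}
  BundleDelete(BundleDelete const&) = delete;
  BundleDelete& operator=(BundleDelete const&) = delete;
  ~BundleDelete() {
    if (!slots_.empty())
      pool_->freeMany(slots_);
  }
  void add(int slot) { slots_.push_back(slot); }

private:
  ToyPool* pool_;
  std::vector<int> slots_;
};

// Per-buffer deleter: does nothing itself, it only keeps the bundle alive.
struct BundleDeleter {
  std::shared_ptr<BundleDelete> bundle;
  void operator()(void*) const {}
};

template <typename T>
using Buffer = std::unique_ptr<T, BundleDeleter>;

// Hypothetical factory: allocate from the pool and register the slot in the bundle.
template <typename T>
Buffer<T> makeBundledBuffer(ToyPool& pool, std::size_t n, std::shared_ptr<BundleDelete> const& bundle) {
  int slot = pool.alloc(n * sizeof(T));
  bundle->add(slot);
  return Buffer<T>(static_cast<T*>(pool.pointer(slot)), BundleDeleter{bundle});
}

// Returns how many release operations were needed for three buffers.
int freeCallsForThreeBuffers() {
  ToyPool pool;
  {
    auto bundle = std::make_shared<BundleDelete>(pool);
    auto a = makeBundledBuffer<float>(pool, 16, bundle);
    auto b = makeBundledBuffer<int>(pool, 8, bundle);
    auto c = makeBundledBuffer<double>(pool, 4, bundle);
    *a = 1.0f;  // buffers are usable as ordinary pointers
  }  // last reference to the bundle is dropped here: one freeMany call
  return pool.freeCalls();
}
```

In this sketch the three buffers are released with a single `freeMany` call; with a per-buffer deleter there would be three.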
All previous users of the cache allocator (at least for Pixel wf) have been migrated.
Tests pass: it is not slower than the previous implementation. A free machine is needed to make definitive tests.
Some cleanup is still required to remove debug statements.
Purely technical; no regression expected.
Draft slides for a possible presentation are available at https://cernbox.cern.ch/index.php/s/Ax4NHYGLHbG8N1C
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37952/30020
-
This PR adds an extra 232KB to repository
-
Found files with invalid states:
- HeterogeneousCore/CUDAUtilities/src/cudaMemoryPool.cu:
- Added: f37385a552b3045133ef6c42fef34d1f1e7d737a
- Modified: 75ca0dbe0f56fffb8998d35429c6978f5b461505, c429d13bf12445972e07f1a107b28ee63931fb88, c5e35f0ee22d34bafdf47a70b7e6c010eb52801f
- Deleted: 21a646e3b8568eaed4d67f66f87c6f866c1754a6
- CUDADataFormats/TrackingRecHit/interface/TrackingRecHit2DHeterogeneousImpl.h:
- Added: 5291489f7c512567e2ea7a2c2492ed769113cdbb
- Modified: 1a43ba74652dfe596df6e440e3d58705d60342c1, e7d86325fee62412599be075901712ea5dd78571, c8d553a8399fbe6335428bd05524b7ca6842c5f2
- Deleted: 521d4c0ba8cb2fddb313d4aec0448c32b8663780
- HeterogeneousCore/CUDAUtilities/interface/cudaMemoryPoolImpl.h:
- Added: 5291489f7c512567e2ea7a2c2492ed769113cdbb
- Modified: 1a43ba74652dfe596df6e440e3d58705d60342c1, 59bcb2be4ef6e9932face966de0a60c914d5a8ed, 29df6e20a122c8328831f9d2594e8630d9f43a45, 849da8c5c5aeab4d4bf77ecd8daead996dfe4da7, e7d86325fee62412599be075901712ea5dd78571, b4f4d467756c6b8373a46a1d22a654bd54c4e742, 8b149edfa447eec39e9548d1f32f0eb0df384d2c
- Deleted: 1487b8809a1ec08bea1eb831cb3a0de6d545ee45
- CUDADataFormats/SiPixelDigi/interface/SiPixelDigisCUDAImpl.h:
- Added: 3ae45f7de8cd397b8b61b0430e4f402839a7dbbc
- Modified: b4f4d467756c6b8373a46a1d22a654bd54c4e742, c8d553a8399fbe6335428bd05524b7ca6842c5f2
- Deleted: 9402cb72c38ad7e988131bdbf3cf8bd1bdfde11b
- CUDADataFormats/TrackingRecHit/src/TrackingRecHit2DHeterogeneous.cc:
- Modified: 6b050bde0222f1b9eea7cb60d96e16320b4e9364, 0e49a367337570dd2e867ee929420c4e29b288ea, e8e9c0ff9d0939da0a325e82aea15ed2941a6f02, 88da3bc1c3f19de3c87ee87667d77d6ac7c35e85, b4f4d467756c6b8373a46a1d22a654bd54c4e742, 521d4c0ba8cb2fddb313d4aec0448c32b8663780
- Deleted: 21a646e3b8568eaed4d67f66f87c6f866c1754a6
- Added: e7d86325fee62412599be075901712ea5dd78571
- HeterogeneousCore/CUDAUtilities/src/cudaMemoryPool.cu:
-
There are other open Pull requests which might conflict with changes you have proposed:
- File HeterogeneousCore/CUDAServices/src/CUDAService.cc modified in PR(s): #37831
- File HeterogeneousCore/CUDAUtilities/test/BuildFile.xml modified in PR(s): #35713
- File RecoLocalTracker/SiPixelRecHits/plugins/PixelRecHitGPUKernel.cu modified in PR(s): #35713
- File RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cu modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.h modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernelsAlloc.cc modified in PR(s): #35713
A new Pull Request was created by @VinInn (Vincenzo Innocente) for master.
It involves the following packages:
- CUDADataFormats/BeamSpot (heterogeneous, reconstruction)
- CUDADataFormats/Common (heterogeneous)
- CUDADataFormats/SiPixelDigi (heterogeneous, reconstruction)
- CUDADataFormats/Track (heterogeneous, reconstruction)
- CUDADataFormats/TrackingRecHit (heterogeneous, reconstruction)
- CUDADataFormats/Vertex (heterogeneous, reconstruction)
- EventFilter/SiPixelRawToDigi (reconstruction)
- HeterogeneousCore/CUDACore (heterogeneous)
- HeterogeneousCore/CUDAServices (heterogeneous)
- HeterogeneousCore/CUDAUtilities (heterogeneous)
- RecoLocalTracker/SiPixelRecHits (reconstruction)
- RecoPixelVertexing/PixelTrackFitting (reconstruction)
- RecoPixelVertexing/PixelTriplets (reconstruction)
- RecoPixelVertexing/PixelVertexFinding (reconstruction)
- RecoVertex/BeamSpotProducer (reconstruction, alca)
@malbouis, @yuanchao, @makortel, @slava77, @clacaputo, @cmsbuild, @fwyzard, @jpata, @tvami, @francescobrivio can you please review it and eventually sign? Thanks. @tvami, @makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @Martin-Grunewald, @missirol, @OzAmram, @tocheng, @ferencek, @mtosi, @gpetruc, @mmusich, @dkotlins, @threus, @dgulhan, @francescobrivio this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here
@cmsbuild , please test
enable gpu
-1
Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-651b42/24728/summary.html
COMMIT: b8d0837f4924cb88f991b367d2ffbec85e631b7f
CMSSW: CMSSW_12_4_X_2022-05-15-0000/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37952/24728/install.sh
to create a dev area with all the needed externals and cmssw changes.
Unit Tests
I found errors in the following unit tests:
---> test cpuVertexFinderByDensity_t had ERRORS
---> test cpuVertexFinderIterative_t had ERRORS
GPU Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 24 differences found in the comparisons
- DQMHistoTests: Total files compared: 4
- DQMHistoTests: Total histograms compared: 19874
- DQMHistoTests: Total failures: 1171
- DQMHistoTests: Total nulls: 1
- DQMHistoTests: Total successes: 18702
- DQMHistoTests: Total skipped: 0
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
- Checked 12 log files, 9 edm output root files, 4 DQM output files
- TriggerResults: found differences in 3 / 3 workflows
Comparison Summary
@slava77 comparisons for the following workflows were not done due to missing matrix map:
- /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-651b42/11634.301_TTbar_14TeV+2021_Run3FS+TTbar_14TeV_TuneCP5_GenSim+HARVESTNano
Summary:
- No significant changes to the logs found
- Reco comparison results: 2 differences found in the comparisons
- DQMHistoTests: Total files compared: 50
- DQMHistoTests: Total histograms compared: 3741432
- DQMHistoTests: Total failures: 92
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3741318
- DQMHistoTests: Total skipped: 22
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
- Checked 208 log files, 45 edm output root files, 50 DQM output files
- TriggerResults: no differences found
@cmsbuild , please test
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37952/30028
-
This PR adds an extra 236KB to repository
Pull request #37952 was updated. @malbouis, @yuanchao, @makortel, @slava77, @clacaputo, @fwyzard, @jpata, @tvami, @francescobrivio can you please check and sign again.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-651b42/24737/summary.html
COMMIT: 574cca4c553978aa1ea6919b2a41ac5c2f69a8bb
CMSSW: CMSSW_12_4_X_2022-05-15-2300/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37952/24737/install.sh
to create a dev area with all the needed externals and cmssw changes.
GPU Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 24 differences found in the comparisons
- DQMHistoTests: Total files compared: 4
- DQMHistoTests: Total histograms compared: 19874
- DQMHistoTests: Total failures: 1172
- DQMHistoTests: Total nulls: 1
- DQMHistoTests: Total successes: 18701
- DQMHistoTests: Total skipped: 0
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
- Checked 12 log files, 9 edm output root files, 4 DQM output files
- TriggerResults: found differences in 3 / 3 workflows
Comparison Summary
@slava77 comparisons for the following workflows were not done due to missing matrix map:
- /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-651b42/11634.301_TTbar_14TeV+2021_Run3FS+TTbar_14TeV_TuneCP5_GenSim+HARVESTNano
Summary:
- No significant changes to the logs found
- Reco comparison results: 0 differences found in the comparisons
- DQMHistoTests: Total files compared: 50
- DQMHistoTests: Total histograms compared: 3741432
- DQMHistoTests: Total failures: 86
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3741324
- DQMHistoTests: Total skipped: 22
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
- Checked 208 log files, 45 edm output root files, 50 DQM output files
- TriggerResults: no differences found
On May 18, 2022, at 09:30, Tamas Vami @.***> wrote:
@tvami commented on this pull request.
In CUDADataFormats/BeamSpot/interface/BeamSpotCUDA.h:
class BeamSpotCUDA { public:
- using Buffer = memoryPool::Buffer<BeamSpotPOD>;
Hi @VinInn, isn't this technically a namespace? According to rule 2.7 those should start with a lowercase letter.
This is a class alias (aka typedef). Will address the other comments later in the week. V.
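For reference, a small self-contained sketch of the distinction. The `memoryPool::Buffer` and `BeamSpotPOD` definitions below are reduced, hypothetical stand-ins (not the real CMSSW types), and `aliasIsUsableAsType` is a helper introduced only for illustration. A `using X = ...;` alias-declaration at class scope names a type, exactly like a `typedef`, whereas a namespace alias has its own syntax and cannot appear inside a class.

```cpp
#include <cassert>
#include <type_traits>

namespace memoryPool {  // a namespace: groups names, is not a type
  template <typename T>
  struct Buffer {  // hypothetical stand-in for the real Buffer
    T* ptr = nullptr;
  };
}  // namespace memoryPool

struct BeamSpotPOD {  // hypothetical, reduced stand-in
  float x, y, z;
};

class BeamSpotCUDA {
public:
  // Alias-declaration: introduces a name for a type.
  using Buffer = memoryPool::Buffer<BeamSpotPOD>;
  // Equivalent pre-C++11 spelling of the same alias.
  typedef memoryPool::Buffer<BeamSpotPOD> BufferTypedef;
};

// Both spellings name exactly the same type.
static_assert(std::is_same_v<BeamSpotCUDA::Buffer, memoryPool::Buffer<BeamSpotPOD>>);
static_assert(std::is_same_v<BeamSpotCUDA::Buffer, BeamSpotCUDA::BufferTypedef>);

// A namespace alias uses different syntax and is not allowed at class scope.
namespace mp = memoryPool;

// An alias can be used to declare objects; a namespace cannot.
bool aliasIsUsableAsType() {
  BeamSpotCUDA::Buffer b;
  assert(b.ptr == nullptr);
  return std::is_same_v<BeamSpotCUDA::Buffer, mp::Buffer<BeamSpotPOD>>;
}
```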
Again in need of a rebase.
@cmsbuild , please test
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37952/30115
-
This PR adds an extra 224KB to repository
-
Found files with invalid states:
- HeterogeneousCore/CUDAUtilities/src/cudaMemoryPool.cu:
- Added: 39b1a121bb3430def21c7930f5f0134c6d946f4e
- Modified: 26a67001f0ee4157081e10478b523d1474c5d409, 8e8a1abbaf6937065bdfc54ef9bbf04ff6f2c128, ead220d0ed2821d8130af78445408eff8db6392b
- Deleted: eea72dc3319c5295ef2f54c6a2e71c28a07887be
- CUDADataFormats/TrackingRecHit/interface/TrackingRecHit2DHeterogeneousImpl.h:
- Added: f19e812de7b074c55d64d3295fc02eb63cdac1eb
- Modified: 6f140eb5c98abc7f56646ca6c7f57ed867f06b91, d7aa3b39d17b315adce463edcf74f4cd10b17fbf, c7256ea46632c4c4bfa9ca8e82914e5ada53df29
- Deleted: 0920241ed10a9d14c410fdc81a73eff10d881cbf
- HeterogeneousCore/CUDAUtilities/interface/cudaMemoryPoolImpl.h:
- Added: f19e812de7b074c55d64d3295fc02eb63cdac1eb
- Modified: 6f140eb5c98abc7f56646ca6c7f57ed867f06b91, 67500afbd34e1ac947db321209885a91d0f989f4, 12d2efc459e81a784718ead1c585b4cba23489ad, d32b122d44b489586fbeb3c727cea2857d26032a, d7aa3b39d17b315adce463edcf74f4cd10b17fbf, b42eeaefe5c0add3ed6fc5fd59fd650f96922914, 1311dc48ac0598e5b50127dd595cf13c10940f0d
- Deleted: b299bc7fa1b214679723acbd1aed102bbc80eeeb
- CUDADataFormats/SiPixelDigi/interface/SiPixelDigisCUDAImpl.h:
- Added: 031e68338eeb90e4504c66e4d97784615ff65e69
- Modified: b42eeaefe5c0add3ed6fc5fd59fd650f96922914, c7256ea46632c4c4bfa9ca8e82914e5ada53df29
- Deleted: 22d6c5b3931f7dfb3603528502c5a9b89b500640
- CUDADataFormats/TrackingRecHit/src/TrackingRecHit2DHeterogeneous.cc:
- Modified: f1e6ec9518744a417afe8ef6ebc584af3e91cd07, d752dc8fe9b84a27e57fc562eb8c9ff07cbf14cf, 8ddc45e2a327f97d254788392e6baee1f3f8f434, f13550c10493ec04b8486b5c17fea0dff85de9d8, b42eeaefe5c0add3ed6fc5fd59fd650f96922914, 0920241ed10a9d14c410fdc81a73eff10d881cbf
- Deleted: eea72dc3319c5295ef2f54c6a2e71c28a07887be
- Added: d7aa3b39d17b315adce463edcf74f4cd10b17fbf
- HeterogeneousCore/CUDAUtilities/src/cudaMemoryPool.cu:
-
There are other open Pull requests which might conflict with changes you have proposed:
- File HeterogeneousCore/CUDAServices/src/CUDAService.cc modified in PR(s): #37831
- File HeterogeneousCore/CUDAUtilities/test/BuildFile.xml modified in PR(s): #35713
- File RecoLocalTracker/SiPixelRecHits/plugins/PixelRecHitGPUKernel.cu modified in PR(s): #35713
- File RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cu modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.h modified in PR(s): #35713
- File RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernelsAlloc.cc modified in PR(s): #35713
Pull request #37952 was updated. @malbouis, @yuanchao, @makortel, @slava77, @clacaputo, @fwyzard, @jpata, @tvami, @francescobrivio can you please check and sign again.
-1
Failed Tests: RelVals-GPU
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-651b42/24895/summary.html
COMMIT: 2d0392890d2e9282c03eca5dc21741e1dc3ff091
CMSSW: CMSSW_12_5_X_2022-05-22-0000/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37952/24895/install.sh
to create a dev area with all the needed externals and cmssw changes.
RelVals-GPU
-
11634.512
11634.512_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step2_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
-
11634.522
11634.522_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step2_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
-
11634.506
11634.506_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step2_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 6 differences found in the comparisons
- DQMHistoTests: Total files compared: 50
- DQMHistoTests: Total histograms compared: 3650985
- DQMHistoTests: Total failures: 14
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3650949
- DQMHistoTests: Total skipped: 22
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
- Checked 208 log files, 45 edm output root files, 50 DQM output files
- TriggerResults: no differences found
I am not sure the changes introduced in this PR are the cause of the errors and crash in the RelVals (I mean: how did I manage to mess up ECALOnlyGPU?). Is "el8" now the standard platform for relvals?
I am unable to run gpu relvals
[1] Exit 1 runTheMatrix.py --gpu=required -e -t 8 -l 11634.506 >& gpu.log
[innocent@patatrack02 matrix]$
[innocent@patatrack02 matrix]$ cat gpu.log
processing relval_standard
processing relval_highstats
processing relval_pileup
processing relval_generator
processing relval_extendedgen
processing relval_production
processing relval_ged
ignoring relval_upgrade from default matrix
ignoring relval_cleanedupgrade from default matrix
ignoring relval_gpu from default matrix
processing relval_2017
processing relval_2026
ignoring relval_identity from default matrix
processing relval_machine
processing relval_premix
Traceback (most recent call last):
File "/cvmfs/cms-ib.cern.ch/nweek-02733/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_5_X_2022-05-18-1100/bin/slc7_amd64_gcc11/runTheMatrix.py", line 606, in <module>
ret = runSelected(opt)
File "/cvmfs/cms-ib.cern.ch/nweek-02733/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_5_X_2022-05-18-1100/bin/slc7_amd64_gcc11/runTheMatrix.py", line 31, in runSelected
if len(undefSet)>0: raise ValueError('Undefined workflows: '+', '.join(map(str,list(undefSet))))
ValueError: Undefined workflows: 11634.506
[innocent@patatrack02 matrix]$ runTheMatrix.py --requires-gpu -e -n | grep GPU
39434.502 2026D88_Patatrack_PixelOnlyGPU+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal [1]: cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM -n 10 --conditions auto:phase2_realistic_T21 --beamspot HLLHC14TeV --datatier GEN-SIM --eventcontent FEVTDEBUG --geometry Extended2026D88 --era Phase2C17I13M9 --relval 9000,100
I think you need runTheMatrix.py -w gpu ...
or runTheMatrix.py -w upgrade ...
to enable the GPU workflows.
For me it passes:
11634.506_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED - time date Mon May 23 14:41:24 2022-date Mon May 23 14:34:02 2022; exit: 0 0 0 0
1 1 1 1 tests passed, 0 0 0 0 failed
with this PR and with IB CMSSW_12_5_X_2022-05-23-1100 as well
It also works for me. Let's trigger the test again.
@cmsbuild please test
enable gpu
-1
Failed Tests: RelVals-GPU
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-651b42/24984/summary.html
COMMIT: 2d0392890d2e9282c03eca5dc21741e1dc3ff091
CMSSW: CMSSW_12_5_X_2022-05-24-2300/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37952/24984/install.sh
to create a dev area with all the needed externals and cmssw changes.
RelVals-GPU
-
11634.506
11634.506_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step2_TTbar_14TeV+2021_Patatrack_PixelOnlyTripletsGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
-
11634.522
11634.522_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step2_TTbar_14TeV+2021_Patatrack_HCALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
-
11634.512
11634.512_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano/step2_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano.log
Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 0 differences found in the comparisons
- DQMHistoTests: Total files compared: 50
- DQMHistoTests: Total histograms compared: 3650985
- DQMHistoTests: Total failures: 2
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3650961
- DQMHistoTests: Total skipped: 22
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
- Checked 208 log files, 45 edm output root files, 50 DQM output files
- TriggerResults: no differences found
It is crashing in the standard (non-GPU, say legacy) part:
#5 0x00002b345764b344 in RecHitsSortedInPhi::RecHitsSortedInPhi(std::vector<BaseTrackerRecHit const*, std::allocator<BaseTrackerRecHit const*> > const&, Point3DBase<float, GlobalTag> const&, DetLayer const*) () from /cvmfs/cms-ib.cern.ch/nweek-02734/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-05-23-2300/lib/el8_amd64_gcc10/libRecoTrackerTkHitPairs.so
#6 0x00002b345764746c in LayerHitMapCache::operator()(SeedingLayerSetsHits::SeedingLayer const&, TrackingRegion const&) () from /cvmfs/cms-ib.cern.ch/nweek-02734/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-05-23-2300/lib/el8_amd64_gcc10/libRecoTrackerTkHitPairs.so
#7 0x00002b345764540f in HitPairGeneratorFromLayerPair::doublets(TrackingRegion const&, edm::Event const&, edm::EventSetup const&, SeedingLayerSetsHits::SeedingLayer const&, SeedingLayerSetsHits::SeedingLayer const&, LayerHitMapCache&) () from /cvmfs/cms-ib.cern.ch/nweek-02734/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-05-23-2300/lib/el8_amd64_gcc10/libRecoTrackerTkHitPairs.so
BTW it is crashing in step2 (HLT), and that part is not customized to run just pixelTracks, ECAL, HCAL. It seems to run just the HLT menu; I am not even sure whether it runs on GPU or not.
Managed to reproduce the crash (it seems to happen when running single-threaded...). It seems that it runs the GPU HLT menu (is that intended?). Will try to understand why...
Yes, all GPU-related workflows run the full HLT menu on GPUs (if one is available).
Why is it scheduling and running both
TimeModule> 6 1 hltSiPixelRecHitsFromLegacy SiPixelRecHitSoAFromLegacy 0.000859022
and
TimeModule> 6 1 hltSiPixelRecHitsFromGPU SiPixelRecHitFromCUDA 0.0004251
?
No clue how it was passing the previous test...
@cmsbuild , please test