cmssw Extend onnxruntime gpu interface to producers using onnxruntime

Extends #36963 by adding a backend parameter to models, used by

cms::Ort::getSessionOptions(iConfig.getParameter<std::string>("onnx_backend"));

Current options are:

cpu -> Use CPU backend
cuda -> Use cuda backend
default -> Use best available
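The option semantics listed above can be sketched in a few lines. The actual implementation is C++ in PhysicsTools/ONNXRuntime; the Python below only mirrors the described behaviour. The names `Backend`, `resolve_backend`, and the `cuda_available` flag are hypothetical, and raising an error when `cuda` is requested without a usable device is an assumption, not something stated in this PR.

```python
from enum import Enum


class Backend(Enum):
    """Concrete backends a session can end up on (hypothetical names)."""
    CPU = "cpu"
    CUDA = "cuda"


def resolve_backend(name: str, cuda_available: bool) -> Backend:
    """Map an `onnx_backend` string to a concrete backend.

    `cuda_available` stands in for whatever runtime check the framework
    performs; it is an assumption of this sketch.
    """
    key = name.strip().lower()
    if key == "cpu":
        return Backend.CPU
    if key == "cuda":
        # Assumption: an explicit request for CUDA fails loudly
        # instead of silently falling back to the CPU.
        if not cuda_available:
            raise RuntimeError("onnx_backend='cuda' requested but no CUDA device is available")
        return Backend.CUDA
    if key == "default":
        # "Use best available": prefer the GPU when there is one.
        return Backend.CUDA if cuda_available else Backend.CPU
    raise ValueError(f"unknown onnx_backend: {name!r}")
```

With the onnxruntime Python API, the result would typically translate into a `providers` list such as `["CUDAExecutionProvider", "CPUExecutionProvider"]` or `["CPUExecutionProvider"]` passed to `onnxruntime.InferenceSession`.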

The model used in BoostedJetONNXJetTagsProducer crashes on GPU when the full graph optimization is enabled. I reduced the optimization level when a GPU is used (following recipes found on the web). The kind of error one gets is:

Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data); 
2022-08-26 13:26:51.964709271 [E:onnxruntime:, sequential_executor.cc:346 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_98_Add_99_Relu_100'
 Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)
----- Begin Fatal Exception 26-Aug-2022 14:26:51 CEST-----------------------
An exception of category 'StdException' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
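The workaround described above (a lower graph optimization level when running on GPU) can be sketched as follows. The level names mirror onnxruntime's `GraphOptimizationLevel` enum, but which reduced level this PR actually selects is not stated in the description; `ORT_ENABLE_BASIC` is an assumption, and `optimization_level` is a hypothetical helper.

```python
# Level names as defined by onnxruntime's GraphOptimizationLevel enum.
LEVELS = (
    "ORT_DISABLE_ALL",
    "ORT_ENABLE_BASIC",
    "ORT_ENABLE_EXTENDED",
    "ORT_ENABLE_ALL",
)


def optimization_level(backend: str, gpu_level: str = "ORT_ENABLE_BASIC") -> str:
    """Pick a graph optimization level for the given backend.

    CPU sessions keep the full optimization; CUDA sessions use a reduced
    level to avoid the fused-node failure shown in the log above.
    The default `gpu_level` is an assumption of this sketch.
    """
    if gpu_level not in LEVELS:
        raise ValueError(f"unknown optimization level: {gpu_level!r}")
    return gpu_level if backend == "cuda" else "ORT_ENABLE_ALL"


# With the onnxruntime Python package installed, the result could be applied as:
#   import onnxruntime
#   opts = onnxruntime.SessionOptions()
#   opts.graph_optimization_level = getattr(
#       onnxruntime.GraphOptimizationLevel, optimization_level("cuda"))
```

Keeping the GPU level as a parameter makes it easy to restore full optimization once a fixed onnxruntime release is available.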

So far I see neither a significant performance improvement nor a loss (at least on lxplus-gpu). At a minimum, BoostedJetONNXJetTagsProducer.cc could be improved to send more than one jet to onnxruntime at a time.
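Sending more than one jet per call amounts to packing the per-jet inputs into one buffer with a leading batch dimension and splitting the outputs afterwards. The sketch below shows that bookkeeping for a single flat feature vector per jet; the real producer has several multi-dimensional input groups, the model must accept a variable batch size, and `batch_jets` and `split_scores` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def batch_jets(jets: Sequence[Sequence[float]]) -> Tuple[List[float], Tuple[int, int]]:
    """Concatenate per-jet feature vectors into one row-major buffer.

    Returns the flat buffer and its shape (n_jets, n_features).
    """
    if not jets:
        return [], (0, 0)
    n_features = len(jets[0])
    flat: List[float] = []
    for index, jet in enumerate(jets):
        if len(jet) != n_features:
            raise ValueError(f"jet {index} has {len(jet)} features, expected {n_features}")
        flat.extend(jet)
    return flat, (len(jets), n_features)


def split_scores(flat_scores: Sequence[float], n_jets: int) -> List[List[float]]:
    """Split a batched output buffer back into one score list per jet."""
    if n_jets == 0:
        return []
    if len(flat_scores) % n_jets != 0:
        raise ValueError("output size is not a multiple of the batch size")
    per_jet = len(flat_scores) // n_jets
    return [list(flat_scores[i * per_jet:(i + 1) * per_jet]) for i in range(n_jets)]
```

A single inference call on the batched buffer replaces one call per jet, which is where any GPU gain would be expected to come from.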

Sep 15 '22 13:09 davidlange6

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32108

This PR adds an extra 28KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32108/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32108/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

Sep 15 '22 13:09 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32110

This PR adds an extra 28KB to repository

Sep 15 '22 13:09 cmsbuild

A new Pull Request was created by @davidlange6 (David Lange) for master.

It involves the following packages:

PhysicsTools/ONNXRuntime (reconstruction)
RecoBTag/ONNXRuntime (reconstruction)
RecoParticleFlow/PFProducer (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks. @AlexDeMoor, @mmarionncern, @JyothsnaKomaragiri, @AnnikaStein, @riga, @emilbols, @lgray, @missirol, @hatakeyamak, @andrzejnovak, @demuller, @seemasharmafnal this is something you requested to watch as well. @perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

Sep 15 '22 13:09 cmsbuild

enable gpu

Sep 15 '22 14:09 davidlange6

please test

Sep 15 '22 14:09 davidlange6

assign heterogenous

Sep 15 '22 14:09 mandrenguyen

assign heterogeneous (helps if you can spell)

Sep 15 '22 14:09 mandrenguyen

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

Sep 15 '22 14:09 cmsbuild

We also encountered this ONNX issue in SONIC tests. I think it's https://github.com/microsoft/onnxruntime/issues/12321. There's a fix merged, but not in a release yet.

Sep 15 '22 14:09 kpedro88

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-edc989/27572/summary.html
COMMIT: 9433eda06107f56bdc6f280b9ec0803350b2b7d5
CMSSW: CMSSW_12_6_X_2022-09-15-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39402/27572/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
Reco comparison had 3 failed jobs
DQMHistoTests: Total files compared: 4
DQMHistoTests: Total histograms compared: 19876
DQMHistoTests: Total failures: 8
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 19868
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
Checked 12 log files, 9 edm output root files, 4 DQM output files
TriggerResults: found differences in 1 / 3 workflows

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 5 differences found in the comparisons
DQMHistoTests: Total files compared: 51
DQMHistoTests: Total histograms compared: 3618326
DQMHistoTests: Total failures: 8
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3618296
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
Checked 212 log files, 49 edm output root files, 51 DQM output files
TriggerResults: no differences found

Sep 15 '22 18:09 cmsbuild

Given that by default a GPU would be used if it is available, maybe it is time to make the loading of Configuration.StandardSequences.Accelerators_cff unconditional (https://github.com/cms-sw/cmssw/blob/284681e89ac822d328cf54cfe57866c475fae9e4/Configuration/StandardSequences/python/Services_cff.py#L11-L18), either with an unconditional process.load("Configuration.StandardSequences.Accelerators_cff") there, or by making the ConfigBuilder load Accelerators_cff in a similar way to Services_cff (in which case Accelerators_cff would be visible in the generated configuration)?

Sep 16 '22 14:09 makortel

Try out asynchronous offload (would need e.g. https://github.com/cms-sw/cmssw/issues/29188)

Regardless of the setting, onnxruntime can decide to use the CPU for some model components (or rather, it does in our models and I see no way to disable this). I believe this offload would need to be handled by a change to onnxruntime itself (it's possible that I have misunderstood how this works, or that such a hook already exists).

Sep 19 '22 13:09 davidlange6

Try out asynchronous offload (would need e.g. https://github.com/cms-sw/cmssw/issues/29188)

Regardless of the setting, onnxruntime can decide to use the CPU for some model components (or rather, it does in our models and I see no way to disable this). I believe this offload would need to be handled by a change to onnxruntime itself (it's possible that I have misunderstood how this works, or that such a hook already exists).

Thanks, if there is a risk of any significant CPU use, we'd not want to do that in a non-TBB thread.

Sep 19 '22 14:09 makortel

I think we would have to see that empirically for the models we have. I am not sure how to actually do that. The profiler produces some JSON, but I have not yet managed to relate it to something like the fraction of time the CPU is doing work vs. the GPU doing work (and onnxruntime runs much more slowly in this mode).
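One possible way to get a rough CPU vs. GPU split from the profiler output is to sum event durations per execution provider. The sketch below assumes the profile is a JSON list of trace events in which per-node entries have `cat` equal to `"Node"`, a `dur` field in microseconds, and an `args.provider` field; these field names are assumptions about the profile format and may differ between onnxruntime versions. The function names and the file name in the comment are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def time_by_provider(events: Iterable[Mapping]) -> Dict[str, int]:
    """Sum the `dur` of node-level events per execution provider."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        if event.get("cat") != "Node":
            continue  # skip session-level and other non-node entries
        provider = event.get("args", {}).get("provider")
        if provider is None:
            continue  # entry without provider information
        totals[provider] += event.get("dur", 0)
    return dict(totals)


def provider_fractions(events: Iterable[Mapping]) -> Dict[str, float]:
    """Fraction of the summed node time attributed to each provider."""
    totals = time_by_provider(events)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: value / grand_total for name, value in totals.items()}


# Usage with a profile written by onnxruntime (hypothetical file name):
#   with open("onnxruntime_profile.json") as handle:
#       print(provider_fractions(json.load(handle)))
```

Durations recorded for GPU nodes are measured on the host side and may not reflect the actual device time, so the result is at best a first indication of how much work stays on the CPU.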

Sep 19 '22 17:09 davidlange6

of course that depends on what is not "significant"...

Sep 19 '22 17:09 davidlange6

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32196

This PR adds an extra 32KB to repository

Sep 21 '22 14:09 cmsbuild

Pull request #39402 was updated. @cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please check and sign again.

Sep 21 '22 14:09 cmsbuild

please test

Sep 26 '22 13:09 mandrenguyen

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-edc989/27770/summary.html
COMMIT: 68fc2b3ef8865796c09898f6a380bebcece52e30
CMSSW: CMSSW_12_6_X_2022-09-26-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39402/27770/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
Reco comparison had 3 failed jobs
DQMHistoTests: Total files compared: 4
DQMHistoTests: Total histograms compared: 19876
DQMHistoTests: Total failures: 529
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 19347
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
Checked 12 log files, 9 edm output root files, 4 DQM output files
TriggerResults: found differences in 1 / 3 workflows

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 51
DQMHistoTests: Total histograms compared: 3624368
DQMHistoTests: Total failures: 2
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3624344
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
Checked 212 log files, 49 edm output root files, 51 DQM output files
TriggerResults: no differences found

Sep 26 '22 17:09 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32301

This PR adds an extra 32KB to repository

Sep 29 '22 07:09 cmsbuild

Pull request #39402 was updated. @cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please check and sign again.

Sep 29 '22 07:09 cmsbuild

please test

Sep 29 '22 07:09 davidlange6

please test

Sep 29 '22 08:09 perrotta

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-edc989/27836/summary.html
COMMIT: 54d730fcebe7add42934384806eb4be3a323af3e
CMSSW: CMSSW_12_6_X_2022-09-28-2300/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39402/27836/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-edc989/41834.0_TTbar_14TeV+2026D94+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal

Summary:

No significant changes to the logs found
Reco comparison results: 4 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3433154
DQMHistoTests: Total failures: 3
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3433129
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
Checked 204 log files, 49 edm output root files, 49 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
Reco comparison had 3 failed jobs
DQMHistoTests: Total files compared: 4
DQMHistoTests: Total histograms compared: 19876
DQMHistoTests: Total failures: 8
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 19868
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
Checked 12 log files, 9 edm output root files, 4 DQM output files
TriggerResults: found differences in 1 / 3 workflows

Sep 29 '22 12:09 cmsbuild

There are some changes to the tests here, but I think @cms-sw/heterogeneous-l2 is probably better qualified than @cms-sw/reconstruction-l2 to comment

Sep 30 '22 14:09 mandrenguyen

The non-GPU tests show differences only regarding MessageLogger. The differences in 11634.506 in GPU tests look like the usual variation seen in GPU tests.

Note that this PR currently has no impact on workflows that do not enable the gpu (or pixelNtupletFit) modifier.

Sep 30 '22 14:09 makortel

+reconstruction

No changes to CPU-only workflows. Changes to GPU workflows are said to be expected.

Oct 01 '22 06:10 mandrenguyen

Milestone for this pull request has been moved to CMSSW_14_0_X. Please open a backport if it should also go in to CMSSW_13_3_X.

Nov 06 '23 16:11 smuzaffar

Milestone for this pull request has been moved to CMSSW_14_1_X. Please open a backport if it should also go in to CMSSW_14_0_X.

Feb 06 '24 10:02 cmsbuild

Pull request #39402 was updated. @wpmccormack, @fwyzard, @valsdav, @makortel can you please check and sign again.

Feb 06 '24 10:02 cmsbuild
