Extend onnxruntime gpu interface to producers using onnxruntime

Open davidlange6 opened this issue 2 years ago • 27 comments

Extends #36963 by adding a backend parameter to models, used by

cms::Ort::getSessionOptions(iConfig.getParameter<std::string>("onnx_backend"));

Current options are:

  • cpu -> Use CPU backend
  • cuda -> Use cuda backend
  • default -> Use best available
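
The three options above amount to a small resolution step. A minimal Python sketch, assuming a hypothetical `resolve_backend` helper and a `cuda_available` flag supplied by the caller; this is not the actual `cms::Ort::getSessionOptions` code, only an illustration of the described behavior:

```python
def resolve_backend(requested: str, cuda_available: bool) -> str:
    """Map the `onnx_backend` string to a concrete backend name.

    Hypothetical helper: the names "cpu", "cuda" and "default" come from the
    PR description; everything else here is an illustrative assumption.
    """
    if requested == "cpu":
        return "cpu"
    if requested == "cuda":
        # Assumption: an explicit request is passed through unchanged,
        # leaving any failure to the runtime.
        return "cuda"
    if requested == "default":
        # "Use best available": prefer the GPU when one is present.
        return "cuda" if cuda_available else "cpu"
    raise ValueError(f"unknown onnx_backend: {requested!r}")


print(resolve_backend("default", cuda_available=False))  # prints: cpu
```

Rejecting unknown strings early is a design choice in this sketch; it keeps a typo in the configuration from silently falling back to one of the backends.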

The model used in BoostedJetONNXJetTagsProducer crashes on GPU if the full optimization is included. I reduced this optimization in case a GPU is used (following recipes found on the web). The sort of error one gets is

Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data); 
2022-08-26 13:26:51.964709271 [E:onnxruntime:, sequential_executor.cc:346 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_98_Add_99_Relu_100'
 Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)
----- Begin Fatal Exception 26-Aug-2022 14:26:51 CEST-----------------------
An exception of category 'StdException' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
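
One way to express the reduced optimization for GPU running is to pick the graph optimization level from the backend. The sketch below is an assumption, not the code in this PR: `ORT_ENABLE_BASIC` and `ORT_ENABLE_ALL` are real onnxruntime `GraphOptimizationLevel` names, but which reduced level the PR actually uses is not stated here, and `optimization_level_for` is a hypothetical helper.

```python
def optimization_level_for(backend: str) -> str:
    """Return the name of an onnxruntime GraphOptimizationLevel for a backend.

    Assumption: "cuda" gets a reduced level to avoid the fused-node crash
    described above; the choice of ORT_ENABLE_BASIC is illustrative only.
    """
    if backend == "cuda":
        return "ORT_ENABLE_BASIC"
    return "ORT_ENABLE_ALL"


for backend in ("cpu", "cuda"):
    print(backend, "->", optimization_level_for(backend))
```

In the onnxruntime Python API the chosen level would be assigned to `SessionOptions.graph_optimization_level`; the C++ equivalent is `SetGraphOptimizationLevel`.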

So far I do not see any significant performance improvement (at least on lxplus-gpu) nor loss. At least BoostedJetONNXJetTagsProducer.cc can be improved to send more than one jet to onnxruntime at a time.
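
Sending several jets per call comes down to stacking the per-jet inputs along a leading batch dimension. A minimal sketch in plain Python, assuming each jet is already a flat, fixed-length feature list; `batch_jets` and the feature length are hypothetical and unrelated to the actual producer's input layout:

```python
from typing import List, Tuple


def batch_jets(jets: List[List[float]]) -> Tuple[List[float], Tuple[int, int]]:
    """Flatten per-jet feature lists into one contiguous buffer plus shape.

    The buffer is row-major with shape (n_jets, n_features), the layout a
    batched inference call would typically expect.
    """
    if not jets:
        return [], (0, 0)
    n_features = len(jets[0])
    for jet in jets:
        if len(jet) != n_features:
            raise ValueError("all jets must have the same number of features")
    flat = [value for jet in jets for value in jet]
    return flat, (len(jets), n_features)


# Three hypothetical jets with two features each.
data, shape = batch_jets([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(shape, data)
```

Inputs whose length varies per jet would need padding to a common length before this step, which the sketch does not attempt.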

davidlange6 avatar Sep 15 '22 13:09 davidlange6

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32108

  • This PR adds an extra 28KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

  • code-format: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32108/code-format.patch
    e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32108/code-format.patch | patch -p1
    You can also run scram build code-format to apply code format directly.

cmsbuild avatar Sep 15 '22 13:09 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32110

  • This PR adds an extra 28KB to repository

cmsbuild avatar Sep 15 '22 13:09 cmsbuild

A new Pull Request was created by @davidlange6 (David Lange) for master.

It involves the following packages:

  • PhysicsTools/ONNXRuntime (reconstruction)
  • RecoBTag/ONNXRuntime (reconstruction)
  • RecoParticleFlow/PFProducer (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks. @AlexDeMoor, @mmarionncern, @JyothsnaKomaragiri, @AnnikaStein, @riga, @emilbols, @lgray, @missirol, @hatakeyamak, @andrzejnovak, @demuller, @seemasharmafnal this is something you requested to watch as well. @perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Sep 15 '22 13:09 cmsbuild

enable gpu

davidlange6 avatar Sep 15 '22 14:09 davidlange6

please test

davidlange6 avatar Sep 15 '22 14:09 davidlange6

assign heterogenous

mandrenguyen avatar Sep 15 '22 14:09 mandrenguyen

assign heterogeneous (helps if you can spell)

mandrenguyen avatar Sep 15 '22 14:09 mandrenguyen

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Sep 15 '22 14:09 cmsbuild

We also encountered this ONNX issue in SONIC tests. I think it's https://github.com/microsoft/onnxruntime/issues/12321. There's a fix merged, but not in a release yet.

kpedro88 avatar Sep 15 '22 14:09 kpedro88

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-edc989/27572/summary.html
COMMIT: 9433eda06107f56bdc6f280b9ec0803350b2b7d5
CMSSW: CMSSW_12_6_X_2022-09-15-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39402/27572/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19876
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19868
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 1 / 3 workflows

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3618326
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3618296
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 212 log files, 49 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Sep 15 '22 18:09 cmsbuild

Given that by default a GPU would be used if it is available, maybe it is time to make the loading of Configuration.StandardSequences.Accelerators_cff unconditional https://github.com/cms-sw/cmssw/blob/284681e89ac822d328cf54cfe57866c475fae9e4/Configuration/StandardSequences/python/Services_cff.py#L11-L18 either by an unconditional process.load("Configuration.StandardSequences.Accelerators_cff") there, or by making the ConfigBuilder load the Accelerators_cff in a similar way to Services_cff (in which case the Accelerators_cff would be visible in the generated configuration)?

makortel avatar Sep 16 '22 14:09 makortel

Try out asynchronous offload (would need e.g. https://github.com/cms-sw/cmssw/issues/29188)

Regardless of the setting, onnxruntime can decide to use the CPU for some model components (or rather, it does in our models and I see no way to disable this). I believe this offload would need to be handled by a change to onnxruntime itself (it's possible I have misunderstood how this works, or that such a hook already exists).

davidlange6 avatar Sep 19 '22 13:09 davidlange6

Try out asynchronous offload (would need e.g. https://github.com/cms-sw/cmssw/issues/29188)

Regardless of the setting, onnxruntime can decide to use the CPU for some model components (or rather, it does in our models and I see no way to disable this). I believe this offload would need to be handled by a change to onnxruntime itself (it's possible I have misunderstood how this works, or that such a hook already exists).

Thanks, if there is a risk of any significant CPU use, we'd not want to do that in a non-TBB thread.

makortel avatar Sep 19 '22 14:09 makortel

I think we would have to see that empirically for the models we have. I'm not sure how to actually do that. There is some JSON produced by a profiler, but I have not yet managed to relate it to something like the fraction of the time the CPU is doing work vs the GPU doing work (and onnxruntime runs much more slowly in this mode).
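
One rough way to relate the profiler output to CPU versus GPU work is to sum the recorded node durations per execution provider. The sketch below assumes the profile is a JSON array of trace events in which per-node entries carry `cat == "Node"`, a `dur` field in microseconds, and an `args.provider` string; these field names are an assumption about the onnxruntime profile layout and should be checked against an actual file. `time_by_provider` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict


def time_by_provider(profile_json: str) -> Dict[str, float]:
    """Return the fraction of recorded node time per execution provider."""
    totals: Dict[str, int] = defaultdict(int)
    for event in json.loads(profile_json):
        if event.get("cat") != "Node":
            continue  # skip session-level entries
        provider = event.get("args", {}).get("provider")
        if provider is None:
            continue  # entries without a provider cannot be attributed
        totals[provider] += event.get("dur", 0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: dur / grand_total for name, dur in totals.items()}


# Tiny hand-written example, not real profiler output.
sample = json.dumps([
    {"cat": "Session", "name": "model_run", "dur": 500},
    {"cat": "Node", "name": "Conv_0_kernel_time", "dur": 300,
     "args": {"provider": "CUDAExecutionProvider"}},
    {"cat": "Node", "name": "Gather_1_kernel_time", "dur": 100,
     "args": {"provider": "CPUExecutionProvider"}},
])
print(time_by_provider(sample))
```

The resulting fractions are only a proxy: durations attributed to a GPU provider may reflect host-side timing rather than time spent on the device, so they do not directly answer how busy the CPU thread is.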

davidlange6 avatar Sep 19 '22 17:09 davidlange6

of course that depends on what is not "significant"...

davidlange6 avatar Sep 19 '22 17:09 davidlange6

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32196

  • This PR adds an extra 32KB to repository

cmsbuild avatar Sep 21 '22 14:09 cmsbuild

Pull request #39402 was updated. @cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please check and sign again.

cmsbuild avatar Sep 21 '22 14:09 cmsbuild

please test

mandrenguyen avatar Sep 26 '22 13:09 mandrenguyen

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-edc989/27770/summary.html
COMMIT: 68fc2b3ef8865796c09898f6a380bebcece52e30
CMSSW: CMSSW_12_6_X_2022-09-26-1100/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39402/27770/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19876
  • DQMHistoTests: Total failures: 529
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19347
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 1 / 3 workflows

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3624368
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3624344
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 212 log files, 49 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Sep 26 '22 17:09 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39402/32301

  • This PR adds an extra 32KB to repository

cmsbuild avatar Sep 29 '22 07:09 cmsbuild

Pull request #39402 was updated. @cmsbuild, @makortel, @mandrenguyen, @clacaputo, @fwyzard can you please check and sign again.

cmsbuild avatar Sep 29 '22 07:09 cmsbuild

please test

davidlange6 avatar Sep 29 '22 07:09 davidlange6

please test

perrotta avatar Sep 29 '22 08:09 perrotta

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-edc989/27836/summary.html
COMMIT: 54d730fcebe7add42934384806eb4be3a323af3e
CMSSW: CMSSW_12_6_X_2022-09-28-2300/el8_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/39402/27836/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-edc989/41834.0_TTbar_14TeV+2026D94+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3433154
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3433129
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 204 log files, 49 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19876
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19868
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 1 / 3 workflows

cmsbuild avatar Sep 29 '22 12:09 cmsbuild

There are some changes to the tests here, but I think @cms-sw/heterogeneous-l2 is probably better qualified than @cms-sw/reconstruction-l2 to comment

mandrenguyen avatar Sep 30 '22 14:09 mandrenguyen

The non-GPU tests show differences only regarding MessageLogger. The differences in 11634.506 in GPU tests look like the usual variation seen in GPU tests.

Note that this PR has currently no impact on workflows that do not enable gpu (or pixelNtupletFit) modifier.

makortel avatar Sep 30 '22 14:09 makortel

+reconstruction

No changes to CPU-only workflows. Changes to GPU workflows are said to be expected.

mandrenguyen avatar Oct 01 '22 06:10 mandrenguyen

Milestone for this pull request has been moved to CMSSW_14_0_X. Please open a backport if it should also go in to CMSSW_13_3_X.

smuzaffar avatar Nov 06 '23 16:11 smuzaffar

Milestone for this pull request has been moved to CMSSW_14_1_X. Please open a backport if it should also go in to CMSSW_14_0_X.

cmsbuild avatar Feb 06 '24 10:02 cmsbuild

Pull request #39402 was updated. @wpmccormack, @fwyzard, @valsdav, @makortel can you please check and sign again.

cmsbuild avatar Feb 06 '24 10:02 cmsbuild