
[14_0_X] Revert "Workaround to produce exactly same data products in Serial and CUDA backends in Alpaka modules possibly used at HLT"

Open makortel opened this issue 1 year ago • 16 comments

PR description:

Reverts cms-sw/cmssw#44699, to be used in conjunction with https://github.com/cms-sw/cmssw/pull/44978

PR validation:

None

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Backport of https://github.com/cms-sw/cmssw/pull/45080

makortel avatar May 28 '24 15:05 makortel

A new Pull Request was created by @makortel for CMSSW_14_0_X.

It involves the following packages:

  • EventFilter/EcalRawToDigi (reconstruction)
  • HeterogeneousCore/AlpakaCore (heterogeneous)
  • RecoLocalCalo/EcalRecProducers (reconstruction)
  • RecoLocalTracker/SiPixelClusterizer (reconstruction)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoParticleFlow/PFClusterProducer (reconstruction)
  • RecoParticleFlow/PFRecHitProducer (reconstruction)
  • RecoTracker/PixelSeeding (reconstruction)
  • RecoTracker/PixelVertexFinding (reconstruction)
  • RecoVertex/BeamSpotProducer (reconstruction, alca)

@fwyzard, @saumyaphor4252, @makortel, @jfernan2, @perrotta, @cmsbuild, @mandrenguyen, @consuegs can you please review it and eventually sign? Thanks. @rovere, @mmusich, @missirol, @mtosi, @JanFSchulte, @tsusa, @gpetruc, @dkotlins, @GiacomoSguazzoni, @seemasharmafnal, @ferencek, @mroguljic, @threus, @tvami, @lgray, @VinInn, @youyingli, @thomreis, @hatakeyamak, @yuanchao, @felicepantaleo, @mmarionncern, @Martin-Grunewald, @apsallid, @francescobrivio, @rsreds, @argiro, @ReyerBand, @sameasy, @wang0jin, @fabiocos, @rchatter, @VourMa, @dgulhan, @tocheng this is something you requested to watch as well. @antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar May 28 '24 15:05 cmsbuild

cms-bot internal usage

cmsbuild avatar May 28 '24 15:05 cmsbuild

enable gpu

makortel avatar May 28 '24 15:05 makortel

@cmsbuild, please test

makortel avatar May 28 '24 15:05 makortel

@smuzaffar @makortel @cms-sw/orp-l2 @cms-sw/ppd-l2 Confirming: Plan is to merge in 14_1_X https://github.com/cms-sw/cmssw/pull/45080 together with https://github.com/cms-sw/cmssw/pull/44892 Then, this PR https://github.com/cms-sw/cmssw/pull/45081 plus https://github.com/cms-sw/cmssw/pull/44978 will go in the special 14_0_X branch, starting from 14_0_7_patch1.

antoniovilela avatar May 28 '24 18:05 antoniovilela

> Then, this PR https://github.com/cms-sw/cmssw/pull/45081 plus https://github.com/cms-sw/cmssw/pull/44978 will go in the special 14_0_X branch, starting from 14_0_7_patch1.

@antoniovilela, just to be sure: #45081 and #44978 will also go in CMSSW_14_0_X, right? And once these PRs are merged, we want a CMSSW_14_0_7_HLT release (or please suggest a better name) containing 14_0_7_patch1 + #45081 and #44978, right?

smuzaffar avatar May 28 '24 19:05 smuzaffar

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df7e0f/39584/summary.html
COMMIT: 56232577d6f931bb0c2d6c8da3bdccedec56ebe9
CMSSW: CMSSW_14_0_X_2024-05-28-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45081/39584/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

  • @CMSTrackerDPG cms-sw/cmssw#45014
  • @aniketkhanal cms-sw/cmssw#45003

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df7e0f/39584/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df7e0f/39584/git-merge-result

Comparison Summary

Summary:

  • You potentially added 108 lines to the logs
  • Reco comparison results: 81 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3440217
  • DQMHistoTests: Total failures: 2798
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3437399
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 6931.257 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 12834.0,... ): 1035.999 KiB HLT/BTV
  • DQMHistoSizes: changed ( 141.042,... ): 929.087 KiB HLT/BTV
  • Checked 206 log files, 170 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 35 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 1072
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 38668
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar May 28 '24 19:05 cmsbuild

> just to be sure, https://github.com/cms-sw/cmssw/pull/45081 and https://github.com/cms-sw/cmssw/pull/44978 will also go in CMSSW_14_0_X ... right?

I thought the whole point of the special branch was to NOT have https://github.com/cms-sw/cmssw/pull/45081 and https://github.com/cms-sw/cmssw/pull/44978 in 14_0_X until tested (in the special release). What am I missing?

mmusich avatar May 28 '24 19:05 mmusich

> I thought the whole point of the special branch was to NOT have #45081 and #44978 in 14_0_X until tested (in the special release). What am I missing?

I think we do not yet want to merge 44978 into 140X. It should first be tested, and when we’re certain it is doing what’s intended, then we merge. This is why we need the special release with 44978 (the definitive solution) in and 44699 (the workaround) out.

malbouis avatar May 28 '24 20:05 malbouis

> just to be sure, #45081 and #44978 will also go in CMSSW_14_0_X ... right?

> I thought the whole point of the special branch was to NOT have #45081 and #44978 in 14_0_X until tested (in the special release).

ok, understood now

smuzaffar avatar May 28 '24 20:05 smuzaffar

> just to be sure, #45081 and #44978 will also go in CMSSW_14_0_X ... right?

> I thought the whole point of the special branch was to NOT have #45081 and #44978 in 14_0_X until tested (in the special release).

> ok, understood now

Many thanks @smuzaffar @mmusich @malbouis

antoniovilela avatar May 28 '24 23:05 antoniovilela

@makortel @Dr15Jones @antoniovilela @rappoccio, both #45080 and #44892 are merged in 14.1.X and will be part of today's 11h00 IB. I have created the CMSSW_14_0_HLTTest branch, which has CMSSW_14_0_7_patch1 + changes from #45081 and #44978. The changes w.r.t. 14.0.7.patch1 are https://github.com/cms-sw/cmssw/compare/CMSSW_14_0_7_patch1...CMSSW_14_0_HLTTest?expand=1 . I can start the build of the CMSSW_14_0_7_HLTTest release later today (around 16h00) and can upload it once we have good results from the 14.1.X 11h IB. Let me know if all looks good for CMSSW_14_0_7_HLTTest branch.
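
To keep track of which PR lands where, the plan discussed in this thread can be written down as a small lookup. This is only an illustrative sketch: the PR numbers, branch names, and backport relationships are taken from the comments in this thread, while the dictionary layout and the helper `originals` are hypothetical and not part of any CMSSW tooling.

```python
# Backport relationships stated in this thread: 14_0_X PR -> original 14_1_X PR.
BACKPORT_OF = {
    45081: 45080,  # revert of the workaround (#44699)
    44978: 44892,  # the fix that replaces the workaround
}

# Branch contents as described above (layout is illustrative).
BRANCHES = {
    "CMSSW_14_1_X": {"base": None, "prs": {45080, 44892}},
    "CMSSW_14_0_HLTTest": {"base": "CMSSW_14_0_7_patch1", "prs": {45081, 44978}},
}


def originals(prs):
    """Map each PR to the PR it backports, or to itself if it is not a backport."""
    return {BACKPORT_OF.get(pr, pr) for pr in prs}


# The test branch should carry the same pair of changes as 14_1_X, via backports.
same_changes = (
    originals(BRANCHES["CMSSW_14_0_HLTTest"]["prs"])
    == originals(BRANCHES["CMSSW_14_1_X"]["prs"])
)
print(same_changes)
```

The check confirms that the test branch carries the same revert-plus-fix pair as 14_1_X, which is the reason the two PRs have to be merged together.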

By the way, do we need all archs for this test release, or is the production arch (i.e. el8_amd64_gcc12) enough? Do we need a MULTIARCHS release too?

smuzaffar avatar May 30 '24 08:05 smuzaffar

> By the way, do we need all archs for this test release, or is the production arch (i.e. el8_amd64_gcc12) enough? Do we need a MULTIARCHS release too?

  • I think we can stick to the production arch only
  • we do need MULTIARCHS too (for DAQ/HLT)

Thanks! Francesco

francescobrivio avatar May 30 '24 08:05 francescobrivio

> The changes w.r.t. 14.0.7.patch1 are https://github.com/cms-sw/cmssw/compare/CMSSW_14_0_7_patch1...CMSSW_14_0_HLTTest?expand=1 . I can start the build of the CMSSW_14_0_7_HLTTest release later today (around 16h00) and can upload it once we have good results from the 14.1.X 11h IB. Let me know if all looks good for CMSSW_14_0_7_HLTTest branch.

Looks good to me.

makortel avatar May 30 '24 13:05 makortel

Thanks @makortel. Note that #44978 is a backport of #44892, which was integrated in today's 11h 14.1.X IB, and one unit test failed. Is it safe to go ahead with the CMSSW_14_0_7_HLTTest release?

smuzaffar avatar May 30 '24 13:05 smuzaffar

> Note that #44978 is a backport of #44892, which was integrated in today's 11h 14.1.X IB, and one unit test failed.

Given the nature of the test, the failure is expected (https://github.com/cms-sw/cmssw/pull/44892#issuecomment-2139517042). The test itself is brittle and will need an update.

> Is it safe to go ahead with the CMSSW_14_0_7_HLTTest release?

Yes.

Thanks!

makortel avatar May 30 '24 13:05 makortel

+heterogeneous

fwyzard avatar Jun 07 '24 22:06 fwyzard

+1

jfernan2 avatar Jun 10 '24 07:06 jfernan2

+alca

  • As confirmed at the Joint Ops meeting of June 10, tests of the event metadata changes in streamer files went fine with CMSSW_14_0_7_HLTTest; therefore this workaround can be removed from 14_0_X (this has to be merged together with #44978)

perrotta avatar Jun 10 '24 11:06 perrotta

This pull request is fully signed and it will be integrated in one of the next CMSSW_14_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_14_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

cmsbuild avatar Jun 10 '24 11:06 cmsbuild

+1

rappoccio avatar Jun 10 '24 15:06 rappoccio