[RECO-UPGRADE] [GCC12] Disable cuda builds if cuda does not support gcc version

Open smuzaffar opened this issue 2 years ago • 22 comments

Disable building CUDA tests/binaries if CUDA does not support the GCC version; for example, CUDA currently does not work with gcc12.
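The gating idea in the description can be sketched as a compile-time check. This is only an illustration, not the mechanism used in this PR: `cudaSupportsGcc`, `kAssumedMaxGccForCuda`, `kHostGccMajor` and `kBuildCudaTargets` are hypothetical names, and the ceiling of 11 is an assumed value chosen to match the statement that gcc12 does not work.

```cpp
// Hypothetical helper, not part of CMSSW or SCRAM: returns true when the
// host GCC major version is within the range the CUDA toolkit accepts.
constexpr bool cudaSupportsGcc(int gccMajor, int maxSupportedGccMajor) {
  return gccMajor <= maxSupportedGccMajor;
}

// Assumed ceiling for illustration only; a real build would take this from
// the toolkit's documentation or from the build configuration.
constexpr int kAssumedMaxGccForCuda = 11;

// __GNUC__ is predefined by GCC with its major version. Clang defines it as
// well, so it is excluded explicitly here.
#if defined(__GNUC__) && !defined(__clang__)
constexpr int kHostGccMajor = __GNUC__;
#else
constexpr int kHostGccMajor = 0;  // not GCC: treated as "unknown" in this sketch
#endif

// CUDA targets are enabled only for a known, supported GCC version.
constexpr bool kBuildCudaTargets =
    kHostGccMajor != 0 && cudaSupportsGcc(kHostGccMajor, kAssumedMaxGccForCuda);
```

A build could then skip its CUDA tests and binaries whenever `kBuildCudaTargets` is false, instead of failing inside the CUDA compiler.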

smuzaffar avatar Jun 16 '22 16:06 smuzaffar

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38396/30596

  • This PR adds an extra 12KB to repository

cmsbuild avatar Jun 16 '22 16:06 cmsbuild

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for master.

It involves the following packages:

  • RecoLocalCalo/HGCalRecProducers (upgrade, reconstruction)

@clacaputo, @cmsbuild, @AdrianoDee, @srimanob, @slava77, @jpata can you please review it and eventually sign? Thanks. @edjtscott, @vandreev11, @sethzenz, @bsunanda, @felicepantaleo, @rovere, @lgray, @cseez, @apsallid, @pfs, @lecriste, @hatakeyamak, @trtomei, @ebrondol, @beaucero this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Jun 16 '22 16:06 cmsbuild

please test for el8_amd64_gcc12

smuzaffar avatar Jun 17 '22 06:06 smuzaffar

please test

smuzaffar avatar Jun 17 '22 06:06 smuzaffar

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f03205/25580/summary.html
COMMIT: fa0759e226a44e8490bcba3f08270b19d9c6539a
CMSSW: CMSSW_12_5_X_2022-06-15-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38396/25580/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

/cvmfs/cms-ib.cern.ch/nweek-02737/el8_amd64_gcc12/external/gcc/12.1.1-bf4aef5069fdf6bb6f77f897bcc8a6ae/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.1.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/HEFRecHitGPUtoSoA.cc.o: in function `HEFRecHitGPUtoSoA::acquire(edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder) [clone .cold]':
HEFRecHitGPUtoSoA.cc:(.text.unlikely+0xbe): undefined reference to `KernelManagerHGCalRecHit::~KernelManagerHGCalRecHit()'
/cvmfs/cms-ib.cern.ch/nweek-02737/el8_amd64_gcc12/external/gcc/12.1.1-bf4aef5069fdf6bb6f77f897bcc8a6ae/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.1.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/HeterogeneousHGCalHEFCellPositionsConditions.cc.o: in function `HeterogeneousHGCalHEFCellPositionsConditions::getHeterogeneousConditionsESProductAsync(CUstream_st*) const':
HeterogeneousHGCalHEFCellPositionsConditions.cc:(.text+0x7e4): undefined reference to `KernelManagerHGCalCellPositions::KernelManagerHGCalCellPositions(unsigned long const&)'
/cvmfs/cms-ib.cern.ch/nweek-02737/el8_amd64_gcc12/external/gcc/12.1.1-bf4aef5069fdf6bb6f77f897bcc8a6ae/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.1.1/../../../../x86_64-redhat-linux-gnu/bin/ld: HeterogeneousHGCalHEFCellPositionsConditions.cc:(.text+0x7f0): undefined reference to `KernelManagerHGCalCellPositions::fill_positions(hgcal_conditions::HeterogeneousHEFCellPositionsConditionsESProduct const*)'
collect2: error: ld returned 1 exit status
gmake: *** [tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/libRecoLocalCaloHGCalRecProducersPlugins.so] Error 1
Leaving library rule at src/RecoLocalCalo/HGCalRecProducers/plugins
Entering library rule at RecoLocalCalo/HGCalRecProducers
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_5_X_2022-06-15-2300/src/RecoLocalCalo/HGCalRecProducers/src/ComputeClusterTime.cc
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_5_X_2022-06-15-2300/src/RecoLocalCalo/HGCalRecProducers/src/HGCalRecHitWorkerFactory.cc

cmsbuild avatar Jun 17 '22 06:06 cmsbuild

please test

smuzaffar avatar Jun 17 '22 06:06 smuzaffar

So it looks like the CUDA code is not optional. There are some *GPU*.cc files, e.g. EERecHitGPU.cc and EERecHitGPUtoSoA.cc, which require CUDA code:

EERecHitGPU.cc:(.text+0x208c): undefined reference to `KernelManagerHGCalRecHit::run_kernels(KernelConstantData<HGCeeUncalibRecHitConstantData> const*, CUstream_st* const&)'
EERecHitGPU.cc:(.text+0x20d9): undefined reference to `KernelManagerHGCalRecHit::~KernelManagerHGCalRecHit()'

Is the code in *GPU*.cc really required for non gpu runs? If not then we can skip these files too

smuzaffar avatar Jun 17 '22 06:06 smuzaffar

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f03205/25585/summary.html
COMMIT: fa0759e226a44e8490bcba3f08270b19d9c6539a
CMSSW: CMSSW_12_5_X_2022-06-16-2300/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38396/25585/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3659074
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3659050
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Jun 17 '22 10:06 cmsbuild

Is the code in *GPU*.cc really required for non gpu runs? If not then we can skip these files too

Naively, I'd say no, they shouldn’t rely on GPU code for non-GPU runs, but maybe @cms-sw/heterogeneous-l2 can comment.

clacaputo avatar Jun 22 '22 16:06 clacaputo

@clacaputo you are probably right. The best approach would be to split the plugin in two: one working entirely on CPU, and one with the GPU code. However, I would suggest letting the authors of the package take care of this.
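As a rough illustration of that split, here is a minimal sketch in which the CPU path never references GPU symbols and the GPU path is only declared when a build flag says CUDA is usable. All names here (`SKETCH_HAVE_CUDA`, `RecHit`, `calibrateOnCpu`, `calibrateOnGpu`, `calibrate`, `backendName`) are hypothetical and are not taken from the HGCalRecProducers package; this is not the authors' implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flag: a build system would define SKETCH_HAVE_CUDA only when
// the CUDA toolchain supports the host compiler. It is left undefined here.

struct RecHit {
  float energy;
};

// CPU-only path: it refers to no GPU symbol, so it links even when the
// CUDA sources are not built.
inline std::vector<RecHit> calibrateOnCpu(const std::vector<float>& raw, float scale) {
  assert(scale > 0.0f && "sketch assumes a positive calibration scale");
  std::vector<RecHit> hits;
  hits.reserve(raw.size());
  for (float value : raw) {
    hits.push_back(RecHit{value * scale});
  }
  return hits;
}

#ifdef SKETCH_HAVE_CUDA
// GPU path: declared only when CUDA is available; the definition would live
// in a separate .cu file compiled by the CUDA toolchain.
std::vector<RecHit> calibrateOnGpu(const std::vector<float>& raw, float scale);
#endif

// Dispatcher: without the flag, no reference to the GPU symbol is emitted,
// which avoids "undefined reference" errors at link time.
inline std::vector<RecHit> calibrate(const std::vector<float>& raw, float scale) {
#ifdef SKETCH_HAVE_CUDA
  return calibrateOnGpu(raw, scale);
#else
  return calibrateOnCpu(raw, scale);
#endif
}

inline std::string backendName() {
#ifdef SKETCH_HAVE_CUDA
  return "gpu";
#else
  return "cpu";
#endif
}
```

With the flag undefined, `calibrate({1.0f, 2.0f}, 2.0f)` runs entirely through `calibrateOnCpu`, and the object file contains no reference to `calibrateOnGpu`.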

fwyzard avatar Jun 22 '22 19:06 fwyzard

Hi @smuzaffar, EERecHitGPU.cc and EERecHitGPUtoSoA.cc seem to be used only in some test code, so they can easily be skipped. Concerning the other plugins showing the same behaviour (do you have a list?), maybe we could open an issue and ask the main developer to address the splitting suggested by @fwyzard.

clacaputo avatar Jun 27 '22 14:06 clacaputo

@cmsbuild please test for el8_amd64_gcc12

clacaputo avatar Aug 08 '22 10:08 clacaputo

Hi @smuzaffar, I've tried to refresh the test results using el8_amd64_gcc12, but the test failed with this error:

'Unable to find CMSSW release for CMSSW_12_5_X/el8_amd64_gcc12'

Am I doing something wrong?

clacaputo avatar Aug 08 '22 14:08 clacaputo

@cmsbuild please test for el8_amd64_gcc12

clacaputo avatar Sep 08 '22 14:09 clacaputo

By the way, do we have gcc12 IBs to test this locally?

fwyzard avatar Sep 08 '22 14:09 fwyzard

@fwyzard as the gcc12 IBs are broken, we only build those on demand. I have just started one, which should be available for testing later in the evening.

smuzaffar avatar Sep 08 '22 15:09 smuzaffar

Thanks - I won't be able to have a look until next week, but I guess the IB should stay around for another 10 days or so.

fwyzard avatar Sep 08 '22 19:09 fwyzard

@smuzaffar

as the gcc12 IBs are broken, we only build those on demand. I have just started one, which should be available for testing later in the evening.

Ah... they are very broken; we don't seem to get any CMSSW built at all :-/

It is pretty low on my priority list, but is there a way to see what is failing? Other than attempting the full build locally, of course.

fwyzard avatar Sep 14 '22 13:09 fwyzard

please test for el8_amd64_gcc12

smuzaffar avatar Sep 18 '22 09:09 smuzaffar

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f03205/27633/summary.html
COMMIT: fa0759e226a44e8490bcba3f08270b19d9c6539a
CMSSW: CMSSW_12_6_X_2022-09-17-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38396/27633/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

/cvmfs/cms-ib.cern.ch/nweek-02750/el8_amd64_gcc12/external/gcc/12.2.0-f8ec77b592790702d83afb7106a458e3/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/HEFRecHitGPUtoSoA.cc.o: in function `HEFRecHitGPUtoSoA::acquire(edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder) [clone .cold]':
HEFRecHitGPUtoSoA.cc:(.text.unlikely+0xa3): undefined reference to `KernelManagerHGCalRecHit::~KernelManagerHGCalRecHit()'
/cvmfs/cms-ib.cern.ch/nweek-02750/el8_amd64_gcc12/external/gcc/12.2.0-f8ec77b592790702d83afb7106a458e3/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/HeterogeneousHGCalHEFCellPositionsConditions.cc.o: in function `HeterogeneousHGCalHEFCellPositionsConditions::getHeterogeneousConditionsESProductAsync(CUstream_st*) const':
HeterogeneousHGCalHEFCellPositionsConditions.cc:(.text+0x7e4): undefined reference to `KernelManagerHGCalCellPositions::KernelManagerHGCalCellPositions(unsigned long const&)'
/cvmfs/cms-ib.cern.ch/nweek-02750/el8_amd64_gcc12/external/gcc/12.2.0-f8ec77b592790702d83afb7106a458e3/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: HeterogeneousHGCalHEFCellPositionsConditions.cc:(.text+0x7f0): undefined reference to `KernelManagerHGCalCellPositions::fill_positions(hgcal_conditions::HeterogeneousHEFCellPositionsConditionsESProduct const*)'
collect2: error: ld returned 1 exit status
gmake: *** [tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/libRecoLocalCaloHGCalRecProducersPlugins.so] Error 1
Leaving library rule at src/RecoLocalCalo/HGCalRecProducers/plugins
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_6_X_2022-09-17-1100/src/RecoLocalCalo/HGCalRecProducers/test/EtaPhiSearchInTile_t.cpp
>> Building binary EtaPhiSearchInTileLC
Copying tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/test/EtaPhiSearchInTileLC/EtaPhiSearchInTileLC to productstore area:

cmsbuild avatar Sep 18 '22 14:09 cmsbuild

please test for el8_amd64_gcc12

clacaputo avatar Oct 13 '22 13:10 clacaputo

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f03205/28231/summary.html
COMMIT: fa0759e226a44e8490bcba3f08270b19d9c6539a
CMSSW: CMSSW_12_6_X_2022-10-12-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38396/28231/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc12/external/gcc/12.2.0-f8ec77b592790702d83afb7106a458e3/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/HEFRecHitGPUtoSoA.cc.o: in function `HEFRecHitGPUtoSoA::acquire(edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder) [clone .cold]':
HEFRecHitGPUtoSoA.cc:(.text.unlikely+0xa3): undefined reference to `KernelManagerHGCalRecHit::~KernelManagerHGCalRecHit()'
/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc12/external/gcc/12.2.0-f8ec77b592790702d83afb7106a458e3/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/HeterogeneousHGCalHEFCellPositionsConditions.cc.o: in function `HeterogeneousHGCalHEFCellPositionsConditions::getHeterogeneousConditionsESProductAsync(CUstream_st*) const':
HeterogeneousHGCalHEFCellPositionsConditions.cc:(.text+0x7e4): undefined reference to `KernelManagerHGCalCellPositions::KernelManagerHGCalCellPositions(unsigned long const&)'
/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc12/external/gcc/12.2.0-f8ec77b592790702d83afb7106a458e3/bin/../lib/gcc/x86_64-redhat-linux-gnu/12.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: HeterogeneousHGCalHEFCellPositionsConditions.cc:(.text+0x7f0): undefined reference to `KernelManagerHGCalCellPositions::fill_positions(hgcal_conditions::HeterogeneousHEFCellPositionsConditionsESProduct const*)'
collect2: error: ld returned 1 exit status
gmake: *** [tmp/el8_amd64_gcc12/src/RecoLocalCalo/HGCalRecProducers/plugins/RecoLocalCaloHGCalRecProducersPlugins/libRecoLocalCaloHGCalRecProducersPlugins.so] Error 1
Leaving library rule at src/RecoLocalCalo/HGCalRecProducers/plugins
Entering library rule at RecoLocalCalo/HGCalRecProducers
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_6_X_2022-10-12-1100/src/RecoLocalCalo/HGCalRecProducers/src/ComputeClusterTime.cc
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_6_X_2022-10-12-1100/src/RecoLocalCalo/HGCalRecProducers/src/HGCalRecHitWorkerFactory.cc

cmsbuild avatar Oct 13 '22 14:10 cmsbuild

@smuzaffar, do we still need this PR in?

clacaputo avatar Dec 05 '22 14:12 clacaputo

No, not really, closing it

smuzaffar avatar Dec 05 '22 14:12 smuzaffar