Add ParticleFlow Client for Online DQM - GPUvsCPU comparison
This PR creates a new Online DQM client for Particle Flow, taking inspiration from the HCAL GPU client. Currently the PF client will be used for monitoring PFCluster@Alpaka, comparing the GPU with the CPU version from the DQMGPUvsCPU stream.
Test
Tested locally on lxplus with the following command, following the instructions on the DQM Twiki:
cmsRun DQM/Integration/python/clients/hcalgpu_dqm_sourceclient-live_cfg.py runInputDir=/eos/cms/store/group/comm_dqm/ runNumber=380649 runkey=pp_run scanOnce=True
Just for completeness, adding some of the plots produced by the client
Backport
Probably a backport to CMSSW_14_0_X will be needed
@missirol @swagata87 @stahlleiton @hatakeyamak @jsamudio
-code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40395
- This PR adds an extra 20KB to repository
Code check has found code style and quality issues which could be resolved by applying following patch(s)
- code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40395/code-format.patch
e.g.
curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40395/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40397
- This PR adds an extra 20KB to repository
A new Pull Request was created by @waredjeb for master.
It involves the following packages:
- DQM/Integration (dqm)
- DQM/PFTasks (****)
The following packages do not have a category, yet:
DQM/PFTasks
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category
@syuvivida, @rvenditti, @nothingface0, @cmsbuild, @tjavaid, @antoniovagnerini can you please review it and eventually sign? Thanks. @francescobrivio, @batinkov, @threus this is something you requested to watch as well. @antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.
cms-bot commands are listed here
type pf
-code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40409
- This PR adds an extra 24KB to repository
Code check has found code style and quality issues which could be resolved by applying following patch(s)
- code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40409/code-format.patch
e.g.
curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40409/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40410
- This PR adds an extra 20KB to repository
Pull request #45079 was updated. @cmsbuild, @syuvivida, @rvenditti, @tjavaid, @nothingface0, @antoniovagnerini can you please check and sign again.
@cmsbuild please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-47eeef/39629/summary.html
COMMIT: d0d6410c623ff4e03e0b24526a80d768dfc29df3
CMSSW: CMSSW_14_1_X_2024-05-30-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45079/39629/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- You potentially removed 3 lines from the logs
- Reco comparison results: 4 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338862
- DQMHistoTests: Total failures: 3
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3338839
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
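As a quick sanity check on the DQMHistoTests numbers above, the per-category counts can be added up and compared with the total. This is only a sketch: the dictionary keys and the `check_summary` helper are names made up for illustration, and it assumes every compared histogram lands in exactly one of failures, nulls, successes or skipped (missing objects are zero here and are left out of the sum).

```python
# Counts copied from the DQMHistoTests lines of the comparison summary above.
# The key names are chosen for this sketch only.
summary = {
    "total": 3338862,
    "failures": 3,
    "nulls": 0,
    "successes": 3338839,
    "skipped": 20,
}


def check_summary(s):
    """Return (is_consistent, failure_fraction) for one comparison summary.

    Assumption: each compared histogram is counted in exactly one of
    failures / nulls / successes / skipped.
    """
    accounted = s["failures"] + s["nulls"] + s["successes"] + s["skipped"]
    return accounted == s["total"], s["failures"] / s["total"]


consistent, failure_fraction = check_summary(summary)
print(f"consistent: {consistent}, failure fraction: {failure_fraction:.2e}")
```

Under that assumption the categories account for all compared histograms, and the failing ones are a very small fraction of the total.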
kind ping @cms-sw/dqm-l2
Hi @waredjeb,
We have tested this PR against vanilla CMSSW_14_0_7 + PRs 45007,45027 and run 381069 on DQM playback machine. All clients on playback ended gracefully so far.
For your convenience, you can check the logs at DQM^2 mirror here: https://cmsweb.cern.ch/dqm/dqm-square/?run=514356&db=playback and DQM Online Playback GUI here: https://cmsweb.cern.ch/dqm/online-playback
Best regards, Vichayanun for DQM Core team
Dear Authors, just to note that when we tested with older DQM streamers, from run 379530, we saw ProductNotFound errors (see error log below) and the pfgpu client crashed:
----- Begin Fatal Exception 04-Jun-2024 09:04:52 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing Event run: 379866 lumi: 718 event: 826903685 stream: 0
   [1] Running path 'tasksPath'
   [2] Calling method for module PFHcalGPUComparisonTask/'pfHcalGPUComparisonTask'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: std::vector<reco::PFCluster>
Looking for module label: hltParticleFlowClusterHCALSerialSync
Looking for productInstanceName:
We suggest adding the following to process.options, if you agree:
TryToContinue = cms.untracked.vstring( 'ProductNotFound' )
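For reference, a minimal sketch of where this setting could sit in a cmsRun configuration. The process name `PFGPUDQM` is a placeholder picked for this example, not the name used by the client in this PR, and the fragment needs a CMSSW environment to be loaded.

```python
import FWCore.ParameterSet.Config as cms

# "PFGPUDQM" is a placeholder process name chosen for this sketch.
process = cms.Process("PFGPUDQM")

# Ask the framework to try to continue, rather than abort the job, when a
# module throws an exception of category ProductNotFound (for example on
# older streamer files that lack hltParticleFlowClusterHCALSerialSync).
process.options.TryToContinue = cms.untracked.vstring('ProductNotFound')
```

With this in place the client would presumably keep running on events where the collection is absent, with the comparison task simply not producing output for those events.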
Dear @syuvivida Indeed, the collection was not saved in the event back then. Thanks for checking, I can add the line you suggested!
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40467
- This PR adds an extra 20KB to repository
Pull request #45079 was updated. @antoniovagnerini, @syuvivida, @rvenditti, @nothingface0, @tjavaid, @cmsbuild can you please check and sign again.
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45079/40468
- This PR adds an extra 20KB to repository
Pull request #45079 was updated. @tjavaid, @syuvivida, @rvenditti, @antoniovagnerini, @nothingface0, @cmsbuild can you please check and sign again.
Hi @waredjeb,
Thanks to the new fixes you just made, run 379530 now ended gracefully on playback machines.
Here is the status on DQM^2: https://cmsweb.cern.ch/dqm/dqm-square/?run=514378&db=playback
Best regards, Vichayanun for DQM Core team
please test
We suggest adding the following to process.options, if you agree:
TryToContinue = cms.untracked.vstring( 'ProductNotFound' )
I am sorry to chime in, but I have a question about this. If the input collections do not correspond to the expectations, it means that the input stream from HLT somehow doesn't contain the expected event content. If you let the client fail silently, who is going to propagate the information that something is wrong with the HLT stream? For instance, we recently noticed that the pixel GPU client was not using the right collections, and until we submitted this PR https://github.com/cms-sw/cmssw/pull/44933 we didn't have monitoring. Apparently no one was checking the histograms at P5 (not offline). How do we make sure things like this get noticed?
Hi @mmusich If we want the online shifters to check the pixel GPU clients results, the Tracker group needs to update/implement the instruction twiki page, and also include the plots in the shift page of DQMGUI.
@syuvivida
If we want the online shifters to check the pixel GPU clients results, the Tracker group needs to update/implement the instruction twiki page, and also include the plots in the shift page of DQMGUI.
Thanks, but this doesn't answer the general question. E.g. for this client, does the PF group plan on providing such instructions etc.?
If updating the Twiki is required, we will provide these kinds of instructions. @syuvivida, could you please send us the Twiki page that needs to be updated and maybe instructions for adding the plots in the DQMGUI? Thanks a lot
Hello, I will send you this information by email.
can you please keep me in the loop? thx.
DQM/PFTasks (****)
@waredjeb you also need to make a PR to https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign the (DQM) category.
Sure, I was waiting for the merge of this PR.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-47eeef/39687/summary.html
COMMIT: b9a0564229c067d9efba398f21731d71cd9c3d6b
CMSSW: CMSSW_14_1_X_2024-06-04-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45079/39687/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- You potentially removed 2 lines from the logs
- Reco comparison results: 8 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338862
- DQMHistoTests: Total failures: 6
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3338836
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
Sure, I was waiting for the merge of this PR.
my understanding is that it needs to be done before merging the PR, so that the corresponding L2 maintainers of the new package can sign-off on that too.