Add possibility to read triggerbits for Secondary Datasets from the GT
PR description:
Upon request of @cms-sw/ppd-l2, in order to migrate the list of trigger paths that define the Secondary Datasets away from the release and into the GT, in this PR I'm updating the HLTrigger/HLTfilters/plugins/HLTHighLevel.cc module to also accept a "tag label" through which we can read the SecondaryDataset-dedicated triggerbits tag.
This issue is discussed in: https://its.cern.ch/jira/browse/CMSBPH-2
Changes in this PR:
- Update of `HLTHighLevel` in 6bfb34d2ac246975831ec86cd00c58a337c1f2c1
- Update of the `ReserveDMu` SD configuration in c796cdc5429934e31e13b8ead17efae134a10f90
- Update of the Prompt GT in `autoCond` with an updated candidate that contains the new triggerbits tag with label `SecondaryDatasetTrigger`
  - Note to @cms-sw/alca-l2: we will need a properly versioned Prompt GT to update `autoCond`
- GT diff: 140X_dataRun3_Prompt_v2 vs 140X_dataRun3_Prompt_Candidate_2024_05_29_08_51_00
The new triggerbit tag is `AlCaRecoTriggerBits_SecondaryDataset_v1` (see the corresponding Payload Inspector plot attached in the PR).
Currently the `ReserveDMu` key is the only one implemented, but this can easily be extended to other SecondaryDatasets if needed.
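For illustration, a minimal sketch of what such a skim filter configuration could look like with the trigger list read from the GT. The `TriggerResultsTag`, `HLTPaths`, `eventSetupPathsKey`, `andOr` and `throw` parameters are the standard `hltHighLevel` interface; the `eventSetupPathsLabel` name used here for the new "tag label" parameter is an assumption for this sketch:

```python
import FWCore.ParameterSet.Config as cms

# Sketch of an HLTHighLevel filter reading its path list from the EventSetup.
# "eventSetupPathsLabel" is an assumed name for the new parameter that selects
# the labelled AlCaRecoTriggerBits product ("SecondaryDatasetTrigger") in the GT.
ReserveDMu = cms.EDFilter("HLTHighLevel",
    TriggerResultsTag = cms.InputTag("TriggerResults", "", "HLT"),
    HLTPaths = cms.vstring(),  # empty: take the paths from the EventSetup instead
    eventSetupPathsKey = cms.string("ReserveDMu"),  # key in the AlCaRecoTriggerBits map
    eventSetupPathsLabel = cms.string("SecondaryDatasetTrigger"),  # assumed parameter name
    andOr = cms.bool(True),   # True = accept if any of the listed paths fired (OR)
    throw = cms.bool(False),  # do not throw on unknown path names
)
```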
PR validation:
Code compiles and `scram b runtests` runs fine.
Additionally, I have run the following cmsDriver
cmsDriver.py RECO --conditions 140X_dataRun3_Prompt_v2 \
--datatier RAW-RECO --era Run3 --eventcontent RAW \
--filein "file:/eos/home-f/fbrivio/AlCa/data/ParkingDoubleMuonLowMass1_Run380470/136ee2f3-e230-40a7-952b-d3a7d12c27ce.root" \
--fileout "file:skim_ReserveDMu.root" \
--nThreads 2 --number 200 --scenario pp \
--step SKIM:@ParkingDoubleMuonLowMass0 \
--data --processName PAT
in different configurations, checking the number of events saved in skim_ReserveDMu.root:
| Version | Trigger paths | Evts selected |
|---|---|---|
| master | all ReserveDMu paths | 43 |
| master | only 2 paths | 5 |
| This PR | all ReserveDMu paths | 43 |
| This PR | only 2 ReserveDMu paths | 5 |
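For reference, the event counts in the table can be cross-checked directly on the skim output; a minimal PyROOT sketch (file name taken from the cmsDriver command above):

```python
# Each entry in the "Events" TTree of an EDM ROOT file is one stored event,
# so the number of entries equals the number of selected events.
import ROOT

f = ROOT.TFile.Open("skim_ReserveDMu.root")
print("events selected:", f.Get("Events").GetEntries())
f.Close()
```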
Backport:
Not a backport, but eventually a backport to 14_0_X will be opened in order to deploy this at Tier0 and produce the ReserveDMu SecondaryDataset directly.
> Additionally, I have run the following cmsDriver

is there a workflow to test skims (SKIM:@ParkingDoubleMuonLowMass0 in particular) @cms-sw/pdmv-l2?
> is there a workflow to test skims
apparently 141.114:
141.114 RunParkingDoubleMuonLowMass2023C+HLTDR3_2023+SKIMPARKINGDOUBLEMUONLOWMASS0RUN3_reHLT_2023+HARVESTRUN3_2023 [1]: input from: /ParkingDoubleMuonLowMass0/Run2023C-v1/RAW with run []
[2]: cmsDriver.py step2 --process reHLT -s L1REPACK:Full,HLT:@relval2024 --conditions auto:run3_hlt_relval --data --eventcontent FEVTDEBUGHLT --datatier FEVTDEBUGHLT --era Run3_2023 -n 100
[3]: cmsDriver.py step3 --conditions auto:run3_data_prompt_relval -s RAW2DIGI,L1Reco,RECO,SKIM:ReserveDMu+LogError+LogErrorMonitor,PAT,NANO,DQM:@standardDQM+@miniAODDQM+@nanoAODDQM --datatier RECO,MINIAOD,NANOAOD,DQMIO --eventcontent RECO,MINIAOD,NANOEDMAOD,DQM --data --process reRECO --scenario pp --era Run3_2023 --customise Configuration/DataProcessing/RecoTLR.customisePostEra_Run3 --hltProcess reHLT -n 100
[4]: cmsDriver.py step4 -s HARVESTING:@standardDQM+@miniAODDQM+@nanoAODDQM --conditions auto:run3_data --data --filetype DQM --scenario pp --era Run3_2023 -n 100
1 workflows with 4 steps
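For a local check, this workflow can be run with the standard matrix tool, e.g. `runTheMatrix.py -l 141.114` (plain invocation; no PR-specific options assumed).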
test parameters:
- workflow = 141.114
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45092/40412
- This PR adds an extra 24KB to repository
- There are other open Pull requests which might conflict with changes you have proposed:
  - File Configuration/AlCa/python/autoCond.py modified in PR(s): #44368
A new Pull Request was created by @francescobrivio for master.
It involves the following packages:
- Configuration/AlCa (alca)
- Configuration/Skimming (pdmv)
- HLTrigger/HLTfilters (hlt)
@AdrianoDee, @mmusich, @cmsbuild, @Martin-Grunewald, @saumyaphor4252, @miquork, @consuegs, @sunilUIET, @perrotta can you please review it and eventually sign? Thanks. @mmusich, @youyingli, @Martin-Grunewald, @fabiocos, @silviodonato, @yuanchao, @tocheng, @missirol, @rsreds this is something you requested to watch as well. @antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.
cms-bot commands are listed here
@cmsbuild, please test
-1
Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1ec3d5/39604/summary.html
COMMIT: c5dd3d6317c2c12bceed1b6e3642c170cfc207ae
CMSSW: CMSSW_14_1_X_2024-05-29-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45092/39604/install.sh to create a dev area with all the needed externals and cmssw changes.
RelVals
----- Begin Fatal Exception 29-May-2024 13:17:45 CEST-----------------------
An exception of category 'NoProductResolverException' occurred while
[0] Processing Event run: 165121 lumi: 62 event: 23609118 stream: 0
[1] Running path 'ReserveDMuPath'
[2] Calling method for module HLTHighLevel/'ReserveDMu'
Exception Message:
No data of type "AlCaRecoTriggerBits" with label "SecondaryDatasetTrigger" in record "AlCaRecoTriggerBitsRcd"
Please add an ESSource or ESProducer to your job which can deliver this data.
----- End Fatal Exception -------------------------------------------------
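The failure is consistent with running against a GT that does not yet contain the labelled triggerbits tag. As a hedged illustration (not part of this PR), a test job could point at the new tag explicitly through the standard GlobalTag record override, assuming the usual `toGet` fields and an existing `process.GlobalTag`:

```python
import FWCore.ParameterSet.Config as cms

# Sketch: append a labelled AlCaRecoTriggerBits payload to the GlobalTag so that
# HLTHighLevel can resolve the "SecondaryDatasetTrigger" label even when the GT
# itself does not (yet) include the tag.
process.GlobalTag.toGet.append(
    cms.PSet(
        record = cms.string("AlCaRecoTriggerBitsRcd"),
        tag = cms.string("AlCaRecoTriggerBits_SecondaryDataset_v1"),
        label = cms.untracked.string("SecondaryDatasetTrigger"),
    )
)
```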
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45092/40429
- This PR adds an extra 28KB to repository
- There are other open Pull requests which might conflict with changes you have proposed:
  - File Configuration/AlCa/python/autoCond.py modified in PR(s): #44368
Pull request #45092 was updated. @Martin-Grunewald, @saumyaphor4252, @miquork, @sunilUIET, @perrotta, @mmusich, @cmsbuild, @consuegs, @AdrianoDee can you please check and sign again.
In commit https://github.com/cms-sw/cmssw/pull/45092/commits/a489e25ee62fc6abbe85a317c8777f1d502163bd I updated a few more data GTs with candidate GTs including the new triggerbits tag:
| GT | GT Diff |
|---|---|
| run2_data | GT diff |
| run3_data_prompt | GT diff |
| run3_data | GT diff |
| run3_data_PromptAnalysis | GT diff |
Additionally I pushed to the tag a new IOV starting at run 376421 (first 2024 run) with an updated trigger list (see the PayloadInspector plot attached in the PR).
@cmsbuild please test
> This issue is discussed in: https://its.cern.ch/jira/browse/CMSBPH-2
> Update of HLTHighLevel in https://github.com/cms-sw/cmssw/commit/6bfb34d2ac246975831ec86cd00c58a337c1f2c1

based on the feedback in the ticket (quoting):

> The SD is exactly like a Skim. Why can't it be run at Tier0? Because Tier0 doesn't support RAW skims. Tier0 runs skims only for prompt reco and it doesn't guarantee that all data is processed, especially for Parking.

I am still not sure I see the overall need for this complication. If the post-processing doesn't run at Tier0, is there a strong reason to not just use a completely SD-driven AlCaRecoTriggerBit tag for the GlobalTag used for the post-processing?
For example: one could conceive of taking the Prompt Reco GT, substituting the AlCaRecoTriggerBit tag in it with AlCaRecoTriggerBits_SecondaryDataset_v1, and using the resulting GT for the processing. What could possibly be the source of confusion in this scenario? AlCaReco skimming would never ensue from this post-processing step.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1ec3d5/39623/summary.html
COMMIT: a489e25ee62fc6abbe85a317c8777f1d502163bd
CMSSW: CMSSW_14_1_X_2024-05-29-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45092/39623/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- You potentially removed 1 lines from the logs
- Reco comparison results: 9 differences found in the comparisons
- DQMHistoTests: Total files compared: 49
- DQMHistoTests: Total histograms compared: 3428438
- DQMHistoTests: Total failures: 27
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3428391
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB (48 files compared)
- Checked 206 log files, 169 edm output root files, 49 DQM output files
- TriggerResults: no differences found
> I am still not sure I see the overall need for this complication. If the post-processing doesn't run at Tier0, is there a strong reason to not just use a completely SD-driven `AlCaRecoTriggerBit` tag for the GlobalTag used for the post-processing? For example: one could conceive of taking the Prompt Reco GT, substituting the `AlCaRecoTriggerBit` tag in it with `AlCaRecoTriggerBits_SecondaryDataset_v1`, and using the resulting GT for the processing. What could possibly be the source of confusion in this scenario? AlCaReco skimming would never ensue from this post-processing step.
Hi @mmusich, it is a fair assessment. Indeed, we could have a dedicated AlCaRecoTriggerBits tag just for processing the SD. But the way we are planning things, this SD will end up in the same processing as a regular ReRECO (for the past data). And for future data taking, we will very likely have it in the PdmV growing dataset, for which we usually use the Prompt GT. So I think this ability of having two separate tags, one for the AlCaRecos and another one for the SDs, is still useful.
> So I think this ability of having two separate tags, one for the AlCaRecos and another one for the SDs, is still useful.

OK, thanks for clarifying. I'll stand by for an update from alca to make the full-fledged GTs before signing off for HLT.
@cms-sw/alca-l2 a kind ping to please provide the GTs so we can finalize this PR and open the backport. If you prefer I can open a CMSTalk with the official request.
Thanks! Francesco
Here are the official GTs:
- 140X_dataRun2_v2
- 140X_dataRun3_Prompt_frozen_v3
- 140X_dataRun3_v4
- 140X_dataRun3_PromptAnalysis_v2
The differences from the last GTs are below:
[1] run2_data: https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun2_v1/140X_dataRun2_v2
[2] run3_data_prompt:
- https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_v3/140X_dataRun3_Prompt_frozen_v3
- https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_frozen_v1/140X_dataRun3_Prompt_frozen_v3
[3] run3_data: https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_v3/140X_dataRun3_v4
[4] run3_data_PromptAnalysis https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_PromptAnalysis_v1/140X_dataRun3_PromptAnalysis_v2
Thanks @saumyaphor4252!
Just one question: in run3_data_prompt I actually see 2 differences for https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_frozen_v1/140X_dataRun3_Prompt_frozen_v3:
Is the change in `DropBoxMetadataRcd` expected? (I think yes, but I just want to make sure.)
> Just one question: in run3_data_prompt I actually see 2 differences for https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_frozen_v1/140X_dataRun3_Prompt_frozen_v3

Yes. The frozen GT update was missed in CMSSW autoCond at the time, but the Prompt GT is correct and the DropBox update is expected.
@cmsbuild please test
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45092/40445
- This PR adds an extra 24KB to repository
- There are other open Pull requests which might conflict with changes you have proposed:
  - File Configuration/AlCa/python/autoCond.py modified in PR(s): #44368
Pull request #45092 was updated. @perrotta, @AdrianoDee, @sunilUIET, @mmusich, @miquork, @saumyaphor4252, @consuegs, @Martin-Grunewald can you please check and sign again.
-1
Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1ec3d5/39643/summary.html
COMMIT: 3d0b8e990b726d3c00fe1ae336a31a63ec5b5b3b
CMSSW: CMSSW_14_1_X_2024-05-31-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45092/39643/install.sh to create a dev area with all the needed externals and cmssw changes.
RelVals
- 4.53: 4.53_RunPhoton2012B/step2_RunPhoton2012B.log
- 7.3: 7.3_CosmicsSPLoose2018/step1_CosmicsSPLoose2018.log
- 8.0: 8.0_BeamHalo/step1_BeamHalo.log

Additional failing relval workflows: 9.0, 25.0, 136.731, 136.793, 139.001, 136.874, 140.023, 4.22, 140.043, 140.063, 5.1, 135.4, 1306.0, 1330.0, 141.042, 136.8311, 141.044, 136.88811, 136.7611, 141.046, 25202.0, 312.0, 10224.0, 101.0, 140.56, 11634.0, 12434.0, 12834.0, 12834.7, 158.01, 13034.0, 12846.0, 13234.0, 14034.0, 14234.0, 23234.0, 24834.911, 24896.0, 24834.0, 1001.0, 24900.0, 25034.999, 250202.181, 1000.0, 2500.4, 141.114
@francescobrivio, can you please test it locally? All relval tests were killed [a]
[a]
May 31 14:49:56 cmsbuild154.cern.ch kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-501.slice/session-16223.scope,task=cmsRun,pid=719767,uid=501
May 31 14:49:56 cmsbuild154.cern.ch kernel: Out of memory: Killed process 719767 (cmsRun) total-vm:15913904kB, anon-rss:13228752kB, file-rss:112kB, shmem-rss:0kB, UID:501 pgtables:31168kB oom_score_adj:0
May 31 14:50:04 cmsbuild154.cern.ch kernel: cmsRun invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
May 31 14:50:04 cmsbuild154.cern.ch kernel: CPU: 2 PID: 721127 Comm: cmsRun Kdump: loaded Not tainted 4.18.0-513.5.1.el8_9.x86_64 #1
May 31 14:50:04 cmsbuild154.cern.ch kernel: Hardware name: RDO OpenStack Compute/RHEL-AV, BIOS 0.0.0 02/06/2015
May 31 14:50:04 cmsbuild154.cern.ch kernel: Call Trace:
May 31 14:50:04 cmsbuild154.cern.ch kernel: dump_stack+0x41/0x60
May 31 14:50:04 cmsbuild154.cern.ch kernel: dump_header+0x4a/0x1df
May 31 14:50:04 cmsbuild154.cern.ch kernel: oom_kill_process.cold.33+0xb/0x10
Hi @smuzaffar, I have tested this locally on the 2 wfs mentioned in the comments above and I got:
141.114_RunParkingDoubleMuonLowMass2023C Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED - time date Thu May 30 11:30:37 2024-date Thu May 30 10:57:38 2024; exit: 0 0 0 0
1000.0_RunMinBias2011A Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED - time date Thu May 30 11:09:08 2024-date Thu May 30 10:57:39 2024; exit: 0 0 0 0 0
2 2 2 2 1 tests passed, 0 0 0 0 0 failed
so at least locally this is working in CMSSW_14_1_X_2024-05-28-2300 (which is what I have installed locally at the moment).
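(For reference, the summary lines above are standard runTheMatrix.py output; a combined local run of the two workflows would look like `runTheMatrix.py -l 141.114,1000.0`, assuming no extra options are needed.)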
But I see similar failures in other PRs, e.g. 45114 and 45113 (I don't want to tag them here).
Looks like the memory profiler is broken inside the el8 container, e.g. in the latest CMSSW 14.1.X dev area (without checking out anything) the `runTheMatrix.py -l 8.0 --command ' --maxmem_profile '` command fails [a] when run inside cmssw-el8, but works when run directly on a cmsdev node.
@makortel, I would suggest disabling --maxmem_profile for PR tests for now while I try to understand why the memory profiler is failing.
> runTheMatrix.py -l 8.0 --command ' --maxmem_profile '
...
...
# in: /build/muz/del/CMSSW_14_1_X_2024-05-31-1100 going to execute cd 8.0_BeamHalo
cmsDriver.py BeamHalo_cfi.py --relval 9000,100 -s GEN,SIM -n 10 --conditions auto:run1_mc --beamspot Realistic8TeVCollision --datatier GEN-SIM --eventcontent RAWSIM --scenario cosmics --maxmem_profile --fileout file:step1.root > step1_BeamHalo.log 2>&1
/bin/sh: line 1: 79586 Segmentation fault (core dumped) cmsDriver.py BeamHalo_cfi.py --relval 9000,100 -s GEN,SIM -n 10 --conditions auto:run1_mc --beamspot Realistic8TeVCollision --datatier GEN-SIM --eventcontent RAWSIM --scenario cosmics --maxmem_
I have opened an issue here https://github.com/cms-sw/cmssw/issues/45116