
Add possibility to read triggerbits for Secondary Datasets from the GT

Open · francescobrivio opened this pull request 1 year ago • 14 comments

PR description:

Upon request of @cms-sw/ppd-l2, and in order to migrate the list of trigger paths that define the Secondary Datasets out of the release and into the GT, this PR updates the HLTrigger/HLTfilters/plugins/HLTHighLevel.cc module to also accept a "tag label" through which the SecondaryDataset-dedicated triggerbits tag can be read.
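For illustration, a filter configuration along these lines could then point at the GT-hosted trigger list. This is a minimal sketch: the parameter name eventSetupPathsLabel is an assumption based on the description above, not taken verbatim from the commit, and the surrounding parameters follow the usual HLTHighLevel pattern.

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical sketch: an HLTHighLevel filter reading its trigger list from the
# AlCaRecoTriggerBits payload in the GT, selected by a key plus a tag label.
ReserveDMu = cms.EDFilter("HLTHighLevel",
    TriggerResultsTag = cms.InputTag("TriggerResults", "", "HLT"),
    HLTPaths = cms.vstring(),                        # empty: take the paths from the EventSetup
    eventSetupPathsKey = cms.string("ReserveDMu"),   # key inside the AlCaRecoTriggerBits payload
    eventSetupPathsLabel = cms.string("SecondaryDatasetTrigger"),  # new "tag label" (assumed name)
    andOr = cms.bool(True),    # accept the event if any listed path fired
    throw = cms.bool(False),   # tolerate paths missing from the trigger menu
)
```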

This issue is discussed in: https://its.cern.ch/jira/browse/CMSBPH-2

Changes in this PR:

  • Update of HLTHighLevel in 6bfb34d2ac246975831ec86cd00c58a337c1f2c1
  • Update of the ReserveDMu SD configuration in c796cdc5429934e31e13b8ead17efae134a10f90
  • Update of the Prompt GT in autoCond with an updated candidate that contains the new triggerbits tag with label SecondaryDatasetTrigger

The new triggerbits tag is AlCaRecoTriggerBits_SecondaryDataset_v1 (a corresponding Payload Inspector plot was attached in the original comment). Currently the ReserveDMu key is the only one implemented, but this can easily be extended to other Secondary Datasets if needed.

PR validation:

The code compiles and scram b runtests runs fine. Additionally, I have run the following cmsDriver command

cmsDriver.py RECO --conditions 140X_dataRun3_Prompt_v2 \
  --datatier RAW-RECO --era Run3 --eventcontent RAW \
  --filein "file:/eos/home-f/fbrivio/AlCa/data/ParkingDoubleMuonLowMass1_Run380470/136ee2f3-e230-40a7-952b-d3a7d12c27ce.root" \
  --fileout "file:skim_ReserveDMu.root" \
  --nThreads 2 --number 200 --scenario pp \
  --step SKIM:@ParkingDoubleMuonLowMass0 \
  --data --processName PAT

in different configurations and checked the number of events saved in skim_ReserveDMu.root:

| Version | Trigger paths | Evts selected |
| --- | --- | --- |
| master | all ReserveDMu paths | 43 |
| master | only 2 paths | 5 |
| This PR | all ReserveDMu paths | 43 |
| This PR | only 2 ReserveDMu paths | 5 |

Backport:

This is not a backport, but a backport to 14_0_X will eventually be opened in order to deploy this at Tier0 and produce the ReserveDMu Secondary Dataset directly.

francescobrivio avatar May 29 '24 09:05 francescobrivio

cms-bot internal usage

cmsbuild avatar May 29 '24 09:05 cmsbuild

Additionally, I have run the following cmsDriver

is there a workflow to test skims (SKIM:@ParkingDoubleMuonLowMass0 in particular), @cms-sw/pdmv-l2?

mmusich avatar May 29 '24 10:05 mmusich

is there a workflow to test skims

apparently 141.114:

141.114 RunParkingDoubleMuonLowMass2023C+HLTDR3_2023+SKIMPARKINGDOUBLEMUONLOWMASS0RUN3_reHLT_2023+HARVESTRUN3_2023 [1]: input from: /ParkingDoubleMuonLowMass0/Run2023C-v1/RAW with run [] 
                                           [2]: cmsDriver.py step2  --process reHLT -s L1REPACK:Full,HLT:@relval2024 --conditions auto:run3_hlt_relval --data  --eventcontent FEVTDEBUGHLT --datatier FEVTDEBUGHLT --era Run3_2023 -n 100 
                                           [3]: cmsDriver.py step3  --conditions auto:run3_data_prompt_relval -s RAW2DIGI,L1Reco,RECO,SKIM:ReserveDMu+LogError+LogErrorMonitor,PAT,NANO,DQM:@standardDQM+@miniAODDQM+@nanoAODDQM --datatier RECO,MINIAOD,NANOAOD,DQMIO --eventcontent RECO,MINIAOD,NANOEDMAOD,DQM --data  --process reRECO --scenario pp --era Run3_2023 --customise Configuration/DataProcessing/RecoTLR.customisePostEra_Run3 --hltProcess reHLT -n 100 
                                           [4]: cmsDriver.py step4  -s HARVESTING:@standardDQM+@miniAODDQM+@nanoAODDQM --conditions auto:run3_data --data  --filetype DQM --scenario pp --era Run3_2023 -n 100 

1 workflows with 4 steps
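For reference, a matrix workflow like this one can typically be run locally with runTheMatrix.py. This is a sketch only: the --ibeos option (redirecting input files to the CERN EOS mirror) and the parallelism flag may or may not be needed depending on the site and release.

```shell
# Run the single workflow 141.114 locally (sketch; exact flags depend on the release)
runTheMatrix.py -l 141.114 --ibeos -j 4
```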

mmusich avatar May 29 '24 10:05 mmusich

test parameters:

  • workflow = 141.114

mmusich avatar May 29 '24 10:05 mmusich

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45092/40412

  • This PR adds an extra 24KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

    • File Configuration/AlCa/python/autoCond.py modified in PR(s): #44368

cmsbuild avatar May 29 '24 10:05 cmsbuild

A new Pull Request was created by @francescobrivio for master.

It involves the following packages:

  • Configuration/AlCa (alca)
  • Configuration/Skimming (pdmv)
  • HLTrigger/HLTfilters (hlt)

@AdrianoDee, @mmusich, @cmsbuild, @Martin-Grunewald, @saumyaphor4252, @miquork, @consuegs, @sunilUIET, @perrotta can you please review it and eventually sign? Thanks. @mmusich, @youyingli, @Martin-Grunewald, @fabiocos, @silviodonato, @yuanchao, @tocheng, @missirol, @rsreds this is something you requested to watch as well. @antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar May 29 '24 10:05 cmsbuild

@cmsbuild, please test

mmusich avatar May 29 '24 10:05 mmusich

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1ec3d5/39604/summary.html
COMMIT: c5dd3d6317c2c12bceed1b6e3642c170cfc207ae
CMSSW: CMSSW_14_1_X_2024-05-29-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45092/39604/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 29-May-2024 13:17:45 CEST-----------------------
An exception of category 'NoProductResolverException' occurred while
   [0] Processing  Event run: 165121 lumi: 62 event: 23609118 stream: 0
   [1] Running path 'ReserveDMuPath'
   [2] Calling method for module HLTHighLevel/'ReserveDMu'
Exception Message:
No data of type "AlCaRecoTriggerBits" with label "SecondaryDatasetTrigger" in record "AlCaRecoTriggerBitsRcd"
 Please add an ESSource or ESProducer to your job which can deliver this data.
----- End Fatal Exception -------------------------------------------------
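For what it's worth, a NoProductResolverException of this kind can usually be worked around locally by appending the missing tag to the GlobalTag by hand. This is a sketch of the standard toGet mechanism: the record, tag, and label names are copied from the error message above, but whether this matches the final conditions setup of the PR is an assumption.

```python
import FWCore.ParameterSet.Config as cms

# Sketch: add the missing AlCaRecoTriggerBits payload to the GT by hand,
# under the label that the HLTHighLevel module requests.
process.GlobalTag.toGet.append(
    cms.PSet(
        record = cms.string("AlCaRecoTriggerBitsRcd"),
        tag = cms.string("AlCaRecoTriggerBits_SecondaryDataset_v1"),
        label = cms.untracked.string("SecondaryDatasetTrigger"),
    )
)
```

The proper fix, of course, is for the tag to be part of the GTs used by the relvals, which is what the subsequent GT updates in this thread address.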

cmsbuild avatar May 29 '24 12:05 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45092/40429

  • This PR adds an extra 28KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

    • File Configuration/AlCa/python/autoCond.py modified in PR(s): #44368

cmsbuild avatar May 30 '24 09:05 cmsbuild

Pull request #45092 was updated. @Martin-Grunewald, @saumyaphor4252, @miquork, @sunilUIET, @perrotta, @mmusich, @cmsbuild, @consuegs, @AdrianoDee can you please check and sign again.

cmsbuild avatar May 30 '24 09:05 cmsbuild

In commit https://github.com/cms-sw/cmssw/pull/45092/commits/a489e25ee62fc6abbe85a317c8777f1d502163bd I updated a few more data GTs with candidate GTs including the new triggerbits tag:

| GT | Diff |
| --- | --- |
| run2_data | GT diff |
| run3_data_prompt | GT diff |
| run3_data | GT diff |
| run3_data_PromptAnalysis | GT diff |

Additionally, I pushed to the tag a new IOV starting at run 376421 (the first 2024 run) with an updated trigger list; see the PayloadInspector plot attached in the original comment.

francescobrivio avatar May 30 '24 09:05 francescobrivio

@cmsbuild please test

francescobrivio avatar May 30 '24 09:05 francescobrivio

This issue is discussed in: https://its.cern.ch/jira/browse/CMSBPH-2

Update of HLTHighLevel in https://github.com/cms-sw/cmssw/commit/6bfb34d2ac246975831ec86cd00c58a337c1f2c1

based on the feedback in the ticket (quoting):

The SD is exactly like a Skim. Why can't it be run at Tier0? Because Tier0 doesn't support RAW skims. Tier0 runs skims only for prompt reco, and it doesn't guarantee that all data is processed, especially for Parking.

I am still not sure I see the overall need for this complication. If the post-processing doesn't run at Tier0, is there a strong reason not to just use a completely SD-driven AlCaRecoTriggerBits tag for the GlobalTag used for the post-processing? For example, one could conceive of taking the Prompt Reco GT, substituting the AlCaRecoTriggerBits tag in it with AlCaRecoTriggerBits_SecondaryDataset_v1, and using the resulting GT for the processing. What could possibly be the source of confusion in this scenario? AlCaReco skimming would never ensue from this post-processing step.

mmusich avatar May 30 '24 10:05 mmusich

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1ec3d5/39623/summary.html
COMMIT: a489e25ee62fc6abbe85a317c8777f1d502163bd
CMSSW: CMSSW_14_1_X_2024-05-29-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45092/39623/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3428438
  • DQMHistoTests: Total failures: 27
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3428391
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 206 log files, 169 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar May 30 '24 13:05 cmsbuild

I am still not sure I see the overall need for this complication. If the post-processing doesn't run at Tier0, is there a strong reason not to just use a completely SD-driven AlCaRecoTriggerBits tag for the GlobalTag used for the post-processing? For example, one could conceive of taking the Prompt Reco GT, substituting the AlCaRecoTriggerBits tag in it with AlCaRecoTriggerBits_SecondaryDataset_v1, and using the resulting GT for the processing. What could possibly be the source of confusion in this scenario? AlCaReco skimming would never ensue from this post-processing step.

Hi @mmusich, that is a fair assessment. Indeed, we could have a dedicated AlCaRecoTriggerBits tag just for processing the SD. But the way we are planning things, this SD will end up in the same processing as a regular ReRECO (for past data). And for future data taking, we will very likely have it in the PdmV growing dataset, for which we usually use the Prompt GT. So I think the ability to have two separate tags, one for the AlCaRecos and another for the SDs, is still useful.

malbouis avatar May 31 '24 08:05 malbouis

So I think the ability to have two separate tags, one for the AlCaRecos and another for the SDs, is still useful.

OK, thanks for clarifying. I'll stand by for an update from AlCa to make the full-fledged GTs before signing off for HLT.

mmusich avatar May 31 '24 08:05 mmusich

So I think the ability to have two separate tags, one for the AlCaRecos and another for the SDs, is still useful.

OK, thanks for clarifying. I'll stand by for an update from alca to make the full-fledged GTs before signing off for HLT.

@cms-sw/alca-l2 a kind ping to please provide the GTs so we can finalize this PR and open the backport. If you prefer, I can open a CMSTalk thread with the official request.

Thanks! Francesco

francescobrivio avatar May 31 '24 08:05 francescobrivio

Here are the official GTs:

  • 140X_dataRun2_v2
  • 140X_dataRun3_Prompt_frozen_v3
  • 140X_dataRun3_v4
  • 140X_dataRun3_PromptAnalysis_v2

The differences with respect to the previous GTs are listed below:

[1] run2_data: https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun2_v1/140X_dataRun2_v2
[2] run3_data_prompt:

  • https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_v3/140X_dataRun3_Prompt_frozen_v3
  • https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_frozen_v1/140X_dataRun3_Prompt_frozen_v3

[3] run3_data: https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_v3/140X_dataRun3_v4
[4] run3_data_PromptAnalysis: https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_PromptAnalysis_v1/140X_dataRun3_PromptAnalysis_v2

saumyaphor4252 avatar May 31 '24 09:05 saumyaphor4252

Thanks @saumyaphor4252 !

Just one question: in run3_data_prompt I actually see 2 differences for https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_frozen_v1/140X_dataRun3_Prompt_frozen_v3 (screenshot attached in the original comment). Is the change in DropBoxMetadataRcd expected? (I think yes, but I just want to make sure.)

francescobrivio avatar May 31 '24 10:05 francescobrivio

Just one question, in run3_data_prompt I see actually 2 differences for https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/140X_dataRun3_Prompt_frozen_v1/140X_dataRun3_Prompt_frozen_v3:

Yes. The frozen GT update was missed in CMSSW autoCond at the time, but the Prompt GT is correct and the DropBox update is expected.

saumyaphor4252 avatar May 31 '24 10:05 saumyaphor4252

@cmsbuild please test

francescobrivio avatar May 31 '24 10:05 francescobrivio

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45092/40445

  • This PR adds an extra 24KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

    • File Configuration/AlCa/python/autoCond.py modified in PR(s): #44368

cmsbuild avatar May 31 '24 10:05 cmsbuild

Pull request #45092 was updated. @perrotta, @AdrianoDee, @sunilUIET, @mmusich, @miquork, @saumyaphor4252, @consuegs, @Martin-Grunewald can you please check and sign again.

cmsbuild avatar May 31 '24 10:05 cmsbuild

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1ec3d5/39643/summary.html
COMMIT: 3d0b8e990b726d3c00fe1ae336a31a63ec5b5b3b
CMSSW: CMSSW_14_1_X_2024-05-31-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45092/39643/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

  • 4.53_RunPhoton2012B/step2_RunPhoton2012B.log
  • 7.3_CosmicsSPLoose2018/step1_CosmicsSPLoose2018.log
  • 8.0_BeamHalo/step1_BeamHalo.log
Expand to see more relval errors ...

cmsbuild avatar May 31 '24 11:05 cmsbuild

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1ec3d5/39643/summary.html
COMMIT: 3d0b8e990b726d3c00fe1ae336a31a63ec5b5b3b
CMSSW: CMSSW_14_1_X_2024-05-31-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45092/39643/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

  • 9.0_Higgs200ChargedTaus/step1_Higgs200ChargedTaus.log
  • 8.0_BeamHalo/step1_BeamHalo.log
  • 4.53_RunPhoton2012B/step2_RunPhoton2012B.log
Expand to see more relval errors ...

cmsbuild avatar May 31 '24 13:05 cmsbuild

@francescobrivio, can you please test it locally? All relval tests were killed, see [a]

[a]

May 31 14:49:56 cmsbuild154.cern.ch kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-501.slice/session-16223.scope,task=cmsRun,pid=719767,uid=501
May 31 14:49:56 cmsbuild154.cern.ch kernel: Out of memory: Killed process 719767 (cmsRun) total-vm:15913904kB, anon-rss:13228752kB, file-rss:112kB, shmem-rss:0kB, UID:501 pgtables:31168kB oom_score_adj:0
May 31 14:50:04 cmsbuild154.cern.ch kernel: cmsRun invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
May 31 14:50:04 cmsbuild154.cern.ch kernel: CPU: 2 PID: 721127 Comm: cmsRun Kdump: loaded Not tainted 4.18.0-513.5.1.el8_9.x86_64 #1
May 31 14:50:04 cmsbuild154.cern.ch kernel: Hardware name: RDO OpenStack Compute/RHEL-AV, BIOS 0.0.0 02/06/2015
May 31 14:50:04 cmsbuild154.cern.ch kernel: Call Trace:
May 31 14:50:04 cmsbuild154.cern.ch kernel: dump_stack+0x41/0x60
May 31 14:50:04 cmsbuild154.cern.ch kernel: dump_header+0x4a/0x1df
May 31 14:50:04 cmsbuild154.cern.ch kernel: oom_kill_process.cold.33+0xb/0x10

smuzaffar avatar May 31 '24 14:05 smuzaffar

@francescobrivio, can you please test it locally? All relval tests were killed, see [a]

[a]

May 31 14:49:56 cmsbuild154.cern.ch kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-501.slice/session-16223.scope,task=cmsRun,pid=719767,uid=501
May 31 14:49:56 cmsbuild154.cern.ch kernel: Out of memory: Killed process 719767 (cmsRun) total-vm:15913904kB, anon-rss:13228752kB, file-rss:112kB, shmem-rss:0kB, UID:501 pgtables:31168kB oom_score_adj:0
May 31 14:50:04 cmsbuild154.cern.ch kernel: cmsRun invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
May 31 14:50:04 cmsbuild154.cern.ch kernel: CPU: 2 PID: 721127 Comm: cmsRun Kdump: loaded Not tainted 4.18.0-513.5.1.el8_9.x86_64 #1
May 31 14:50:04 cmsbuild154.cern.ch kernel: Hardware name: RDO OpenStack Compute/RHEL-AV, BIOS 0.0.0 02/06/2015
May 31 14:50:04 cmsbuild154.cern.ch kernel: Call Trace:
May 31 14:50:04 cmsbuild154.cern.ch kernel: dump_stack+0x41/0x60
May 31 14:50:04 cmsbuild154.cern.ch kernel: dump_header+0x4a/0x1df
May 31 14:50:04 cmsbuild154.cern.ch kernel: oom_kill_process.cold.33+0xb/0x10

Hi @smuzaffar, I have tested this locally on the 2 workflows mentioned in the comments above and I got:

141.114_RunParkingDoubleMuonLowMass2023C Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu May 30 11:30:37 2024-date Thu May 30 10:57:38 2024; exit: 0 0 0 0
1000.0_RunMinBias2011A Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED Step4-PASSED  - time date Thu May 30 11:09:08 2024-date Thu May 30 10:57:39 2024; exit: 0 0 0 0 0
2 2 2 2 1 tests passed, 0 0 0 0 0 failed

so at least locally this is working in CMSSW_14_1_X_2024-05-28-2300 (which is what I have installed locally at the moment).

francescobrivio avatar May 31 '24 14:05 francescobrivio

But I see similar failures in other PRs, e.g. 45114 and 45113 (I don't want to tag them here).

francescobrivio avatar May 31 '24 14:05 francescobrivio

It looks like the memory profiler is broken inside the el8 container: e.g. in the latest CMSSW 14.1.X dev area (without checking out anything), the runTheMatrix.py -l 8.0 --command ' --maxmem_profile ' command fails [a] when run inside cmssw-el8 but works when run directly on a cmsdev node.

@makortel, I would suggest disabling --maxmem_profile for PR tests for now while I try to understand why the memory profiler is failing.

> runTheMatrix.py -l 8.0 --command '  --maxmem_profile '
...
...
# in: /build/muz/del/CMSSW_14_1_X_2024-05-31-1100 going to execute cd 8.0_BeamHalo
 cmsDriver.py BeamHalo_cfi.py  --relval 9000,100 -s GEN,SIM -n 10 --conditions auto:run1_mc --beamspot Realistic8TeVCollision --datatier GEN-SIM --eventcontent RAWSIM --scenario cosmics   --maxmem_profile  --fileout file:step1.root  > step1_BeamHalo.log  2>&1
 
/bin/sh: line 1: 79586 Segmentation fault      (core dumped) cmsDriver.py BeamHalo_cfi.py --relval 9000,100 -s GEN,SIM -n 10 --conditions auto:run1_mc --beamspot Realistic8TeVCollision --datatier GEN-SIM --eventcontent RAWSIM --scenario cosmics --maxmem_

smuzaffar avatar May 31 '24 14:05 smuzaffar

I have opened an issue here: https://github.com/cms-sw/cmssw/issues/45116

smuzaffar avatar May 31 '24 15:05 smuzaffar