cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

updated HLT paths for EXODisappTrk skim

Open carriganm95 opened this issue 3 years ago • 8 comments

PR description:

This PR changes the HLT paths for the EXO disappearing tracks skim. Previously the HLT path MC_PFMET was used for testing MC data. We are now removing this path and adding the HLT paths we plan to use for run 3. They are:

  "HLT_PFMET*_PFMHT*_IDTight_v*",
  "HLT_PFMETTypeOne*_PFMHT*_IDTight_v*",
  "HLT_PFMETNoMu*_PFMHTNoMu*_IDTight_v*", 
  "HLT_MET*_IsoTrk*_v*", 
  "HLT_PFMET*_*Cleaned_v*", 
  "HLT_Ele*_WPTight_Gsf_v*", 
  "HLT_Ele*_WPLoose_Gsf_v*", 
  "HLT_IsoMu*_v*", 
  "HLT_IsoMu*_TightChargedIsoPFTauHPS*_Trk1_eta2p1_SingleL1_v*", 
  "HLT_IsoMu*_eta2p1_TightChargedIsoPFTauHPS*_eta2p1_CrossL1_v*",
  "HLT_IsoMu*_MediumChargedIsoPFTauHPS20_Trk1_eta2p1_SingleL1_v*", 
  "HLT_MediumChargedIsoPFTau*HighPtRelaxedIso_Trk50_eta2p1_v*", 
  "HLT_VBF_DoubleMediumChargedIsoPFTauHPS20_Trk1_eta2p1_v*",
  "HLT_DoubleMediumDeepTauIsoPFTauHPS*_L2NN_eta2p1_v*",
  "HLT_DoubleMediumChargedIsoPFTauHPS*_Trk1_eta2p1_v*"

PR validation:

Ran test script over Run2022C data to check output file size. Used EGamma, SingleMuon, Tau, and MET datasets for testing because these are the datasets used by the disappearing tracks analysis.

MET Total Events: 136898 Total Size: 2.0 GB Number of Files: 91 Average Size: 21.9 MB Dataset Size: 568.65 GB % Size Saved: 0.35%

Tau Total Events: 91896 Total Size: 1.2 GB Number of Files: 91 Average Size: 13.2 MB Dataset Size: 210.15 GB % Size Saved: 0.57%

SingleMuon Total Events: 1828792 Total Size: 7.9 GB Number of Files: 84 Average Size: 96.0 MB Dataset Size: 2436.23 GB % Size Saved: 0.32%

EGamma Total Events: 1715084 Total Size: 12.1 GB Number of Files: 82 Average Size: 151.7 MB Dataset Size: 6280.43 GB % Size Saved: 0.19%

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

carriganm95 avatar Aug 01 '22 14:08 carriganm95

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38919/31366

  • This PR adds an extra 12KB to repository

cmsbuild avatar Aug 01 '22 14:08 cmsbuild

A new Pull Request was created by @carriganm95 for master.

It involves the following packages:

  • Configuration/Skimming (pdmv)

@cmsbuild, @bbilin, @kskovpen, @jordan-martins can you please review it and eventually sign? Thanks. @Martin-Grunewald, @missirol, @fabiocos this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Aug 01 '22 14:08 cmsbuild

please test

kskovpen avatar Aug 01 '22 14:08 kskovpen

@carriganm95

One question from HLT (just for my own education): how is this list of triggers decided?

For example, the latest HLT menu also includes new triggers like HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v* and HLT_PFMETNoMu*_PFMHTNoMu*_IDTight_FilterHF_v*, but those aren't included in the current list.

missirol avatar Aug 01 '22 15:08 missirol

@carriganm95

One question from HLT (just for my own education): how is this list of triggers decided?

For example, the latest HLT menu also includes new triggers like HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v* and HLT_PFMETNoMu*_PFMHTNoMu*_IDTight_FilterHF_v*, but those aren't included in the current list.

For the MET HLT we did not include it because it was not used in the run 2 analysis. I think you are right that it could be useful to us and so we can add it in.

For the tau HLT we used to use HLT_MediumChargedIsoPFTau50_Trk30_eta2p1_1pr_v* and this was removed from the list in run 3. Instead we have added the above triggers because we are unsure what will work for us in the end. Again I think adding the trigger you suggested is a good idea. We only use the taus as a control sample to estimate the probability that a non reconstructed tau could pass our offline selections. The triggers we have listed should give us enough taus to accomplish that but having more cannot hurt.

carriganm95 avatar Aug 01 '22 17:08 carriganm95

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df5988/26568/summary.html COMMIT: a9465cf2141e6e67444db164bba26cecfaa46b23 CMSSW: CMSSW_12_5_X_2022-08-01-1100/el8_amd64_gcc10 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38919/26568/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3669004
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3668974
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 210 log files, 47 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Aug 01 '22 18:08 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38919/31486

  • This PR adds an extra 16KB to repository

cmsbuild avatar Aug 10 '22 15:08 cmsbuild

Pull request #38919 was updated. @cmsbuild, @bbilin, @kskovpen, @jordan-martins can you please check and sign again.

cmsbuild avatar Aug 10 '22 15:08 cmsbuild

@kskovpen can this be checked again? The updates were following the advice from @missirol

carriganm95 avatar Aug 15 '22 17:08 carriganm95

please test

kskovpen avatar Aug 15 '22 17:08 kskovpen

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df5988/26835/summary.html COMMIT: 4a908c37b5ccff26a2cd5ac6b76a72a1b9548d78 CMSSW: CMSSW_12_5_X_2022-08-15-1100/el8_amd64_gcc10 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38919/26835/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3692476
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3692446
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 212 log files, 49 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Aug 15 '22 21:08 cmsbuild

+pdmv

kskovpen avatar Aug 18 '22 07:08 kskovpen

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

cmsbuild avatar Aug 18 '22 07:08 cmsbuild

Wildcards are usually discouraged, in order to avoid unwanted inclusion of already existing or lately included paths in the menu. You made a widespread usage of those wildcards here:

  • Could you please check confirm that they are all justified, and you don't list every HLT path separately because you want to be ready to accept possible new ones if lately added?
  • Could @cms-sw/hlt-l2 (re)confirm that they support tall the usages of wildcards here, instead of listing the paths one by one?

perrotta avatar Aug 18 '22 07:08 perrotta

Could hlt-l2 (re)confirm that they support tall the usages of wildcards here, instead of listing the paths one by one?

Regarding what the plugin HLTHighLevel supports, it does support the use of the * wildcard, and not much more (internally, it uses the utilities from RegexMatch.h). https://github.com/cms-sw/cmssw/blob/36be14bdfb163abfd6553a1bb5f49b2a3e3d48de/HLTrigger/HLTfilters/plugins/HLTHighLevel.cc#L113 https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/FWCore/Utilities/src/RegexMatch.cc#L37

Regarding the use of wildcards itself ..

Wildcards are usually discouraged, in order to avoid unwanted inclusion of already existing or lately included paths in the menu.

I would have the same concern (for example, HLT_IsoMu* seems too inclusive to me, as I tried to point out in https://github.com/cms-sw/cmssw/pull/38919#discussion_r934785964). On the other hand, it's not really my decision (and I'm not familiar enough with this use case).

missirol avatar Aug 18 '22 07:08 missirol

Either way a change may (or may not) be needed in case new paths are added or old paths are removed. If it is clear that all possible current & future matches are supposed to be skimmed, then I suppose those wildcards are ok.

Martin-Grunewald avatar Aug 18 '22 07:08 Martin-Grunewald

Either way a change may (or may not) be needed in case new paths are added or old paths are removed. If it is clear that all possible current & future matches are supposed to be skimmed, then I suppose those wildcards are ok.

Indeed. Once it is made clear that "all possible current & future matches are supposed to be skimmed" we will merge this PR

perrotta avatar Aug 18 '22 08:08 perrotta

Either way a change may (or may not) be needed in case new paths are added or old paths are removed. If it is clear that all possible current & future matches are supposed to be skimmed, then I suppose those wildcards are ok.

Indeed. Once it is made clear that "all possible current & future matches are supposed to be skimmed" we will merge this PR

@carriganm95 could you please either confirm that all possible current & future matches corresponding to the wildcards as set here are supposed to be skimmed, or modify the PR accordingly?

perrotta avatar Aug 19 '22 05:08 perrotta

@perrotta yes I can confirm that all of the current and future matches are what we want to skim. We have intentionally made this general so that we do not have to go back and reskim later.

carriganm95 avatar Aug 22 '22 13:08 carriganm95

+1

perrotta avatar Aug 22 '22 13:08 perrotta