Add protection against NaN inputs for DeepMET
PR description:
This PR addresses the issue reported in #44976: DeepMET returns NaN in a fraction of events (up to 1%) in Run 3 samples. The cause is NaN values in the pz() of the input packed PF candidates, which first appeared in Run 3.
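A single bad pz() can spoil the whole event. This is a simplified picture, not the DeepMET producer code. It assumes the network predicts one weight per PF candidate and the MET is the negative weighted vector sum of the candidate momenta. The `Candidate` struct and `weightedMET` function are hypothetical. The sketch shows that one NaN weight, such as one produced from a NaN input, makes the whole sum NaN, because any arithmetic involving NaN yields NaN.

```cpp
#include <cmath>
#include <vector>

// Hypothetical, simplified candidate: momentum components plus a
// per-candidate weight assumed to come from the network.
struct Candidate {
  float px;
  float py;
  float weight;
};

// Magnitude of the negative weighted vector sum of candidate momenta.
// A single NaN weight (or px/py) turns both sums, and hence the result, into NaN.
inline float weightedMET(const std::vector<Candidate>& cands) {
  float sumPx = 0.f;
  float sumPy = 0.f;
  for (const auto& c : cands) {
    sumPx -= c.weight * c.px;
    sumPy -= c.weight * c.py;
  }
  return std::hypot(sumPx, sumPy);
}
```

With all weights finite the result is an ordinary magnitude. With even one NaN weight among many candidates, the event-level output is NaN. This is consistent with the per-event failures described above.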
This PR fixes the issue by replacing NaN inputs with zero. It also adds outlier protection to all floating-point inputs to prevent similar issues in the future.
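The input protection can be sketched as a small helper applied to each floating-point feature before it is passed to the network. This is a minimal sketch that assumes a symmetric bound `maxAbs`. The value 1e6 used in the checks is an illustrative assumption, not the threshold chosen in this PR. `sanitizeInput` and `sanitizeFeatures` are hypothetical names, not the actual RecoMET/METPUSubtraction code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: map one floating-point input to a finite value in
// [-maxAbs, maxAbs]. NaN becomes 0; +/-inf and other outliers are clamped.
inline float sanitizeInput(float value, float maxAbs) {
  assert(maxAbs > 0.f);
  // Check NaN first: every comparison with NaN is false, so std::clamp
  // alone would return NaN unchanged.
  if (std::isnan(value)) {
    return 0.f;
  }
  // Infinities compare normally and are clamped to the nearest bound.
  return std::clamp(value, -maxAbs, maxAbs);
}

// Apply the protection to every floating-point feature of one candidate.
inline void sanitizeFeatures(std::vector<float>& features, float maxAbs) {
  for (float& f : features) {
    f = sanitizeInput(f, maxAbs);
  }
}
```

The order of the two checks matters. Replacing NaN with zero handles the Run 3 pz() problem directly. The clamp covers infinities and other extreme values that could otherwise cause similar failures.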
PR validation:
The PR was tested on a subset of the events reported in #44976, and DeepMET returns finite output for them.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Not a backport, though backports should probably be made to all releases that will be used to process Run 3 MC and data.
@yongbinfeng @mseidel42
please test
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44986/40265
- This PR adds an extra 20KB to the repository
A new Pull Request was created by @steggema for master.
It involves the following packages:
- RecoMET/METPUSubtraction (reconstruction)
@jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. @mariadalfonso, @ahinzmann, @nhanvtran, @gouskos, @schoef, @missirol, @gkasieczka, @mmarionncern, @jdolen, @seemasharmafnal, @jdamgov this is something you requested to watch as well. @sextonkennedy, @antoniovilela, @rappoccio you are the release manager for this.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3ca0aa/39410/summary.html
COMMIT: 5c341f4130b827f966f636b1bf339c741daa92fd
CMSSW: CMSSW_14_1_X_2024-05-15-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44986/39410/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- You potentially added 1 line to the logs
- Reco comparison results: 6 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338976
- DQMHistoTests: Total failures: 27
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3338929
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
There are a few DeepMET-related differences in the NanoAODDQM output, e.g. https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_14_1_X_2024-05-15-2300+3ca0aa/62803/140.023_RunZeroBias2022B/Physics_NanoAODDQM_DeepMETResolutionTune.html and https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_14_1_X_2024-05-15-2300+3ca0aa/62803/141.044_RunJetMET2023D/Physics__NanoAODDQM_DeepMETResolutionTune_phi.png
In all cases, the differences appear to be additional events in the red distribution relative to the blue one, in numbers consistent with the ~1% quoted above. This is what one would expect from events whose DeepMET output is recovered from NaN, but it would be good if someone could confirm that the red lines do correspond to the new distributions.
The difference is small, below 1% for the ten events tested in the workflows (blue is the baseline, black is the new version): https://tinyurl.com/24cgxasz https://tinyurl.com/2ckmqaty, taken from https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_14_1_X_2024-05-15-2300+3ca0aa/62803/dqm-histo-comparison-summary.html
OK, great: so there are very few additional events in the new (black) distribution compared to the baseline (blue). This is consistent with DeepMET values going from NaN to a finite value, and with the effect size I would expect (below about 1%).
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)
+1