cmssw
cmssw copied to clipboard
PFCandidate no longer inherits from CompositeCandidate
PR description:
The storing of PFCandidates takes 1/3rd of the entire time it takes to write AOD. I have started to look at the class to see if there was a way to make it easier for ROOT to store. The first think I tried to was to remove CompositeCandidate (as that looks hard to store) in order to see what would break and therefore would have to be changed if I removed the inheritance. To my surprise no code in CMSSW made use of the possible compositeness of the class.
PR validation:
The Code and all dependent code compiles.
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38999/31468
- This PR adds an extra 32KB to repository
A new Pull Request was created by @Dr15Jones (Chris Jones) for master.
It involves the following packages:
- DataFormats/ParticleFlowCandidate (reconstruction)
- DataFormats/PatCandidates (reconstruction)
@jpata, @cmsbuild, @clacaputo can you please review it and eventually sign? Thanks. @gouskos, @rovere, @lgray, @missirol, @hatakeyamak, @gpetruc this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.
cms-bot commands are listed here
please test
I did a test where I wrote a 10,000 event file containing only recoPFCandidates_particleFlow__RECO
using standard AOD settings. If I wrote the file with no compression, this PR made the output module run 40% faster. If I used the standard AOD compression, the times were the same within measurement error (just 1% difference).
-1
Failed Tests: UnitTests RelVals RelVals-INPUT AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5d55e/26710/summary.html
COMMIT: 85e05422811d60165036aaa5d682a422cb506402
CMSSW: CMSSW_12_5_X_2022-08-08-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38999/26710/install.sh
to create a dev area with all the needed externals and cmssw changes.
Unit Tests
I found errors in the following unit tests:
---> test runtestTqafTopJetCombination had ERRORS ---> test runtestTqafTopEventSelection had ERRORS ---> test runtestTqafTopTools had ERRORS ---> test runtestTqafTopHitFit had ERRORS and more ...
RelVals
-
136.8311
136.8311_RunJetHT2017F_reminiaod+RunJetHT2017F_reminiaod+REMINIAOD_data2017+HARVEST2017_REMINIAOD_data2017/step2_RunJetHT2017F_reminiaod+RunJetHT2017F_reminiaod+REMINIAOD_data2017+HARVEST2017_REMINIAOD_data2017.log
-
136.7611
136.7611_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM/step2_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM.log
-
136.88811
136.88811_RunJetHT2018D_reminiaodUL+RunJetHT2018D_reminiaodUL+REMINIAOD_data2018UL+HARVEST2018_REMINIAOD_data2018UL/step2_RunJetHT2018D_reminiaodUL+RunJetHT2018D_reminiaodUL+REMINIAOD_data2018UL+HARVEST2018_REMINIAOD_data2018UL.log
Expand to see more relval errors ...
RelVals-INPUT
-
136.72411
136.72411_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMINIAOD_data2016UL_HIPM+HARVESTDR2_REMINIAOD_data2016UL_HIPM/step2_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMINIAOD_data2016UL_HIPM+HARVESTDR2_REMINIAOD_data2016UL_HIPM.log
-
136.72412
136.72412_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM/step2_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM.log
-
136.7611
136.7611_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM/step2_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM.log
Expand to see more relval errors ...
AddOn Tests
-
pat1
cmsRun /data/cmsbld/jenkins/workspace/ib-run-pr-addon/CMSSW_12_5_X_2022-08-08-1100/src/PhysicsTools/PatAlgos/test/IntegrationTest_cfg.py : FAILED - time: date Mon Aug 8 21:01:15 2022-date Mon Aug 8 20:59:52 2022 s - exit: 35584
@pcanal So I changed the inheritance for reco::PFCandidate
, dropping one of the intermediate base classes. When I did that I updated all the class versions as well. However, now the tests run for the PR are failing where all these tests seem to be reading back reco::PFCandidate
s which had been stored using the previous version. It seems like the scheme evolution isn't working properly.
So I wrote a simple EDAnalyzer which dumps the contents of a std::vector<reco::PFCandidate>
. I ran it using both the original code and the new code, both reading the same input file (which was written using the original format). The results are
element 0 PFCandidate type: 1 E/pT/eta/phi 0.811/0.175/-2.196/-0.403, blocks/ie | element 0 PFCandidate type: 1 E/pT/eta/phi 0.000/0.175/-2.196/-0.403, blocks/ie
element 1 PFCandidate type: 1 E/pT/eta/phi 1.500/0.230/2.559/2.968, blocks/iele | element 1 PFCandidate type: 1 E/pT/eta/phi 0.000/0.230/2.559/2.968, blocks/iele
element 2 PFCandidate type: 1 E/pT/eta/phi 0.589/0.180/1.825/-2.992, blocks/iel | element 2 PFCandidate type: 1 E/pT/eta/phi 0.000/0.180/1.825/-2.992, blocks/iel
element 3 PFCandidate type: 1 E/pT/eta/phi 1.244/0.205/2.484/-3.059, blocks/iel | element 3 PFCandidate type: 1 E/pT/eta/phi 0.000/0.205/2.484/-3.059, blocks/iel
element 4 PFCandidate type: 1 E/pT/eta/phi 9.328/2.029/-2.206/-1.708, blocks/ie | element 4 PFCandidate type: 1 E/pT/eta/phi 0.000/2.029/-2.206/-1.708, blocks/ie
element 5 PFCandidate type: 1 E/pT/eta/phi 2.963/0.630/-2.229/-0.503, blocks/ie | element 5 PFCandidate type: 1 E/pT/eta/phi 0.000/0.630/-2.229/-0.503, blocks/ie
So when reading back using the new format, energy is always 0 while pT, eta, and phi look fine.
Which data member does the energy and pT (or eta or phi) correspond to?
So PFCandidate::energy()
is read from LeafCandidate::m_state:: p4Cartesian_
where that member variable is declared transient to ROOT.
https://github.com/cms-sw/cmssw/blob/e52f99a59bb77ceec0102769446b26c321662ae8/DataFormats/Candidate/src/classes_def.xml#L7
the value is supposed to be filled by the IO rule https://github.com/cms-sw/cmssw/blob/e52f99a59bb77ceec0102769446b26c321662ae8/DataFormats/Candidate/src/classes_def.xml#L9-L12
So it appears that when schema evolution is being used, that the necessary IO rules are not always being run.
So pT, eta and phi come from LeafCandidate::m_state::p4Polar_
which is stored.
So if I do
[cdj@cmslpc-c8-heavy01 matrix]$ root -l root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/data/Run2017F/JetHT/AOD/17Nov2017-v1/70000/0054D144-36DF-E711-B074-02163E01A47B.root
root [1] Events->Scan("recoPFCandidates_particleFlow__RECO.obj.m_state.energy()")
using the original format I get
***********************************
* Row * Instance * recoPFCan *
***********************************
* 0 * 0 * 0.8107478 *
* 0 * 1 * 1.5003893 *
* 0 * 2 * 0.5893012 *
* 0 * 3 * 1.2443912 *
* 0 * 4 * 9.3277593 *
* 0 * 5 * 2.9629352 *
* 0 * 6 * 3.7514724 *
* 0 * 7 * 2.8305396 *
* 0 * 8 * 1.6818398 *
* 0 * 9 * 1.7144627 *
* 0 * 10 * 0.9533503 *
* 0 * 11 * 7.2343502 *
* 0 * 12 * 5.6262460 *
* 0 * 13 * 0.9747063 *
* 0 * 14 * 9.3023527 *
* 0 * 15 * 1.4282982 *
* 0 * 16 * 4.3854562 *
* 0 * 17 * 1.2034772 *
* 0 * 18 * 1.9004409 *
* 0 * 19 * 1.6507877 *
* 0 * 20 * 1.9658588 *
* 0 * 21 * 1.9287844 *
* 0 * 22 * 2.9119759 *
* 0 * 23 * 0.7939654 *
* 0 * 24 * 0.6640725 *
but using the new format I get
***********************************
* Row * Instance * recoPFCan *
***********************************
* 0 * 0 * 0 *
* 0 * 1 * 0 *
* 0 * 2 * 2.84e-306 *
* 0 * 3 * 0 *
* 0 * 4 * 0 *
* 0 * 5 * 0 *
* 0 * 6 * 0 *
* 0 * 7 * 0 *
* 0 * 8 * 2.84e-306 *
* 0 * 9 * 2.84e-306 *
* 0 * 10 * 2.84e-306 *
* 0 * 11 * 0 *
* 0 * 12 * 0 *
* 0 * 13 * 2.84e-306 *
* 0 * 14 * 0 *
* 0 * 15 * 2.84e-306 *
* 0 * 16 * 0 *
* 0 * 17 * 2.84e-306 *
* 0 * 18 * 0 *
* 0 * 19 * 2.84e-306 *
* 0 * 20 * 0 *
* 0 * 21 * 2.84e-306 *
* 0 * 22 * 0 *
* 0 * 23 * 3.72e-307 *
* 0 * 24 * 2.84e-306 *
I'm doing this using CMSSW_12_5_0_pre4 which is using a version of ROOT 6.24.07
Fair enough. It seems to have to do with the handling/passing-through to the base class. Let me try to write a standalone reproducer.
I can reproduce locally. I am looking for a solution.
@pcanal @Dr15Jones Just wondering if there's any news here
I am making steady progress towards the fix.
@pcanal @Dr15Jones Any news on this?
Pull request #38999 was updated. @simonepigazzini, @mandrenguyen, @clacaputo, @vlimant can you please check and sign again.
-reconstruction Cleaning the reco queue
-1
Failed Tests: UnitTests RelVals RelVals-INPUT AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5d55e/35338/summary.html
COMMIT: 85e05422811d60165036aaa5d682a422cb506402
CMSSW: CMSSW_13_3_X_2023-10-22-0000/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38999/35338/install.sh
to create a dev area with all the needed externals and cmssw changes.
Unit Tests
I found 8 errors in the following unit tests:
---> test runtestPhysicsToolsPatAlgos had ERRORS ---> test runtestTqafTopEventSelection had ERRORS ---> test runtestTqafTopEventProducers had ERRORS and more ...
RelVals
-
136.88811
136.88811_RunJetHT2018DreMINIAODUL/step2_RunJetHT2018DreMINIAODUL.log
-
136.8311
136.8311_RunJetHT2017FreMINIAOD/step2_RunJetHT2017FreMINIAOD.log
-
136.7611
136.7611_RunJetHT2016EreMINIAOD/step2_RunJetHT2016EreMINIAOD.log
RelVals-INPUT
-
136.72411
136.72411_RunJetHT2016B_reminiaodUL/step2_RunJetHT2016B_reminiaodUL.log
-
136.72412
136.72412_RunJetHT2016B_reminiaodUL/step2_RunJetHT2016B_reminiaodUL.log
-
136.7611
136.7611_RunJetHT2016EreMINIAOD/step2_RunJetHT2016EreMINIAOD.log
Expand to see more relval errors ...
AddOn Tests
[pat1:1] cmsRun /data/cmsbld/jenkins/workspace/ib-run-pr-addon/CMSSW_13_3_X_2023-10-22-0000/src/PhysicsTools/PatAlgos/test/IntegrationTest_cfg.py : FAILED - elapsed time: 48 sec (ended on Sun Oct 22 23:42:46 2023) - exit: 35584
Milestone for this pull request has been moved to CMSSW_14_0_X.Please open a backport if it should also go in to CMSSW_13_3_X.
@pcanal was this problem solved?
Milestone for this pull request has been moved to CMSSW_14_1_X. Please open a backport if it should also go in to CMSSW_14_0_X.
-xpog can we close this and reopen when necessary ?