cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

PFCandidate no longer inherits from CompositeCandidate

Open Dr15Jones opened this issue 2 years ago • 14 comments

PR description:

The storing of PFCandidates takes 1/3rd of the entire time it takes to write AOD. I have started to look at the class to see if there was a way to make it easier for ROOT to store. The first think I tried to was to remove CompositeCandidate (as that looks hard to store) in order to see what would break and therefore would have to be changed if I removed the inheritance. To my surprise no code in CMSSW made use of the possible compositeness of the class.

PR validation:

The Code and all dependent code compiles.

Dr15Jones avatar Aug 08 '22 17:08 Dr15Jones

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38999/31468

  • This PR adds an extra 32KB to repository

cmsbuild avatar Aug 08 '22 17:08 cmsbuild

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

  • DataFormats/ParticleFlowCandidate (reconstruction)
  • DataFormats/PatCandidates (reconstruction)

@jpata, @cmsbuild, @clacaputo can you please review it and eventually sign? Thanks. @gouskos, @rovere, @lgray, @missirol, @hatakeyamak, @gpetruc this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Aug 08 '22 17:08 cmsbuild

please test

Dr15Jones avatar Aug 08 '22 18:08 Dr15Jones

I did a test where I wrote a 10,000 event file containing only recoPFCandidates_particleFlow__RECO using standard AOD settings. If I wrote the file with no compression, this PR made the output module run 40% faster. If I used the standard AOD compression, the times were the same within measurement error (just 1% difference).

Dr15Jones avatar Aug 08 '22 21:08 Dr15Jones

-1

Failed Tests: UnitTests RelVals RelVals-INPUT AddOn Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5d55e/26710/summary.html COMMIT: 85e05422811d60165036aaa5d682a422cb506402 CMSSW: CMSSW_12_5_X_2022-08-08-1100/el8_amd64_gcc10 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38999/26710/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test runtestTqafTopJetCombination had ERRORS
---> test runtestTqafTopEventSelection had ERRORS
---> test runtestTqafTopTools had ERRORS
---> test runtestTqafTopHitFit had ERRORS
and more ...

RelVals

  • 136.8311136.8311_RunJetHT2017F_reminiaod+RunJetHT2017F_reminiaod+REMINIAOD_data2017+HARVEST2017_REMINIAOD_data2017/step2_RunJetHT2017F_reminiaod+RunJetHT2017F_reminiaod+REMINIAOD_data2017+HARVEST2017_REMINIAOD_data2017.log
  • 136.7611136.7611_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM/step2_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM.log
  • 136.88811136.88811_RunJetHT2018D_reminiaodUL+RunJetHT2018D_reminiaodUL+REMINIAOD_data2018UL+HARVEST2018_REMINIAOD_data2018UL/step2_RunJetHT2018D_reminiaodUL+RunJetHT2018D_reminiaodUL+REMINIAOD_data2018UL+HARVEST2018_REMINIAOD_data2018UL.log
Expand to see more relval errors ...

RelVals-INPUT

  • 136.72411136.72411_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMINIAOD_data2016UL_HIPM+HARVESTDR2_REMINIAOD_data2016UL_HIPM/step2_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMINIAOD_data2016UL_HIPM+HARVESTDR2_REMINIAOD_data2016UL_HIPM.log
  • 136.72412136.72412_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM/step2_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM.log
  • 136.7611136.7611_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM/step2_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM.log
Expand to see more relval errors ...

AddOn Tests

  • pat1cmsRun /data/cmsbld/jenkins/workspace/ib-run-pr-addon/CMSSW_12_5_X_2022-08-08-1100/src/PhysicsTools/PatAlgos/test/IntegrationTest_cfg.py : FAILED - time: date Mon Aug 8 21:01:15 2022-date Mon Aug 8 20:59:52 2022 s - exit: 35584

cmsbuild avatar Aug 08 '22 21:08 cmsbuild

@pcanal So I changed the inheritance for reco::PFCandidate, dropping one of the intermediate base classes. When I did that I updated all the class versions as well. However, now the tests run for the PR are failing where all these tests seem to be reading back reco::PFCandidates which had been stored using the previous version. It seems like the scheme evolution isn't working properly.

Dr15Jones avatar Aug 09 '22 14:08 Dr15Jones

So I wrote a simple EDAnalyzer which dumps the contents of a std::vector<reco::PFCandidate>. I ran it using both the original code and the new code, both reading the same input file (which was written using the original format). The results are

element 0	PFCandidate type: 1 E/pT/eta/phi 0.811/0.175/-2.196/-0.403, blocks/ie |	element 0	PFCandidate type: 1 E/pT/eta/phi 0.000/0.175/-2.196/-0.403, blocks/ie
element 1	PFCandidate type: 1 E/pT/eta/phi 1.500/0.230/2.559/2.968, blocks/iele |	element 1	PFCandidate type: 1 E/pT/eta/phi 0.000/0.230/2.559/2.968, blocks/iele
element 2	PFCandidate type: 1 E/pT/eta/phi 0.589/0.180/1.825/-2.992, blocks/iel |	element 2	PFCandidate type: 1 E/pT/eta/phi 0.000/0.180/1.825/-2.992, blocks/iel
element 3	PFCandidate type: 1 E/pT/eta/phi 1.244/0.205/2.484/-3.059, blocks/iel |	element 3	PFCandidate type: 1 E/pT/eta/phi 0.000/0.205/2.484/-3.059, blocks/iel
element 4	PFCandidate type: 1 E/pT/eta/phi 9.328/2.029/-2.206/-1.708, blocks/ie |	element 4	PFCandidate type: 1 E/pT/eta/phi 0.000/2.029/-2.206/-1.708, blocks/ie
element 5	PFCandidate type: 1 E/pT/eta/phi 2.963/0.630/-2.229/-0.503, blocks/ie |	element 5	PFCandidate type: 1 E/pT/eta/phi 0.000/0.630/-2.229/-0.503, blocks/ie

So when reading back using the new format, energy is always 0 while pT, eta, and phi look fine.

Dr15Jones avatar Aug 09 '22 16:08 Dr15Jones

Which data member does the energy and pT (or eta or phi) correspond to?

pcanal avatar Aug 09 '22 16:08 pcanal

So PFCandidate::energy() is read from LeafCandidate::m_state:: p4Cartesian_ where that member variable is declared transient to ROOT.

https://github.com/cms-sw/cmssw/blob/e52f99a59bb77ceec0102769446b26c321662ae8/DataFormats/Candidate/src/classes_def.xml#L7

the value is supposed to be filled by the IO rule https://github.com/cms-sw/cmssw/blob/e52f99a59bb77ceec0102769446b26c321662ae8/DataFormats/Candidate/src/classes_def.xml#L9-L12

So it appears that when schema evolution is being used, that the necessary IO rules are not always being run.

Dr15Jones avatar Aug 09 '22 16:08 Dr15Jones

So pT, eta and phi come from LeafCandidate::m_state::p4Polar_ which is stored.

Dr15Jones avatar Aug 09 '22 16:08 Dr15Jones

So if I do

[cdj@cmslpc-c8-heavy01 matrix]$ root -l root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/data/Run2017F/JetHT/AOD/17Nov2017-v1/70000/0054D144-36DF-E711-B074-02163E01A47B.root
root [1] Events->Scan("recoPFCandidates_particleFlow__RECO.obj.m_state.energy()")

using the original format I get

***********************************
*    Row   * Instance * recoPFCan *
***********************************
*        0 *        0 * 0.8107478 *
*        0 *        1 * 1.5003893 *
*        0 *        2 * 0.5893012 *
*        0 *        3 * 1.2443912 *
*        0 *        4 * 9.3277593 *
*        0 *        5 * 2.9629352 *
*        0 *        6 * 3.7514724 *
*        0 *        7 * 2.8305396 *
*        0 *        8 * 1.6818398 *
*        0 *        9 * 1.7144627 *
*        0 *       10 * 0.9533503 *
*        0 *       11 * 7.2343502 *
*        0 *       12 * 5.6262460 *
*        0 *       13 * 0.9747063 *
*        0 *       14 * 9.3023527 *
*        0 *       15 * 1.4282982 *
*        0 *       16 * 4.3854562 *
*        0 *       17 * 1.2034772 *
*        0 *       18 * 1.9004409 *
*        0 *       19 * 1.6507877 *
*        0 *       20 * 1.9658588 *
*        0 *       21 * 1.9287844 *
*        0 *       22 * 2.9119759 *
*        0 *       23 * 0.7939654 *
*        0 *       24 * 0.6640725 *

but using the new format I get

***********************************
*    Row   * Instance * recoPFCan *
***********************************
*        0 *        0 *         0 *
*        0 *        1 *         0 *
*        0 *        2 * 2.84e-306 *
*        0 *        3 *         0 *
*        0 *        4 *         0 *
*        0 *        5 *         0 *
*        0 *        6 *         0 *
*        0 *        7 *         0 *
*        0 *        8 * 2.84e-306 *
*        0 *        9 * 2.84e-306 *
*        0 *       10 * 2.84e-306 *
*        0 *       11 *         0 *
*        0 *       12 *         0 *
*        0 *       13 * 2.84e-306 *
*        0 *       14 *         0 *
*        0 *       15 * 2.84e-306 *
*        0 *       16 *         0 *
*        0 *       17 * 2.84e-306 *
*        0 *       18 *         0 *
*        0 *       19 * 2.84e-306 *
*        0 *       20 *         0 *
*        0 *       21 * 2.84e-306 *
*        0 *       22 *         0 *
*        0 *       23 * 3.72e-307 *
*        0 *       24 * 2.84e-306 *

Dr15Jones avatar Aug 09 '22 16:08 Dr15Jones

I'm doing this using CMSSW_12_5_0_pre4 which is using a version of ROOT 6.24.07

Dr15Jones avatar Aug 09 '22 16:08 Dr15Jones

Fair enough. It seems to have to do with the handling/passing-through to the base class. Let me try to write a standalone reproducer.

pcanal avatar Aug 09 '22 16:08 pcanal

I can reproduce locally. I am looking for a solution.

pcanal avatar Aug 10 '22 16:08 pcanal

@pcanal @Dr15Jones Just wondering if there's any news here

mandrenguyen avatar Sep 15 '22 13:09 mandrenguyen

I am making steady progress towards the fix.

pcanal avatar Sep 15 '22 15:09 pcanal

@pcanal @Dr15Jones Any news on this?

mandrenguyen avatar Aug 17 '23 21:08 mandrenguyen

Pull request #38999 was updated. @simonepigazzini, @mandrenguyen, @clacaputo, @vlimant can you please check and sign again.

cmsbuild avatar Aug 17 '23 21:08 cmsbuild

-reconstruction Cleaning the reco queue

mandrenguyen avatar Oct 22 '23 20:10 mandrenguyen

-1

Failed Tests: UnitTests RelVals RelVals-INPUT AddOn Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5d55e/35338/summary.html COMMIT: 85e05422811d60165036aaa5d682a422cb506402 CMSSW: CMSSW_13_3_X_2023-10-22-0000/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38999/35338/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 8 errors in the following unit tests:

---> test runtestPhysicsToolsPatAlgos had ERRORS
---> test runtestTqafTopEventSelection had ERRORS
---> test runtestTqafTopEventProducers had ERRORS
and more ...

RelVals

  • 136.88811136.88811_RunJetHT2018DreMINIAODUL/step2_RunJetHT2018DreMINIAODUL.log
  • 136.8311136.8311_RunJetHT2017FreMINIAOD/step2_RunJetHT2017FreMINIAOD.log
  • 136.7611136.7611_RunJetHT2016EreMINIAOD/step2_RunJetHT2016EreMINIAOD.log
Expand to see more relval errors ...

RelVals-INPUT

  • 136.72411136.72411_RunJetHT2016B_reminiaodUL/step2_RunJetHT2016B_reminiaodUL.log
  • 136.72412136.72412_RunJetHT2016B_reminiaodUL/step2_RunJetHT2016B_reminiaodUL.log
  • 136.7611136.7611_RunJetHT2016EreMINIAOD/step2_RunJetHT2016EreMINIAOD.log
Expand to see more relval errors ...

AddOn Tests

[pat1:1] cmsRun /data/cmsbld/jenkins/workspace/ib-run-pr-addon/CMSSW_13_3_X_2023-10-22-0000/src/PhysicsTools/PatAlgos/test/IntegrationTest_cfg.py : FAILED - elapsed time: 48 sec (ended on Sun Oct 22 23:42:46 2023) - exit: 35584

cmsbuild avatar Oct 22 '23 22:10 cmsbuild

Milestone for this pull request has been moved to CMSSW_14_0_X.Please open a backport if it should also go in to CMSSW_13_3_X.

smuzaffar avatar Nov 06 '23 16:11 smuzaffar

@pcanal was this problem solved?

Dr15Jones avatar Nov 20 '23 19:11 Dr15Jones

Milestone for this pull request has been moved to CMSSW_14_1_X. Please open a backport if it should also go in to CMSSW_14_0_X.

cmsbuild avatar Feb 06 '24 10:02 cmsbuild

-xpog can we close this and reopen when necessary ?

vlimant avatar Mar 29 '24 12:03 vlimant