cmssw
cmssw copied to clipboard
Initialize BranchDescription from Dictionary only if the branch is present
PR description:
This PR limits the BranchDescription
's ROOT dicrionary part be initialized only if the branch is present. This change allows a job to process a file that contains products whose parent product (that is not in the file, was either transient or dropped) dictionary does not exist anymore in the job. Resolves https://github.com/cms-sw/cmssw/issues/38781.
I'm uncertain about the usefulness of the test added in this PR. It is sufficient to demonstrate that the changes of this PR work as intended, but it has some serious flaws for longer term.
PR validation:
Problematic job runs, framework unit tests pass.
-code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31164
-
This PR adds an extra 68KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Provenance/interface/BranchDescription.h modified in PR(s): #38454
- File IOPool/Input/src/PoolSource.cc modified in PR(s): #38801
Code check has found code style and quality issues which could be resolved by applying following patch(s)
-
code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31164/code-format.patch
e.g.
curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31164/code-format.patch | patch -p1
You can also runscram build code-format
to apply code format directly
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31165
-
This PR adds an extra 68KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Provenance/interface/BranchDescription.h modified in PR(s): #38454
- File IOPool/Input/src/PoolSource.cc modified in PR(s): #38801
A new Pull Request was created by @makortel (Matti Kortelainen) for master.
It involves the following packages:
- DataFormats/Common (core)
- DataFormats/Provenance (core)
- DataFormats/TestObjects (core)
- FWCore/Framework (core)
- IOPool/Input (core)
@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks. @wddgit, @rovere this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.
cms-bot commands are listed here
@cmsbuild, please test
@Dr15Jones @wddgit Can you think of any ill side effects that might not be covered by our unit tests?
I'll take a look at it.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-384df2/26357/summary.html
COMMIT: fa0f54193995534e034258dbd6cda73ec3e69317
CMSSW: CMSSW_12_5_X_2022-07-20-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38806/26357/install.sh
to create a dev area with all the needed externals and cmssw changes.
CMS deprecated warnings: 9 CMS deprecated warnings found, see summary page for details.
Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 4 differences found in the comparisons
- DQMHistoTests: Total files compared: 50
- DQMHistoTests: Total histograms compared: 3662417
- DQMHistoTests: Total failures: 17
- DQMHistoTests: Total nulls: 1
- DQMHistoTests: Total successes: 3662377
- DQMHistoTests: Total skipped: 22
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: -0.004 KiB( 49 files compared)
- DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
- Checked 208 log files, 45 edm output root files, 50 DQM output files
- TriggerResults: no differences found
We already discussed that we keep BranchDescriptions for kept products and their ancestors. There is one other case where we also keep the BranchDescription. If there is a BranchAlias and we keep the alias BranchDescription, then we also keep the original BranchDescription. I'm not sure that this affects anything... Just mentioning it because we didn't include that when we were discussing it earlier.
Is there anything that inhibits callWhenNewProductsRegistered from being called on BranchDescriptions not present? Code like this would fail in that case.
https://cmssdt.cern.ch/dxr/CMSSW/source/RecoMuon/L3MuonProducer/src/L3TkMuonProducer.cc#38
This looks potentially problematic:
https://cmssdt.cern.ch/dxr/CMSSW/source/FWCore/Framework/interface/stream/ThinningProducer.h#163
That is all I noticed going through this quickly this afternoon. I could have missed something. There are a number of uses of callWhenNewProductsRegistered that might be problematic. (Now back to debugging the unit test that failed in the concurrent run PR)
Thanks @wddgit. I would think we could either restrict callWhenNewProductsRegistered()
callback to be called only for branches that are present, or go through all callWhenNewProductsRegistered()
callbacks and add a check for branch presence for cases where the type information is used. I would (naively) think he callbacks to not be very useful for non-present branches.
The ThinningProducer
looks potentially more fundamental problem, need to think about it more.
I tested that currently indeed callWhenNewProductsRegistered()
callbacks are called for non-present branches as well.
I went through all uses of callWhenNewProductsRegistered()
, here are notes from that
L3TkMuonProducer
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/RecoMuon/L3MuonProducer/src/L3TkMuonProducer.cc#L36-L42
AFAICT this is used only to get L3MuonTrajectorySeedCollection
from the Event in
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/RecoMuon/L3MuonProducer/src/L3TkMuonProducer.cc#L144-L151
I think this could be replaced with the use of RefProd<L3MuonTrajectorySeedCollection>
.
EcalDeadCellTriggerPrimitiveFilter
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/RecoMET/METFilters/plugins/EcalDeadCellTriggerPrimitiveFilter.cc#L188-L208
Here the behavior would certainly change if callWhenNewProductsRegistered()
would be called only for present branches. On a cursory look the code looks like it wants to get the products whose existence is tested in callWhenNewProductsRegistered()
though, so changing the (theoretical) behavior could be fine.
Framework test modules
Code like this
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/FWCore/Framework/test/stubs/TestGlobalAnalyzers.cc#L52-L56
in edmtest::global::StreamIntAnalyzer
, edmtest::global::StreamIntProducer
, edmtest::limited::StreamIntAnalyzer
, edmtest::one::SharedResourcesAnalyzer
, and edmtest::stream::GlobalIntAnalyzer
. I'd say these would be unaffected.
ManyIntWhenRegisteredProducer
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/FWCore/Framework/test/stubs/ToyIntProducers.cc#L516-L528 effectively assumes all relevant branches to exist
EventContentAnalyzer
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/FWCore/Modules/src/EventContentAnalyzer.cc#L299-L318
Here the callWhenNewProductsRegistered()
is used only conjunction of getData_
, i.e. it ignoring dropped branches should be fine.
GenericConsumer
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/FWCore/Modules/src/GenericConsumer.cc#L45-L57 etc. This code only declares consumption of the matched branches, so ignoring dropped branches should be fine.
On a separate note, this module does not currently support ProcessBlocks (it will throw an exception if the ProductRegistry has any ProcessBlock branches).
LogErrorHarvester
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/FWCore/Modules/src/LogErrorHarvester.cc#L65-L68
Explicitly limits to the current process, so ignoring dropped branches should be fine.
SwitchProducer
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/FWCore/Modules/src/SwitchProducer.cc#L27-L28
Explicitly limits to the current process, so ignoring dropped branches should be fine.
On a separate note I see now that there is another way to get the current process name there, so I could look into removing the @process_name
parameter of the module.
ConcurrentHadronizerFilter
, HadronizerFilter
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/GeneratorInterface/Core/interface/ConcurrentHadronizerFilter.h#L139-L149
The call is about consuming the matched branch, so ignoring dropped branches should be fine.
TriggerResultsFilter
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/HLTrigger/HLTfilters/plugins/TriggerResultsFilter.cc#L52-L57
The call is about consuming the matched branch, and non-existing products are ignored, so ignoring dropped branches should be fine.
L1GtUtilsHelper
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/L1Trigger/GlobalTriggerAnalyzer/interface/L1GtUtilsHelper.h#L177 https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/L1Trigger/GlobalTriggerAnalyzer/src/L1GtUtilsHelper.cc#L33-L38
Lot's of complex code, but the comment above suggests that the idea is to find a branch that matches to a specific condition, and then consume that branch. So ignoring dropped branches could be fine.
l1t::L1TGlobalUtilHelper
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/L1Trigger/L1TGlobal/interface/L1TGlobalUtilHelper.h#L158 https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/L1Trigger/L1TGlobal/src/L1TGlobalUtilHelper.cc#L51-L53
Explicitly ignores dropped branches.
Users of edm::GetterOfProducts
edm::GetterOfProducts
explicitly ignores dropped branches
https://github.com/cms-sw/cmssw/blob/1a59ba00193440d65339de1c600e01c15993e5de/FWCore/Framework/interface/GetterOfProducts.h#L130-L132
The following modules used only edm::GetterOfProducts
: DQMFileSaver
, DQMEDHarvester
, edmtest::TestFindProduct
, edmtest::TestGetterOfProducts
, TriggerSummaryProducerAOD
, TriggerSummaryProducerRAW
, edmtest_thing::StreamThingAnalyzer
, PATTriggerEventProducer
and PATTriggerProducer
.
The
ThinningProducer
looks potentially more fundamental problem, need to think about it more.
After some more thought and poking around the code, I think now that the ThinningProducer
is not a fundamental problem. The loop over ProductRegistry
's BranchDescription
elements contains the following conditionals
- If product's type matches the collection type
- If branch is produced and label matches the current module: record
BranchID
as a thinned collection - If label and instance names match those of the
ThinningProducer
'sinputTag
parameter- If
InputTag
should skip the current process, and if the branch is not produced, record it as a parent collection - Else if
InputTag
's process name is empty or the current process, record it as a parent collection
- If
- If branch is produced and label matches the current module: record
- If product's type matches
ThinnedAssociation
, branch is produced, label matches the current module: record the branch as an association
The thinned collection and association must be present (branches are produced, by the same module). The ThinningProducer
also expects the parent collection to be present, otherwise the module will throw a ProductNotFound
exception in its produce()
.
I think the loop over BranchDescription
s could just ignore all dropped branches.
I added a protection for dropped branches in ThinningProducers
in the last push. The callWhenNewProductsRegistered()
are handled in #38872.
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31322
-
This PR adds an extra 76KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Provenance/interface/BranchDescription.h modified in PR(s): #38454
- File IOPool/Input/src/PoolSource.cc modified in PR(s): #38801
Pull request #38806 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.
@cmsbuild, please test
It seems tests are stuck because of a failure in one of our nodes. I am going to restart them again. Sorry for the inconveniences!
please abort
please test
I think the failure in unit test is unrelated to this PR (https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-384df2/26522/unitTests/failed.html). It seems it is related to the failure in IBs in DQM/SiStripMonitorClient
module already reported in https://github.com/cms-sw/cmssw/issues/38885.
-1
Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-384df2/26522/summary.html
COMMIT: a618ee2a0b0b3d14982f7295f708401e4ad39eac
CMSSW: CMSSW_12_5_X_2022-07-28-2300/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38806/26522/install.sh
to create a dev area with all the needed externals and cmssw changes.
CMS deprecated warnings: 9 CMS deprecated warnings found, see summary page for details.
Unit Tests
I found errors in the following unit tests:
---> test testSiStripDQM_OfflineTkMap had ERRORS
Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 10 differences found in the comparisons
- DQMHistoTests: Total files compared: 51
- DQMHistoTests: Total histograms compared: 3668050
- DQMHistoTests: Total failures: 29
- DQMHistoTests: Total nulls: 1
- DQMHistoTests: Total successes: 3667998
- DQMHistoTests: Total skipped: 22
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: -0.004 KiB( 50 files compared)
- DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
- Checked 210 log files, 47 edm output root files, 51 DQM output files
- TriggerResults: no differences found
4.53 shows the same difference as in #38603, which @Dr15Jones and I have understood to be a deficiency in L1GtUtilsHelper
and for which @Dr15Jones is preparing a fix.
The unit test fails already in IB.
@cmsbuild, please test
Refresh to see if the comparison differences are gone
-1
Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-384df2/26622/summary.html
COMMIT: a618ee2a0b0b3d14982f7295f708401e4ad39eac
CMSSW: CMSSW_12_5_X_2022-08-03-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38806/26622/install.sh
to create a dev area with all the needed externals and cmssw changes.
CMS deprecated warnings: 9 CMS deprecated warnings found, see summary page for details.
Unit Tests
I found errors in the following unit tests:
---> test EcalTPG_updateWeightGroup_test had ERRORS
Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 6 differences found in the comparisons
- DQMHistoTests: Total files compared: 51
- DQMHistoTests: Total histograms compared: 3691510
- DQMHistoTests: Total failures: 14
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3691474
- DQMHistoTests: Total skipped: 22
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
- Checked 212 log files, 49 edm output root files, 51 DQM output files
- TriggerResults: no differences found
The unit test fails randomly in IBs as well (#38964).
-code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31435
-
This PR adds an extra 80KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Provenance/interface/BranchDescription.h modified in PR(s): #38454
- File IOPool/Input/src/PoolSource.cc modified in PR(s): #38801
Code check has found code style and quality issues which could be resolved by applying following patch(s)
-
code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31435/code-format.patch
e.g.
curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31435/code-format.patch | patch -p1
You can also runscram build code-format
to apply code format directly
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38806/31436
-
This PR adds an extra 80KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Provenance/interface/BranchDescription.h modified in PR(s): #38454
- File IOPool/Input/src/PoolSource.cc modified in PR(s): #38801
Pull request #38806 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.