cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

[Run3 PromptReco] 390 MB in cut/expression parser

Open makortel opened this issue 1 year ago • 16 comments

From https://github.com/cms-sw/cmssw/issues/46040#issuecomment-2420368633 the cut/expression parser uses 390 MB (constant cost, i.e. gets amortized across threads). The top 10 modules with the largest contributions are

  • 103 MB via ObjectSelectorBase<SingleElementCollectionSelector<edm::View<reco::Muon>, ...> (link)
  • 93 MB via ObjectSelectorBase<SingleElementCollectionSelector<std::vector<pat::GenericParticle, ...> (link)
  • 90 MB via BXVectorSimpleFlatTableProducer<l1t::EGamma> (link)
  • 26 MB via SimpleFlatTableProducerBase<reco::BeamSpot, reco::BeamSpot> (link)
  • 17 MB via TopSingleLeptonDQM (link
  • 12 MB via TopMonitor (link)
  • 11 MB via VersionedIdProducer<edm::Ptr<reco::Photon> (link)
  • 8 MB via RecoTauPiZeroProducer (link)
  • 5 MB via SimpleFlatTableProducer<pat::Jet> (link)
  • 5 MB via SimpleFlatTableProducer<pat::Electron> (link)

Addressing already the top 3 would save about 286 MB.

makortel avatar Oct 23 '24 15:10 makortel

cms-bot internal usage

cmsbuild avatar Oct 23 '24 15:10 cmsbuild

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Oct 23 '24 15:10 cmsbuild

assign reconstruction, xpog, dqm

makortel avatar Oct 23 '24 15:10 makortel

New categories assigned: reconstruction,xpog,dqm

@antoniovagnerini,@ftorrresd,@hqucms,@jfernan2,@mandrenguyen,@nothingface0,@rvenditti,@syuvivida,@tjavaid you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Oct 23 '24 15:10 cmsbuild

are you rediscovering that the first piece of code calling to the boost::spirit brings most of the cost?

slava77 avatar Oct 23 '24 15:10 slava77

  • 103 MB via ObjectSelectorBase<SingleElementCollectionSelector<edm::View<reco::Muon>, ...> (link)

I believe this corresponds to MuonSelector https://github.com/cms-sw/cmssw/blob/4635bff7d6bee3aaee115ae16ec4c540ca76d7b0/CommonTools/RecoAlgos/plugins/MuonSelector.cc#L32-L35

The configuration defines 25 of them, of which 14 are not constructed in the C++ (i.e. these modules are not part of any Task/Sequence/Path/EndPath that is associated with the Schedule)

ALCARECOSiPixelCalSingleMuonTightGoodMuons
ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons
ALCARECOTkAlDiMuonGoodMuons
ALCARECOTkAlDiMuonRelCombIsoMuons
ALCARECOTkAlJpsiMuMuGoodMuons (not constructed)
ALCARECOTkAlMuonIsolatedGoodMuons
ALCARECOTkAlMuonIsolatedRelCombIsoMuons
ALCARECOTkAlUpsilonMuMuGoodMuons (not constructed)
ALCARECOTkAlUpsilonMuMuPAGoodMuons (not constructed)
ALCARECOTkAlUpsilonMuMuRelCombIsoMuons (not constructed)
ALCARECOTkAlZMuMuGoodMuons
ALCARECOTkAlZMuMuPAGoodMuons (not constructed)
ALCARECOTkAlZMuMuRelCombIsoMuons
ZmmgLeadingMuons (not constructed)
ZmmgTrailingMuons (not constructed)
gemDQMStaMuons
gemDQMTightGlbMuons
muonSelection (not constructed)
muonSelectorForEMu (not constructed)
muonSelectorForZMM (not constructed)
muonSelectorForZMMPA (not constructed)
muonsPt10
selectedMuons (not constructed)
selectedMuonsIso (not constructed)
tightMuonsForPbPbZMuSkim (not constructed)

Looking next the actual cut strings in the 11 modules that are constructed

ALCARECOSiPixelCalSingleMuonTightGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlDiMuonGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlDiMuonRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlMuonIsolatedGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlMuonIsolatedRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlZMuMuGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlZMuMuRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
gemDQMStaMuons isStandAloneMuon&& outerTrack.isNonnull
gemDQMTightGlbMuons isGlobalMuon&& globalTrack.isNonnull&& passed(\'CutBasedIdTight\')
muonsPt10 isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &pt > 10

There is a lot of repetition: one group of 4 modules and another group of 4 modules have exactly the same cut string.

Would it be feasible to replace these with modules that express the structure of the selection in C++ code instead?

makortel avatar Oct 23 '24 15:10 makortel

are you rediscovering that the first piece of code calling to the boost::spirit brings most of the cost?

Possibly. I had a vague recollection of such "constant cost" of this infrastructure, but it's source wasn't evident from the profile (maybe I just didn't dig deeply enough into Cling's call stack).

I'd guess the StringObjectFunction<l1t::EGamma, false> in BXVectorSimpleFlatTableProducer<l1t::EGamma> would be a good candidate for that "constant cost" as it takes 90 MB, while StringCutObjectSelector<l1t::EGamma, false> in the same module takes 448 B.

There is nevertheless potential for ~200 MB reduction by addressing the "next largest" consumers.

makortel avatar Oct 23 '24 15:10 makortel

assign alca

makortel avatar Oct 23 '24 15:10 makortel

New categories assigned: alca

@atpathak,@consuegs,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Oct 23 '24 15:10 cmsbuild

@cms-sw/alca-l2 By the way, of the 8 ALCARECO* MuonSelectors listed above,

  • ALCARECOSiPixelCalSingleMuonTightGoodMuons, ALCARECOTkAlDiMuonGoodMuons, ALCARECOTkAlMuonIsolatedGoodMuons, ALCARECOTkAlZMuMuGoodMuons have identical configuration
  • ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons, ALCARECOTkAlDiMuonRelCombIsoMuons, ALCARECOTkAlMuonIsolatedRelCombIsoMuons, ALCARECOTkAlZMuMuRelCombIsoMuons have otherwise the same configuration, except their src parameter refer to the aforementioned 4 modules (i.e. if they would replaced with a common module, these 4 could be replaced with a common module as well)

This duplication has both memory and runtime cost.

makortel avatar Oct 23 '24 16:10 makortel

  • 93 MB via ObjectSelectorBase<SingleElementCollectionSelector<std::vector<pat::GenericParticle, ...> (link)

I believe this one corresponds to PATGenericParticleSelector https://github.com/cms-sw/cmssw/blob/55fd97727507dd74a4020c9264f5f66118409151/PhysicsTools/PatAlgos/plugins/PATObjectSelector.cc#L249-L250

The configuration contains 2 such modules, of 1 is not constructed because it is not in any Task/Sequence/Path/EndPath associated to the Schedule

looseIsoMuonsForPbPbZMuSkim (not constructed)
looseIsoMuonsForZMuSkim

The cut string is same in both (userIsolation(\'pat::TrackIso\')/pt)<0.4 that looks straightforward to be potentially replaced with a C++-coded selection structure.

Although if this is impacted with the "first call to boost::spirit has a constant cost", it might not be that important (but looks simple nevertheless).

makortel avatar Oct 23 '24 16:10 makortel

assign DPGAnalysis/Skims

makortel avatar Oct 23 '24 16:10 makortel

type performance-improvements

makortel avatar Oct 23 '24 16:10 makortel

New categories assigned: pdmv

@AdrianoDee,@kskovpen,@miquork,@sunilUIET you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Oct 23 '24 16:10 cmsbuild

  • 90 MB via BXVectorSimpleFlatTableProducer<l1t::EGamma> (link)

This is SimpleTriggerL1EGFlatTableProducer https://github.com/cms-sw/cmssw/blob/55fd97727507dd74a4020c9264f5f66118409151/PhysicsTools/NanoAOD/plugins/SimpleL1TFlatTableProducerPlugins.cc#L4

that uses

    cut = cms.string('pt>=10'),
    ...
            expr = cms.string('eta'),
            ...
            expr = cms.string('hwIso()'),
            ...
            expr = cms.string('phi'),
            ...
            expr = cms.string('pt'),

that look like much simpler mapping from strings to member functions than full expression parser could be feasible.

makortel avatar Oct 23 '24 16:10 makortel

  • 26 MB via SimpleFlatTableProducerBase<reco::BeamSpot, reco::BeamSpot> (link)

I believe this is SimpleBeamspotFlatTableProducer https://github.com/cms-sw/cmssw/blob/55fd97727507dd74a4020c9264f5f66118409151/PhysicsTools/NanoAOD/plugins/SimpleFlatTableProducerPlugins.cc#L49

The configuration has 2 modules of this type, of which only 1 is actually used in the job

simpleBeamspotFlatTableProducer (not constructed)
beamSpotTable

The beamSpotTable accesses

            expr = cms.string('sigmaZ()'),
            ...
            expr = cms.string('sigmaZ0Error()'),
            ...
            expr = cms.string('type()'),
            ...
            expr = cms.string('position().z()'),
            ...
            expr = cms.string('z0Error()'),

that also looks like a much simpler mapping from strings to member functions than the full expression parser could be feasible.

makortel avatar Oct 23 '24 16:10 makortel

concerning https://github.com/cms-sw/cmssw/issues/46493#issuecomment-2432672540

Would it be feasible to replace these with modules that express the structure of the selection in C++ code instead?

[...]

This duplication has both memory and runtime cost.

I had a go at it at https://github.com/cms-sw/cmssw/pull/46574, comments are welcome.

mmusich avatar Oct 31 '24 21:10 mmusich