cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

Fix location of alpaka modification for particleFlowClusterHCAL

Open jsamudio opened this issue 1 month ago • 31 comments

PR description:

Identified by @makortel, this change moves the modification of particleFlowClusterHCAL to the correct configuration file in accordance with the coding rules.

PR validation:

Tested the associated .42X matrix workflows.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Not a backport.

jsamudio avatar Nov 03 '25 15:11 jsamudio

cms-bot internal usage

cmsbuild avatar Nov 03 '25 15:11 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49295/46656

cmsbuild avatar Nov 03 '25 15:11 cmsbuild

A new Pull Request was created by @jsamudio for master.

It involves the following packages:

  • RecoParticleFlow/PFClusterProducer (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen, @srimanob can you please review it and eventually sign? Thanks. @felicepantaleo, @hatakeyamak, @lgray, @mmarionncern, @rovere, @sameasy, @seemasharmafnal this is something you requested to watch as well. @ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Nov 03 '25 15:11 cmsbuild

please test

jfernan2 avatar Nov 03 '25 17:11 jfernan2

+1

Size: This PR adds an extra 20KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d0802/49212/summary.html COMMIT: 34edc03be36294e1a9f81212857b01314d79e98d CMSSW: CMSSW_16_0_X_2025-11-03-1100/el8_amd64_gcc13 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49295/49212/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3939953
  • DQMHistoTests: Total failures: 50
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3939883
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 218 log files, 188 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Nov 03 '25 19:11 cmsbuild

hold

Just for me to have a moment to digest it

makortel avatar Nov 03 '25 19:11 makortel

Pull request has been put on hold by @makortel They need to issue an unhold command to remove the hold state or L1 can unhold it for all

cmsbuild avatar Nov 03 '25 19:11 cmsbuild

Thanks @jsamudio. This PR is in the right direction, but I'm afraid it is not sufficient to fully resolve the policy violation.

Customization of https://github.com/cms-sw/cmssw/blob/34edc03be36294e1a9f81212857b01314d79e98d/RecoParticleFlow/PFClusterProducer/python/pfClusterHBHEAlpaka_cff.py#L82 should be done where pfClusteringHBHEHFTask is defined, i.e. in RecoParticleFlow/PFClusterProducer/python/particleFlowCluster_cff.py. Same for https://github.com/cms-sw/cmssw/blob/34edc03be36294e1a9f81212857b01314d79e98d/RecoParticleFlow/PFClusterProducer/python/pfClusterHBHEAlpaka_cff.py#L128

In addition, the legacyPFRecHitProducer and legacyPFClusterProducer would be better dealt such that the corresponding legacy producer is alpaka.toReplaceWith() with the LegacyPFRecHitProducer and LegacyPFClusterProducer that convert the SoAs to legacy data products.

If I read a configuration dump correctly, those would be https://github.com/cms-sw/cmssw/blob/34edc03be36294e1a9f81212857b01314d79e98d/RecoParticleFlow/PFClusterProducer/python/particleFlowRecHitHBHE_cfi.py#L10 and https://github.com/cms-sw/cmssw/blob/34edc03be36294e1a9f81212857b01314d79e98d/RecoParticleFlow/PFClusterProducer/python/particleFlowClusterHBHE_cfi.py#L19-L20

Actually with those changes (i.e. customizing the EDProducer type a given module label corresponds to instead of the downstream modules for their input these customizations https://github.com/cms-sw/cmssw/blob/34edc03be36294e1a9f81212857b01314d79e98d/RecoParticleFlow/PFClusterProducer/python/particleFlowClusterHCAL_cfi.py#L75-L79 wouldn't be needed, because the default value for the clustersSource would already correspond to the Alpaka-based chain of modules.

makortel avatar Nov 03 '25 21:11 makortel

@makortel Thanks for clarifying, I'll work on fixing it.

jsamudio avatar Nov 03 '25 21:11 jsamudio

@makortel Regarding the second half of your comment, I think I remember why I did not use, for example,

alpaka.toReplaceWith(particleFlowClusterHBHEOnly, legacyPFClusterProducerHBHEOnly)

For the .423 workflow, the GPU vs legacy comparison is included as: https://github.com/cms-sw/cmssw/blob/1cf33b6cf0c5a245f6d57205b888e5f6185c1064/Validation/RecoParticleFlow/python/pfCaloGPUComparisonTask_cfi.py#L4-L8

Which particleFlowClusterHBHEOnly is coming from the reconstruction_hcalOnlyLegacy sequence. I am not sure if having the same module labels is problematic in this case or if I can specify which sequence they are coming from? Do you have any advice?

jsamudio avatar Nov 06 '25 16:11 jsamudio

Which particleFlowClusterHBHEOnly is coming from the reconstruction_hcalOnlyLegacy sequence. I am not sure if having the same module labels is problematic in this case or if I can specify which sequence they are coming from? Do you have any advice?

What happens is that there is exactly one module with particleFlowClusterHBHEOnly label (regardless which Sequences/Tasks contain the module). I.e. the alpaka.toReplaceWith(particleFlowClusterHBHEOnly, legacyPFClusterProducerHBHEOnly) is a global customization.

Our established pattern for "Alpaka vs legacy" comparisons is to define another module label for the legacy chain (or DAG) of modules.

One part in my confusion in the case of pfClusterHBHEOnlyAlpakaComparison was that by just looking the consumed module labels, I thought legacyPFClusterProducerHBHEOnly referred to the chain of legacy modules, rather than the legacy-from-SoA chain.

For an example see e.g. https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/EcalRecProducers/python/ecalMultiFitUncalibRecHit_cff.py that defines the legacy-from-SoA-converter like https://github.com/cms-sw/cmssw/blob/1cf33b6cf0c5a245f6d57205b888e5f6185c1064/RecoLocalCalo/EcalRecProducers/python/ecalMultiFitUncalibRecHit_cff.py#L31 and then the full legacy chain like https://github.com/cms-sw/cmssw/blob/1cf33b6cf0c5a245f6d57205b888e5f6185c1064/RecoLocalCalo/EcalRecProducers/python/ecalMultiFitUncalibRecHit_cff.py#L6 https://github.com/cms-sw/cmssw/blob/1cf33b6cf0c5a245f6d57205b888e5f6185c1064/RecoLocalCalo/EcalRecProducers/python/ecalMultiFitUncalibRecHit_cff.py#L42-L46

makortel avatar Nov 06 '25 19:11 makortel

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49295/46851

  • There are other open Pull requests which might conflict with changes you have proposed:
    • File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): #49339, #49383
    • File RecoLocalCalo/Configuration/python/hcalGlobalReco_cff.py modified in PR(s): #46076
    • File RecoLocalCalo/Configuration/python/hcalLocalReco_cff.py modified in PR(s): #45995, #46076

cmsbuild avatar Nov 17 '25 19:11 cmsbuild

Pull request #49295 was updated. @AdrianoDee, @DickyChant, @antoniovagnerini, @cmsbuild, @ctarricone, @gabrielmscampos, @jfernan2, @mandrenguyen, @miquork, @nothingface0, @rseidita, @srimanob can you please check and sign again.

cmsbuild avatar Nov 17 '25 19:11 cmsbuild

@makortel Sorry for the long delay. I refactored everything such that the original pfClusterHBHEAlpaka_cff.py is not necessary, and moved the customizations from Alpaka to (I think) the right locations. If there is still an issue, please let me know.

jsamudio avatar Nov 17 '25 19:11 jsamudio

please test

jfernan2 avatar Nov 18 '25 08:11 jfernan2

-1

Failed Tests: RelVals RelVals-INPUT AddOn Size: This PR adds an extra 52KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d0802/49515/summary.html COMMIT: 40354ceb0896103b5428e883378068e0d8a6e584 CMSSW: CMSSW_16_0_X_2025-11-17-2300/el8_amd64_gcc13 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49295/49515/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals

----- Begin Fatal Exception 18-Nov-2025 09:25:58 CET-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'PFRecHitHCALParamsESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 18-Nov-2025 09:25:58 CET-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'PFRecHitHCALParamsESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 18-Nov-2025 09:38:56 CET-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'PFRecHitHCALParamsESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.
----- End Fatal Exception -------------------------------------------------
Expand to see more relval errors ...

Failed RelVals-INPUT

  • 139.005139.005_AlCaPhiSym2021/step2_AlCaPhiSym2021.log

Failed AddOn Tests

----- Begin Fatal Exception 18-Nov-2025 09:23:57 CET-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'PFRecHitHCALParamsESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 18-Nov-2025 09:23:57 CET-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'PFRecHitHCALParamsESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 18-Nov-2025 09:23:57 CET-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'PFRecHitHCALParamsESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.
----- End Fatal Exception -------------------------------------------------
Expand to see more addon errors ...

cmsbuild avatar Nov 18 '25 09:11 cmsbuild

These errors

Unable to find plugin 'PFRecHitHCALParamsESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.

could be fixed by adding all the Alpaka-based ESProducers in one labeled Task, and then adding that Task into the _alpaka_pfClusteringHBHEHFTask. Then the framework always sees these ESProducers in some Task and won't enable them by default (this behavior is described in https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideAboutPythonConfigFile#Behavior_when_an_ESProducer_ESSo)

See e.g. https://github.com/cms-sw/cmssw/blob/318ca4d1abe3a7df429290937e95bf32dfac84c8/EventFilter/EcalRawToDigi/python/ecalDigis_cff.py#L20-L22 and https://github.com/cms-sw/cmssw/pull/48703 for examples

makortel avatar Nov 18 '25 22:11 makortel

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49295/47012

cmsbuild avatar Nov 28 '25 19:11 cmsbuild

Pull request #49295 was updated. @cmsbuild, @ctarricone, @gabrielmscampos, @jfernan2, @mandrenguyen, @nothingface0, @rseidita, @srimanob can you please check and sign again.

cmsbuild avatar Nov 28 '25 19:11 cmsbuild

I've updated the branch to both address the earlier comments and change the style of the changes. So I keep both a full legacy sequence, and apply the alpaka modifier onto the normal sequence. This is instead of doing both in the same sequence, which was the last iteration.

jsamudio avatar Nov 28 '25 19:11 jsamudio

@cmsbuild, please test

makortel avatar Dec 01 '25 21:12 makortel

-1

Failed Tests: UnitTests AddOn Size: This PR adds an extra 16KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d0802/49743/summary.html COMMIT: 9d86caef027b62e141f9bb53dbc9e078cd4b650b CMSSW: CMSSW_16_0_X_2025-12-01-1100/el8_amd64_gcc13 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49295/49743/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test TestDQMGUIUpload had ERRORS

Failed AddOn Tests

----- Begin Fatal Exception 01-Dec-2025 22:58:05 CET-----------------------
An exception of category 'EventSetupConflict' occurred while
   [0] Calling beginJob
Exception Message:
two EventSetup Producers want to deliver type="reco::PFRecHitHCALParamsHostCollection" label=""
 from record PFRecHitHCALParamsRecord. The two providers are 
1) type="PFRecHitHCALParamsESProducer@alpaka" label="hltESPPFRecHitHCALParams"
2) type="PFRecHitHCALParamsESProducer@alpaka" label="pfRecHitHCALParamsESProducer"
Please either
   remove one of these Producers
   or find a way of configuring one of them so it does not deliver this data
   or use an es_prefer statement in the configuration to choose one.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 01-Dec-2025 22:57:59 CET-----------------------
An exception of category 'EventSetupConflict' occurred while
   [0] Calling beginJob
Exception Message:
two EventSetup Producers want to deliver type="reco::PFRecHitHCALParamsHostCollection" label=""
 from record PFRecHitHCALParamsRecord. The two providers are 
1) type="PFRecHitHCALParamsESProducer@alpaka" label="hltESPPFRecHitHCALParams"
2) type="PFRecHitHCALParamsESProducer@alpaka" label="pfRecHitHCALParamsESProducer"
Please either
   remove one of these Producers
   or find a way of configuring one of them so it does not deliver this data
   or use an es_prefer statement in the configuration to choose one.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 01-Dec-2025 22:57:27 CET-----------------------
An exception of category 'EventSetupConflict' occurred while
   [0] Calling beginJob
Exception Message:
two EventSetup Producers want to deliver type="reco::PFRecHitHCALParamsHostCollection" label=""
 from record PFRecHitHCALParamsRecord. The two providers are 
1) type="PFRecHitHCALParamsESProducer@alpaka" label="hltESPPFRecHitHCALParams"
2) type="PFRecHitHCALParamsESProducer@alpaka" label="pfRecHitHCALParamsESProducer"
Please either
   remove one of these Producers
   or find a way of configuring one of them so it does not deliver this data
   or use an es_prefer statement in the configuration to choose one.
----- End Fatal Exception -------------------------------------------------
Expand to see more addon errors ...

Comparison Summary

Summary:

  • You potentially added 7 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 8 differences found in the comparisons
  • Reco comparison had 4 failed jobs
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4269233
  • DQMHistoTests: Total failures: 80
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4269133
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Dec 01 '25 23:12 cmsbuild

I'm puzzled about the failure. This PR defines the pfRecHitHCALParamsESProducer in https://github.com/cms-sw/cmssw/blob/9d86caef027b62e141f9bb53dbc9e078cd4b650b/RecoParticleFlow/PFClusterProducer/python/particleFlowCluster_cff.py#L122 then inserts it into a Task https://github.com/cms-sw/cmssw/blob/9d86caef027b62e141f9bb53dbc9e078cd4b650b/RecoParticleFlow/PFClusterProducer/python/particleFlowCluster_cff.py#L130-L134 that is included in two other Tasks https://github.com/cms-sw/cmssw/blob/9d86caef027b62e141f9bb53dbc9e078cd4b650b/RecoParticleFlow/PFClusterProducer/python/particleFlowCluster_cff.py#L158 https://github.com/cms-sw/cmssw/blob/9d86caef027b62e141f9bb53dbc9e078cd4b650b/RecoParticleFlow/PFClusterProducer/python/particleFlowCluster_cff.py#L177 that are "enabled" with the alpaka modifier in https://github.com/cms-sw/cmssw/blob/9d86caef027b62e141f9bb53dbc9e078cd4b650b/RecoParticleFlow/PFClusterProducer/python/particleFlowCluster_cff.py#L181-L182

This should result the framework to not use the pfRecHitHCALParamsESProducer (because it is put in a Task), unless alpaka modifier is enabled (it should be disabled in these tests).

makortel avatar Dec 04 '25 15:12 makortel

Looking at the edmConfigDump shows that the pfRecHitHCALParamsESProducer would not be in any Task.

Investigating RelVal_HLT_RAW2DIGI_L1Reco_RECO.py directly, it has lines

from HLTrigger.Configuration.CustomConfigs import L1THLT,HLTRECO
process = HLTRECO(process)

that makes the process.alpaka_pfClusteringHBHEHF_esProducersTask empty.

makortel avatar Dec 04 '25 15:12 makortel

The HLTRECO indeed removes all ESModules from Tasks :( https://github.com/cms-sw/cmssw/blob/61a1113117cf3a7aafee41dee39a175dbda615f4/HLTrigger/Configuration/python/CustomConfigs.py#L107-L132

makortel avatar Dec 04 '25 16:12 makortel

The reason why this pattern of placing Alpaka ESProducers in Tasks works e.g. in ECAL is that the HLT menu uses the same label (e.g. ecalElectronicsMappingHostESProducer) in the reconstruction configuration and the HLT menu.

I suppose that pattern should be followed with PF too. That would mean the HLT menu should be updated to use

  • pfRecHitHCALParamsRecordSource instead of hltESSPFRecHitHCALParamsRecord
  • pfRecHitHCALTopologyRecordSource instead of hltESSPFRecHitHCALTopologyRecord
  • pfRecHitHCALParamsESProducer instead of hltESPPFRecHitHCALParams
  • pfRecHitHCALTopologyESProducer instead of hltESPPFRecHitHCALTopology

makortel avatar Dec 04 '25 16:12 makortel

I suppose that pattern should be followed with PF too. That would mean the HLT menu should be updated to use

Using the same label for offline and HLT is in general heavily discourage. It is fine only if it can be guaranteed that, under no circumstanced whatsoever, the content of the offline and HLT configuration will never need to be different. Is this the case here?

mmusich avatar Dec 08 '25 08:12 mmusich

It is fine only if it can be guaranteed that, under no circumstanced whatsoever, the content of the offline and HLT configuration will never need to be different. Is this the case here?

@jsamudio @hatakeyamak Could you comment?

If that is not the case, an alternative would be to specify different labels (of which one could be empty string) for the ESProducts for HLT and offline cases.

makortel avatar Dec 09 '25 20:12 makortel

I think we can say we have no plan to make them differ between HLT and offline.

hatakeyamak avatar Dec 09 '25 21:12 hatakeyamak

at least for these parameters.

That's not enough, they would need to be integrally identical in the full list of parameters, otherwise in HLT+RECO jobs one of the two would get overridden.

mmusich avatar Dec 09 '25 21:12 mmusich