cmssw CMSSW Integration of LST

This PR integrates the LST algorithm in CMSSW. A summary of the algorithm and its scope can be found in the recent LST presentation at the Phase 2 Software days (April 2024).

The PR includes the following additions/modifications:

New package w/ the LST algorithm code (RecoTracker/LSTCore):
- interface/alpaka: The interface exposed to CMSSW.
- src/alpaka: The actual LST code.
- standalone: Scripts to be used for compiling, using & testing LST outside of the full CMSSW framework. ⇒ Not relevant for CMSSW review.
- Minimal/at most header-only dependency of LSTCore on other CMSSW packages ⇒ Preserve ability to run in standalone.
New package w/ CMSSW modules related to LST (RecoTracker/LST):
- interface: The input & output data formats for LST.
- plugins: The producers:
  - Converting to/from the LST data formats (ED).
  - Loading the LST custom geometry files (ES).
  - Running LST to produce CMSSW collections (ED).
- python: The configuration files needed for running LST.
- src: Class definitions and ES producer supporting files.
- ~test: Scripts for local testing~ → Dropped in favor of a proper workflow.
New process modifiers to test LST (changes in multiple existing packages):
- trackingIters01: Runs only the first two iterations of tracking (initialStep & highPtTripletStep). Useful for comparisons, as LST (for now) replaces only those two tracking iterations.
- trackingLST: Runs the LST algorithm instead of KalmanFilter for track building/seeding. ~The existence of the gpu process modifier defines the hardware the algorithm runs on (CPU or GPU).~

There is a single change not strictly related to the above categories and a dedicated comment will be made on it.

In general, we prefer to have minimal or at most header-only dependency of LSTCore on other CMSSW packages to preserve the ability to run with standalone scripts.

This is a large PR, so we start it as an RFC with the main batch of files. In the next days, the following updates are to be expected, so that the PR can be merged:

[x] Removal of test scripts and introduction of workflow.
[x] Extraction of the LST data files from the proper directories ~(bot tests will probably not work currently)~.
[x] Modifications to the standalone scripts → Not to be reviewed.

Goes together with cms-data/RecoTracker-LSTCore#1 (now merged).

@slava77 @ariostas

Jun 01 '24 14:06 VourMa

cms-bot internal usage

Jun 01 '24 14:06 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40456

This PR adds an extra 788KB to repository

Jun 01 '24 14:06 cmsbuild

A new Pull Request was created by @VourMa for master.

It involves the following packages:

Configuration/ProcessModifiers (operations)
RecoTracker/ConversionSeedGenerators (reconstruction)
RecoTracker/FinalTrackSelectors (reconstruction)
RecoTracker/IterativeTracking (reconstruction)
RecoTracker/LST (****)
RecoTracker/LSTCore (reconstruction)

The following packages do not have a category, yet:

RecoTracker/LST Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild, @rappoccio, @jfernan2, @davidlange6, @mandrenguyen, @fabiocos, @antoniovilela can you please review it and eventually sign? Thanks. @VourMa, @missirol, @gpetruc, @rovere, @GiacomoSguazzoni, @VinInn, @Martin-Grunewald, @mmusich, @mtosi, @dgulhan, @JanFSchulte, @fabiocos, @felicepantaleo, @makortel this is something you requested to watch as well. @rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

Jun 01 '24 14:06 cmsbuild

test parameters:

pull_request = cms-data/RecoTracker-LSTCore#1

Jun 01 '24 15:06 slava77

@cmsbuild please test

Jun 01 '24 15:06 slava77

-1

Failed Tests: UnitTests Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/summary.html COMMIT: 891eb1192817a5479f1135ba3741f80edee46df7 CMSSW: CMSSW_14_1_X_2024-06-01-1100/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45117/39661/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 2 errors in the following unit tests:

---> test TestDQMOnlineClient-hlt_dqm_sourceclient had ERRORS
---> test testTrackingResolution had ERRORS

Comparison Summary

Summary:

You potentially added 16 lines to the logs
Reco comparison results: 10 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3445370
DQMHistoTests: Total failures: 6
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3445344
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
Checked 206 log files, 170 edm output root files, 49 DQM output files
TriggerResults: no differences found

Jun 01 '24 19:06 cmsbuild

I found 2 errors in the following unit tests:

both are apparently related to LST https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html

Jun 02 '24 01:06 slava77

I found 2 errors in the following unit tests:

both are apparently related to LST https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html

An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'LSTModulesDevESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.

it's not obvious how this dependency comes about from looking at https://github.com/cms-sw/cmssw/blob/master/DQM/TrackingMonitorSource/test/testTrackResolution_cfg.py (a Run3 test)

@makortel do you see a clear way how the LST ES dependency makes it through here?

Jun 03 '24 00:06 slava77

assign heterogeneous

Jun 03 '24 13:06 mmusich

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

Jun 03 '24 13:06 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40462

This PR adds an extra 788KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): #33532, #43761, #43964, #44785, #44910, #45005

Jun 03 '24 17:06 cmsbuild

test parameters:

pull_request = cms-data/RecoTracker-LSTCore#1
workflows = 24834.703
workflows_gpu = 24834.704

Jun 03 '24 17:06 slava77

Pull request #45117 was updated. @AdrianoDee, @fabiocos, @sunilUIET, @jfernan2, @makortel, @fwyzard, @mandrenguyen, @subirsarkar, @miquork, @davidlange6, @srimanob, @rappoccio, @cmsbuild, @antoniovilela can you please check and sign again.

Jun 03 '24 17:06 cmsbuild

@cmsbuild please test

Jun 03 '24 17:06 slava77

about test parameters: I'm not sure the last one is correct with workflows_gpu = 24834.704. There is just one new GPU workflow needed in these tests; do I need to enable gpu as well?

Jun 03 '24 17:06 slava77

do you see a clear way how the LST ES dependency makes it through here?

Maybe edmConfigTree could be helpful?

Jun 03 '24 19:06 makortel

-1

Failed Tests: UnitTests RelVals Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39677/summary.html COMMIT: dc4e61f6918237954d7a89c215a0ac877a374807 CMSSW: CMSSW_14_1_X_2024-06-03-1100/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45117/39677/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

@wddgit cms-sw/cmssw#45017
@cgsavard cms-sw/cmssw#45011
@gpetruc cms-sw/cmssw#44963
@AdrianoDee cms-sw/cmssw#44965
@iarspider cms-sw/cmssw#45076
@Pruthvi-ch cms-sw/cmssw#45066
@Dr15Jones cms-sw/cmssw#45094
@lowette cms-sw/cmssw#44992
@iarspider cms-sw/cmssw#45096
@iarspider cms-sw/cmssw#45078
@iarspider cms-sw/cmsdist#9222

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39677/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39677/git-merge-result

Unit Tests

I found 24 errors in the following unit tests:

---> test ApeTest had ERRORS
---> test testAlignmentStats had ERRORS
---> test test_MilleZmm had ERRORS
and more ...

RelVals

24834.70324834.703_TTbar_14TeV+2026D98_lstOnCPUIters01TrackingOnly/step3_TTbar_14TeV+2026D98_lstOnCPUIters01TrackingOnly.log

Jun 04 '24 03:06 cmsbuild

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

@smuzaffar do I take it that there is still no way to disable using the head of cms-sw/master branch in PRs depending on cms-data? or is there a way to ask in test parameters?

Jun 04 '24 15:06 slava77

I had to force-push to fix some issues we were having internally, but after this we will have a consistent commit history.

Jun 05 '24 18:06 ariostas

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40488

This PR adds an extra 36KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): #33532, #43761, #43964, #44785, #44910, #45005

Jun 05 '24 19:06 cmsbuild

Pull request #45117 was updated. @AdrianoDee, @rappoccio, @jfernan2, @antoniovilela, @subirsarkar, @davidlange6, @miquork, @mandrenguyen, @sunilUIET, @cmsbuild, @fabiocos, @fwyzard, @makortel, @srimanob can you please check and sign again.

Jun 05 '24 19:06 cmsbuild

do you see a clear way how the LST ES dependency makes it through here?

Maybe edmConfigTree could be helpful?

@makortel

it looks like an ESProducer with @alpaka will not be found if the alpaka service is not loaded. I see that in other places makeProcessModifier is used in connection with a modifier that's supposed to also trigger loading of the alpaka service. Also, the framework apparently constructs ES modules even if their data is not consumed.

Notably in the standard include StandardSequences.Services_cff we have

(gpu | pixelNtupletFit | alpaka).makeProcessModifier(_addProcessAccelerators)

For LST we'll just follow other examples and load the ES (and why not, all other files) with a makeProcessModifier.

Jun 05 '24 21:06 slava77

it looks like an ESProducer with @alpaka will not be found if the alpaka service is not loaded.

The missing piece is ProcessAcceleratorAlpaka (i.e. process.load("HeterogeneousCore.AlpakaCore.ProcessAcceleratorAlpaka_cfi") or equivalent) to be exact. (AlpakaService has nothing to do with this part).

Also, the framework apparently constructs ES modules even if their data is not consumed.

This is true for all modules (i.e. both ED and ES). Framework knows the data dependencies only after all modules have been constructed.

Notably in the standard include StandardSequences.Services_cff we have
(gpu | pixelNtupletFit | alpaka).makeProcessModifier(_addProcessAccelerators)

The point there is that if a GPU is desired, the Configuration.StandardSequences.Accelerators_cff gets loaded (that then sets up the necessary ProcessAccelerators and Services). I.e. a job loading either Configuration.StandardSequences.Services_cff or Configuration.StandardSequences.Accelerators_cff doesn't have to do anything.

I believe eventually the loading of Accelerators_cff in Services_cff (or a direct load() in cmsDriver.py-generated configurations) would become the default behavior.

For LST we'll just follow other examples and load the ES (and why not, all other files) with a makeProcessModifier.

I'd suggest to try first if including the Alpaka-based ESProducer in cms.Tasks (among the EDModules) and modifying the Task content with modifiers would be feasible (if an ESProducer is in any Task the process sees, the ESProducer is constructed only if it is in a Task associated with the Schedule directly or via any Path, more details in SWGuideAboutPythonConfigFile). IIRC this approach has been used successfully in other Alpaka PRs. The makeProcessModifier is quite "heavy" operation, and should be used only if nothing else works. (on the other hand, if you want to go towards an organization where e.g. two different cffs are loaded depending on Modifiers, then makeProcessModifier would be the way to go)

Jun 05 '24 22:06 makortel

I'd suggest to try first if including the Alpaka-based ESProducer in cms.Tasks (among the EDModules) and modifying the Task content with modifiers would be feasible (if an ESProducer is in any Task the process sees, the ESProducer is constructed only if it is in a Task associated with the Schedule directly or via any Path

that's what we have already in the not working case: the lstModulesDevESProducer is present only in _HighPtTripletStepTask_LST which becomes real after (trackingPhase2PU140 & trackingLST).toReplaceWith(HighPtTripletStepTask, _HighPtTripletStepTask_LST)

Yet, the Run3 unit tests somehow fail (while the run3 matrix workflows do not)

Jun 05 '24 23:06 slava77

IIRC this approach has been used successfully in other Alpaka PRs.

there are some examples both ways

makeProcessModifier https://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/RecoLocalTracker/SiPixelRecHits/python/PixelCPEESProducers_cff.py#L23
or firectly importing Accelerators_cff https://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/EventFilter/EcalRawToDigi/python/ecalDigis_cff.py#L17C38-L17C54

Should we add Accelerators_cff if the makeProcessModifier is expensive?

Jun 05 '24 23:06 slava77

or firectly importing Accelerators_cff https://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/EventFilter/EcalRawToDigi/python/ecalDigis_cff.py#L17C38-L17C54

ah, that's why the matrix workflows do not fail: most of them include ecalDigis_cff eventually

Jun 05 '24 23:06 slava77

that's what we have already in the not working case: the lstModulesDevESProducer is present only in _HighPtTripletStepTask_LST

I guess the problem is the lstModulesDevESProducer being in a Task whose name starts with an underscore. Those are "module private" in python, and do not automatically propagate through import or process.load(). If you would change the name of that Task to not have the leading underscore, I'd expect the framework to not construct lstModulesDevESProducer when trackingPhase2PU140 & trackingLST is inactive.

Jun 06 '24 00:06 makortel

makeProcessModifier https://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/RecoLocalTracker/SiPixelRecHits/python/PixelCPEESProducers_cff.py#L23

On a cursory look this could be ok because the conditional action is to process.load() other python files. (but didn't look further if things could be done better)

or firectly importing Accelerators_cff https://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/EventFilter/EcalRawToDigi/python/ecalDigis_cff.py#L17C38-L17C54

While from the framework perspective having each cff to import everything it depends on would actually be preferable, this is not how cmsDriver-based configuration infrastructure has been crafted. The Accelerators_cff should be treated similarly to Services_cff, and the approach seems to be that all cfi/cff fragments assume the cfg loads the Services_cff (and thus Accelerators_cff) by other means. From that perspective the from Accelerators_cff import * in ecalDigis_cff "slipped through" and should be cleaned up. Or, I don't immediately see why relying on the cfg in some way loading the Accelerators_cff would not work (and I'm not able to look into details before July).

Jun 06 '24 00:06 makortel

it sounds like makeProcessModifier is not quite desirable and Accelerators_cff is not right.

I could think of another pattern (partly meta-code)

from RecoTracker.LST.lstModulesDevESProducer_cfi import lstModulesDevESProducer as _lstES
lstES = cms.ESProducer('Placeholder')
lstModifier.toReplaceWith(lstES, _lstES)

what would be a good Placeholder ? I guess it should be something without actual ES product registration so that this pattern can be reused in many places.

Jun 06 '24 02:06 slava77

I'm not able to look into details before July

thank you for your time while not available to look at details. I hope other reviewers will be available to help.

Jun 06 '24 02:06 slava77