CMSSW Integration of LST
This PR integrates the LST algorithm in CMSSW. A summary of the algorithm and its scope can be found in the recent LST presentation at the Phase 2 Software days (April 2024).
The PR includes the following additions/modifications:
- New package w/ the LST algorithm code (RecoTracker/LSTCore):
- interface/alpaka: The interface exposed to CMSSW.
- src/alpaka: The actual LST code.
- standalone: Scripts to be used for compiling, using & testing LST outside of the full CMSSW framework. ⇒ Not relevant for CMSSW review.
- Minimal/at most header-only dependency of LSTCore on other CMSSW packages ⇒ Preserve ability to run in standalone.
- New package w/ CMSSW modules related to LST (RecoTracker/LST):
- interface: The input & output data formats for LST.
- plugins:
The producers:
- Converting to/from the LST data formats (ED).
- Loading the LST custom geometry files (ES).
- Running LST to produce CMSSW collections (ED).
- python: The configuration files needed for running LST.
- src: Class definitions and ES producer supporting files.
- ~test: Scripts for local testing~ → Dropped in favor of a proper workflow.
- New process modifiers to test LST (changes in multiple existing packages):
- trackingIters01: Runs only the first two iterations of tracking (initialStep & highPtTripletStep). Useful for comparisons, as LST (for now) replaces only those two tracking iterations.
- trackingLST: Runs the LST algorithm instead of KalmanFilter for track building/seeding. ~The existence of the gpu process modifier defines the hardware the algorithm runs on (CPU or GPU).~
There is a single change not strictly related to the above categories and a dedicated comment will be made on it.
In general, we prefer to have minimal or at most header-only dependency of LSTCore on other CMSSW packages to preserve the ability to run with standalone scripts.
This is a large PR, so we start it as an RFC with the main batch of files. In the next days, the following updates are to be expected, so that the PR can be merged:
- [x] Removal of test scripts and introduction of workflow.
- [x] Extraction of the LST data files from the proper directories ~(bot tests will probably not work currently)~.
- [x] Modifications to the standalone scripts → Not to be reviewed.
Goes together with cms-data/RecoTracker-LSTCore#1 (now merged).
@slava77 @ariostas
cms-bot internal usage
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40456
- This PR adds an extra 788KB to repository
A new Pull Request was created by @VourMa for master.
It involves the following packages:
- Configuration/ProcessModifiers (operations)
- RecoTracker/ConversionSeedGenerators (reconstruction)
- RecoTracker/FinalTrackSelectors (reconstruction)
- RecoTracker/IterativeTracking (reconstruction)
- RecoTracker/LST (****)
- RecoTracker/LSTCore (reconstruction)
The following packages do not have a category, yet:
RecoTracker/LST Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category
@cmsbuild, @rappoccio, @jfernan2, @davidlange6, @mandrenguyen, @fabiocos, @antoniovilela can you please review it and eventually sign? Thanks. @VourMa, @missirol, @gpetruc, @rovere, @GiacomoSguazzoni, @VinInn, @Martin-Grunewald, @mmusich, @mtosi, @dgulhan, @JanFSchulte, @fabiocos, @felicepantaleo, @makortel this is something you requested to watch as well. @rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.
cms-bot commands are listed here
test parameters:
- pull_request = cms-data/RecoTracker-LSTCore#1
@cmsbuild please test
-1
Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/summary.html
COMMIT: 891eb1192817a5479f1135ba3741f80edee46df7
CMSSW: CMSSW_14_1_X_2024-06-01-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45117/39661/install.sh to create a dev area with all the needed externals and cmssw changes.
Unit Tests
I found 2 errors in the following unit tests:
---> test TestDQMOnlineClient-hlt_dqm_sourceclient had ERRORS ---> test testTrackingResolution had ERRORS
Comparison Summary
Summary:
- You potentially added 16 lines to the logs
- Reco comparison results: 10 differences found in the comparisons
- DQMHistoTests: Total files compared: 49
- DQMHistoTests: Total histograms compared: 3445370
- DQMHistoTests: Total failures: 6
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3445344
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
- Checked 206 log files, 170 edm output root files, 49 DQM output files
- TriggerResults: no differences found
I found 2 errors in the following unit tests:
both are apparently related to LST https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html
I found 2 errors in the following unit tests:
both are apparently related to LST https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html
An exception of category 'PluginNotFound' occurred while
[0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'LSTModulesDevESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.
it's not obvious how this dependency comes about from looking at https://github.com/cms-sw/cmssw/blob/master/DQM/TrackingMonitorSource/test/testTrackResolution_cfg.py (a Run3 test)
@makortel do you see a clear way how the LST ES dependency makes it through here?
assign heterogeneous
New categories assigned: heterogeneous
@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40462
-
This PR adds an extra 788KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): #33532, #43761, #43964, #44785, #44910, #45005
test parameters:
- pull_request = cms-data/RecoTracker-LSTCore#1
- workflows = 24834.703
- workflows_gpu = 24834.704
Pull request #45117 was updated. @AdrianoDee, @fabiocos, @sunilUIET, @jfernan2, @makortel, @fwyzard, @mandrenguyen, @subirsarkar, @miquork, @davidlange6, @srimanob, @rappoccio, @cmsbuild, @antoniovilela can you please check and sign again.
@cmsbuild please test
about test parameters:
I'm not sure the last one is correct with workflows_gpu = 24834.704. There is just one new GPU workflow needed in these tests; do I need to enable gpu as well?
do you see a clear way how the LST ES dependency makes it through here?
Maybe edmConfigTree could be helpful?
-1
Failed Tests: UnitTests RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39677/summary.html
COMMIT: dc4e61f6918237954d7a89c215a0ac877a374807
CMSSW: CMSSW_14_1_X_2024-06-03-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45117/39677/install.sh to create a dev area with all the needed externals and cmssw changes.
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
- @wddgit cms-sw/cmssw#45017
- @cgsavard cms-sw/cmssw#45011
- @gpetruc cms-sw/cmssw#44963
- @AdrianoDee cms-sw/cmssw#44965
- @iarspider cms-sw/cmssw#45076
- @Pruthvi-ch cms-sw/cmssw#45066
- @Dr15Jones cms-sw/cmssw#45094
- @lowette cms-sw/cmssw#44992
- @iarspider cms-sw/cmssw#45096
- @iarspider cms-sw/cmssw#45078
- @iarspider cms-sw/cmsdist#9222
You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39677/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39677/git-merge-result
Unit Tests
I found 24 errors in the following unit tests:
---> test ApeTest had ERRORS ---> test testAlignmentStats had ERRORS ---> test test_MilleZmm had ERRORS and more ...
RelVals
- 24834.703
24834.703_TTbar_14TeV+2026D98_lstOnCPUIters01TrackingOnly/step3_TTbar_14TeV+2026D98_lstOnCPUIters01TrackingOnly.log
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
@smuzaffar
do I take it that there is still no way to disable using the head of cms-sw/master branch in PRs depending on cms-data?
or is there a way to ask in test parameters?
I had to force-push to fix some issues we were having internally, but after this we will have a consistent commit history.
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40488
-
This PR adds an extra 36KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py modified in PR(s): #33532, #43761, #43964, #44785, #44910, #45005
Pull request #45117 was updated. @AdrianoDee, @rappoccio, @jfernan2, @antoniovilela, @subirsarkar, @davidlange6, @miquork, @mandrenguyen, @sunilUIET, @cmsbuild, @fabiocos, @fwyzard, @makortel, @srimanob can you please check and sign again.
do you see a clear way how the LST ES dependency makes it through here?
Maybe
edmConfigTreecould be helpful?
@makortel
it looks like an ESProducer with @alpaka will not be found if the alpaka service is not loaded.
I see that in other places makeProcessModifier is used in connection with a modifier that's supposed to also trigger loading of the alpaka service.
Also, the framework apparently constructs ES modules even if their data is not consumed.
Notably in the standard include StandardSequences.Services_cff we have
(gpu | pixelNtupletFit | alpaka).makeProcessModifier(_addProcessAccelerators)
For LST we'll just follow other examples and load the ES (and why not, all other files) with a makeProcessModifier.
it looks like an ESProducer with
@alpakawill not be found if the alpaka service is not loaded.
The missing piece is ProcessAcceleratorAlpaka (i.e. process.load("HeterogeneousCore.AlpakaCore.ProcessAcceleratorAlpaka_cfi") or equivalent) to be exact. (AlpakaService has nothing to do with this part).
Also, the framework apparently constructs ES modules even if their data is not consumed.
This is true for all modules (i.e. both ED and ES). Framework knows the data dependencies only after all modules have been constructed.
Notably in the standard include StandardSequences.Services_cff we have
(gpu | pixelNtupletFit | alpaka).makeProcessModifier(_addProcessAccelerators)
The point there is that if a GPU is desired, the Configuration.StandardSequences.Accelerators_cff gets loaded (that then sets up the necessary ProcessAccelerators and Services). I.e. a job loading either Configuration.StandardSequences.Services_cff or Configuration.StandardSequences.Accelerators_cff doesn't have to do anything.
I believe eventually the loading of Accelerators_cff in Services_cff (or a direct load() in cmsDriver.py-generated configurations) would become the default behavior.
For LST we'll just follow other examples and load the ES (and why not, all other files) with a
makeProcessModifier.
I'd suggest to try first if including the Alpaka-based ESProducer in cms.Tasks (among the EDModules) and modifying the Task content with modifiers would be feasible (if an ESProducer is in any Task the process sees, the ESProducer is constructed only if it is in a Task associated with the Schedule directly or via any Path, more details in SWGuideAboutPythonConfigFile). IIRC this approach has been used successfully in other Alpaka PRs. The makeProcessModifier is quite "heavy" operation, and should be used only if nothing else works. (on the other hand, if you want to go towards an organization where e.g. two different cffs are loaded depending on Modifiers, then makeProcessModifier would be the way to go)
I'd suggest to try first if including the Alpaka-based ESProducer in
cms.Tasks (among the EDModules) and modifying the Task content with modifiers would be feasible (if an ESProducer is in anyTasktheprocesssees, the ESProducer is constructed only if it is in a Task associated with theScheduledirectly or via anyPath
that's what we have already in the not working case: the lstModulesDevESProducer is present only in _HighPtTripletStepTask_LST
which becomes real after (trackingPhase2PU140 & trackingLST).toReplaceWith(HighPtTripletStepTask, _HighPtTripletStepTask_LST)
Yet, the Run3 unit tests somehow fail (while the run3 matrix workflows do not)
IIRC this approach has been used successfully in other Alpaka PRs.
there are some examples both ways
makeProcessModifierhttps://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/RecoLocalTracker/SiPixelRecHits/python/PixelCPEESProducers_cff.py#L23- or firectly importing
Accelerators_cffhttps://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/EventFilter/EcalRawToDigi/python/ecalDigis_cff.py#L17C38-L17C54
Should we add Accelerators_cff if the makeProcessModifier is expensive?
- or firectly importing
Accelerators_cffhttps://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/EventFilter/EcalRawToDigi/python/ecalDigis_cff.py#L17C38-L17C54
ah, that's why the matrix workflows do not fail: most of them include ecalDigis_cff eventually
that's what we have already in the not working case: the
lstModulesDevESProduceris present only in_HighPtTripletStepTask_LST
I guess the problem is the lstModulesDevESProducer being in a Task whose name starts with an underscore. Those are "module private" in python, and do not automatically propagate through import or process.load(). If you would change the name of that Task to not have the leading underscore, I'd expect the framework to not construct lstModulesDevESProducer when trackingPhase2PU140 & trackingLST is inactive.
makeProcessModifierhttps://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/RecoLocalTracker/SiPixelRecHits/python/PixelCPEESProducers_cff.py#L23
On a cursory look this could be ok because the conditional action is to process.load() other python files. (but didn't look further if things could be done better)
- or firectly importing
Accelerators_cffhttps://github.com/cms-sw/cmssw/blob/ebe68aa956253d0403521939367e6d4552798552/EventFilter/EcalRawToDigi/python/ecalDigis_cff.py#L17C38-L17C54
While from the framework perspective having each cff to import everything it depends on would actually be preferable, this is not how cmsDriver-based configuration infrastructure has been crafted. The Accelerators_cff should be treated similarly to Services_cff, and the approach seems to be that all cfi/cff fragments assume the cfg loads the Services_cff (and thus Accelerators_cff) by other means. From that perspective the from Accelerators_cff import * in ecalDigis_cff "slipped through" and should be cleaned up. Or, I don't immediately see why relying on the cfg in some way loading the Accelerators_cff would not work (and I'm not able to look into details before July).
it sounds like makeProcessModifier is not quite desirable and Accelerators_cff is not right.
I could think of another pattern (partly meta-code)
from RecoTracker.LST.lstModulesDevESProducer_cfi import lstModulesDevESProducer as _lstES
lstES = cms.ESProducer('Placeholder')
lstModifier.toReplaceWith(lstES, _lstES)
what would be a good Placeholder ? I guess it should be something without actual ES product registration so that this pattern can be reused in many places.
I'm not able to look into details before July
thank you for your time while not available to look at details. I hope other reviewers will be available to help.