cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

L1Trk reco enabling CheckHistory by default

Open tschuh opened this issue 1 year ago • 20 comments

This PR enables the CheckHistory option by default. Without this check there's a danger of people running CMSSW 14 L1Trk reco on stubs from earlier MC getting nonsense results and not realizing it. This PR got requested by @tomalin

tschuh avatar Jan 10 '24 12:01 tschuh

cms-bot internal usage

cmsbuild avatar Jan 10 '24 12:01 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43690/38371

  • This PR adds an extra 16KB to repository

cmsbuild avatar Jan 10 '24 12:01 cmsbuild

A new Pull Request was created by @tschuh for master.

It involves the following packages:

  • L1Trigger/TrackFindingTracklet (l1)
  • L1Trigger/TrackerDTC (upgrade, l1)
  • L1Trigger/TrackerTFP (l1)

@epalencia, @aloeliger, @subirsarkar, @srimanob, @cmsbuild can you please review it and eventually sign? Thanks. @missirol, @erikbutz, @Martin-Grunewald, @skinnari this is something you requested to watch as well. @rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Jan 10 '24 12:01 cmsbuild

@BenjaminRS I was pretty jet lagged yesterday and hadn't slept in 48 hours, so I'm not too sure, but I seem to recall this coming up yesterday, were you aware of this PR?

aloeliger avatar Jan 16 '24 09:01 aloeliger

please test

aloeliger avatar Jan 16 '24 09:01 aloeliger

-1

Failed Tests: RelVals RelVals-INPUT Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-254412/36831/summary.html COMMIT: d16d0f239c096724944636cfea71501d6cd906f5 CMSSW: CMSSW_14_0_X_2024-01-15-2300/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 16-Jan-2024 10:50:07 CET-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=ElectronMVAValueMapProducer label='electronMVAValueMapProducer'
Exception Message:
edm::FileInPath unable to find file RecoEgamma/ElectronIdentification/data/MVAWeightFiles/Winter22HZZV1/EB1_5.weights.xml.gz anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/external/el8_amd64_gcc12/data:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/poison:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/src:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/external/el8_amd64_gcc12/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/136.8311_RunJetHT2017FreMINIAOD
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 16-Jan-2024 10:50:07 CET-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=ElectronMVAValueMapProducer label='electronMVAValueMapProducer'
Exception Message:
edm::FileInPath unable to find file RecoEgamma/ElectronIdentification/data/MVAWeightFiles/Winter22HZZV1/EB1_5.weights.xml.gz anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/external/el8_amd64_gcc12/data:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/poison:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/src:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/external/el8_amd64_gcc12/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/136.7611_RunJetHT2016EreMINIAOD
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 16-Jan-2024 10:50:08 CET-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=ElectronMVAValueMapProducer label='electronMVAValueMapProducer'
Exception Message:
edm::FileInPath unable to find file RecoEgamma/ElectronIdentification/data/MVAWeightFiles/Winter22HZZV1/EB1_5.weights.xml.gz anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43690/36831/CMSSW_14_0_X_2024-01-15-2300/external/el8_amd64_gcc12/data:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/poison:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/src:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02820/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_0_X_2024-01-15-2300/external/el8_amd64_gcc12/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/136.88811_RunJetHT2018DreMINIAODUL
----- End Fatal Exception -------------------------------------------------
Expand to see more relval errors ...

RelVals-INPUT

  • 4.64.6_MinimumBias2010A/step2_MinimumBias2010A.log
  • 136.72411136.72411_RunJetHT2016B_reminiaodUL/step2_RunJetHT2016B_reminiaodUL.log
  • 136.72412136.72412_RunJetHT2016B_reminiaodUL/step2_RunJetHT2016B_reminiaodUL.log
Expand to see more relval errors ...

cmsbuild avatar Jan 16 '24 11:01 cmsbuild

@BenjaminRS I was pretty jet lagged yesterday and hadn't slept in 48 hours, so I'm not too sure, but I seem to recall this coming up yesterday, were you aware of this PR?

Thanks @aloeliger for the notification. I was not aware of the PR. Looks like with this in place it will now throw an exception if the tracking is run on earlier MC.

BenjaminRS avatar Jan 16 '24 14:01 BenjaminRS

please test

aloeliger avatar Jan 17 '24 10:01 aloeliger

please test

What's the purpose of this re-test?

Looks like with this in place it will now throw an exception if the tracking is run on earlier MC.

This PR will never pass integration tests (see e.g. this log).

mmusich avatar Jan 17 '24 10:01 mmusich

please test

What's the purpose of this re-test?

Looks like with this in place it will now throw an exception if the tracking is run on earlier MC.

This PR will never pass integration tests (see e.g. this log).

@mmusich Apologies, I was looking at the failed test earlier that seemed to have been a temporary issue and just retriggering as I had done for another L1 PR.

aloeliger avatar Jan 17 '24 10:01 aloeliger

please abort

aloeliger avatar Jan 17 '24 10:01 aloeliger

Milestone for this pull request has been moved to CMSSW_14_1_X. Please open a backport if it should also go in to CMSSW_14_0_X.

cmsbuild avatar Feb 06 '24 10:02 cmsbuild

Hi all, this PR seems to have stalled. If this PR is not merged, there is a danger that people using CMSSW 14 to reconstruct events digitised with earlier versions of CMSSW will get nonsense L1 tracks, without any error message to warn them about it. (As an aside, I remark that CMSSW in general seems weak in ths regard, not offering checks that digi & reco steps use compatible configurations, when they are run as separate jobs).

tomalin avatar Feb 15 '24 08:02 tomalin

Hi @tomalin @tschuh I can't connect between https://github.com/cms-sw/cmssw/pull/43690#issuecomment-1893830114 to your answer. Will it throw exception if the tracking is run on earlier MC. If I understand the L1T plan correctly for AR, the 14_1 release will be run on top of 13_X samples (2023 Phase-2 MC). Will that lead to the exception?

srimanob avatar Feb 15 '24 08:02 srimanob

Adding @skinnari & @aehart . HL-LHC GEN-SIM-DIGI-RAW MC samples contain a collection of "Stubs" (i.e. pairs of hits) from the outer tracker, which were generated using a specific set of cuts (that in real-life will be applied in the tracker front-end electronics). This set of cuts was changed (following optimisation) in CMSSW 14. as a result of this PR https://github.com/cms-sw/cmssw/pull/43260 . The L1 track reconstruction, which is run as part of the subsequent MC production stage, using GEN-SIM-DIGI-RAW as its input, requires a knowledge of the cuts used in creating the stubs. Hence if these cuts, which are specified in L1Trigger/TrackTrigger/python/TTStubAlgorithmRegister_cfi.py have changed since the stubs were produced, it can go wrong. If this PR https://github.com/cms-sw/cmssw/pull/43690 is not merged, then this error would occur silently, with poor L1 tracking performance and no warnings. If it is merged, then an exception wil be thrown. Anyone wanting to run L1 trigger code in CMSSW 14 on top of GEN-SIM-DIGI-RAW produced with CMSSW 13 would have to regenerate the outer tracker stub collection and its MC truth assocition info.

tomalin avatar Feb 15 '24 09:02 tomalin

Anyone wanting to run L1 trigger code in CMSSW 14 on top of GEN-SIM-DIGI-RAW produced with CMSSW 13 would have to regenerate the outer tracker stub collection and its MC truth assocition info.

this requires cleaning (removing ?) all the PR and IB tests that run on input files that were produced using older CMSSW releases. Accepted as is this PR wll create IB failures.

mmusich avatar Feb 15 '24 09:02 mmusich

@mmusich @tomalin We seem to have hit an impasse here. I am going through trying to finalize or un-stick L1T PRs. What's to be done about this one?

aloeliger avatar Feb 26 '24 10:02 aloeliger

What's to be done about this one?

see https://github.com/cms-sw/cmssw/pull/43690#issuecomment-1945677084 (assuming that's the way we collectively want to go). It looks like integrating this PR might create issues later on, though. I think at minimum it should be discussed with the involved parties (@cms-sw/pdmv-l2 and @cms-sw/upgrade-l2)

mmusich avatar Feb 26 '24 10:02 mmusich

What's to be done about this one?

see #43690 (comment) (assuming that's the way we collectively want to go). It looks like integrating this PR might create issues later on, though. I think at minimum it should be discussed with the involved parties (@cms-sw/pdmv-l2 and @cms-sw/upgrade-l2)

@mmusich Is there a meeting we can discuss this at, to reach a conclusion on this PR? Although this PR solves the issue for a specific case, CMSSW does have a wider problem of not offering users protection against the mistake of using two incompatible CMSSW versions for different parts of the simulation+reco chain

tomalin avatar Mar 15 '24 15:03 tomalin

@tomalin

Is there a meeting we can discuss this at, to reach a conclusion on this PR?

normally such requests should be brought up by the cmssw L2 of the concerned group (in this case would be the ones already mentioned above: l1, upgrade, possibly pdmv) in order to be discussed at the "Software Release Meeting" (each Tuesday at 17:00, https://indico.cern.ch/category/7688/, excepted next week, which is Offline&Computing week). I commented because this PR caught my eye, but admittedly I have no real say in it.

mmusich avatar Mar 15 '24 15:03 mmusich

@tomalin Would you like us to bring this up at some future point. Would you have time to talk to @epalencia and myself about this? @epalencia Would you be willing to talk about this in a future ORP meeting?

aloeliger avatar Mar 25 '24 12:03 aloeliger

My quick two cents (sorry I'm seeing this only now not being explicitly a PdmV PR):

  1. isn't this the kind of things that should go in a GT explicitly to avoid this kind of issues?
  2. does it mean, basically, that, as is, https://github.com/cms-sw/cmssw/pull/43260 breaks de facto CMSSW backward compatibility?

AdrianoDee avatar Mar 25 '24 13:03 AdrianoDee

Let me add, from PdmV point of view there's no big technical issue in updating all the input for Phase2 RelVals to point to 14_X samples before this goes in. I need to check if we have all what is needed in a 14_X release (but I would suppose she). Still it looks like just a patch to a bigger problem.

AdrianoDee avatar Mar 25 '24 13:03 AdrianoDee

  1. does it mean, basically, that, as is, New l1 track trigger stub window tune #43260 breaks de facto CMSSW backward compatibility?

On a second thought I went too far here. It's not really breaking it. Still it would be odd that to run RECO on a 13_X GEN-SIM-DIGI-RAW sample I would need to re-run a piece of a previous step (if I understood correctly).

AdrianoDee avatar Mar 25 '24 13:03 AdrianoDee

@tschuh If prior to running the DTC code + L1 tracking on a GEN-SIM-DIGI-RAW dataset, I recreate the stub collection, then this PR causes an error to be thrown. This is incorrect behaviour, since stubs, DTC & L1 tracking will in this scenario all be produced using the same stub window cuts. Talking to you, I understand that if several stub collections, each produced with a different process name exists, then your code requires that all use the same stub window sizes. Since CMSSW will always use the most recently added stub collection, this PR only needs to check the stub window sizes used to produce that one.

tomalin avatar Apr 07 '24 20:04 tomalin

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43690/40847

Code check has found code style and quality issues which could be resolved by applying following patch(s)

  • code-format: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43690/40847/code-format.patch e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43690/40847/code-format.patch | patch -p1 You can also run scram build code-format to apply code format directly

cmsbuild avatar Jul 08 '24 11:07 cmsbuild

I changed the code to only compare the last two processes. However, correct behavior would be to compare the current (which is the last) process with the one which produced the latest stub collection. I don't know how to identify that process. One option would be to give it as an config parameter.

tschuh avatar Jul 08 '24 11:07 tschuh

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43690/40848

cmsbuild avatar Jul 08 '24 11:07 cmsbuild

Pull request #43690 was updated. @aloeliger, @cmsbuild, @epalencia, @srimanob, @subirsarkar can you please check and sign again.

cmsbuild avatar Jul 08 '24 11:07 cmsbuild

Milestone for this pull request has been moved to CMSSW_14_2_X. Please open a backport if it should also go in to CMSSW_14_1_X.

cmsbuild avatar Aug 27 '24 08:08 cmsbuild