Job Retry Project: Modifiers and MemoryModifier

Open LinaresToine opened this issue 11 months ago • 16 comments

Fixes #11881

Status

In development

Description

This PR introduces Modifiers, a feature of the RetryManager. The setup is analogous to that of the Plugins. After a plugin has been selected with selectRetryAlgo, the RetryManagerPoller calls selectJobModifier, which modifies the jobs that failed with specific exit codes.
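
The dispatch step described above could look roughly like the following. This is a minimal sketch, not the code in this PR: jobs are plain dicts, and `MODIFIER_REGISTRY`, `modifyJob`, and `select_job_modifier` are hypothetical names chosen for illustration. Only the name MemoryModifier and exit code 50660 come from the description.

```python
# Hypothetical sketch: apart from MemoryModifier and exit code 50660,
# every name here is illustrative and not taken from WMCore.


class MemoryModifier:
    """Stand-in modifier; the real one edits the job pkl and sandbox maxPSS."""

    def modifyJob(self, job):
        # Only record that the job was touched; no memory value is changed here.
        job.setdefault("modified_by", []).append(type(self).__name__)
        return job


# Maps modifier names (as written in the config) to their classes.
MODIFIER_REGISTRY = {"MemoryModifier": MemoryModifier}


def select_job_modifier(jobs, modifiers_by_exit_code):
    """Apply the configured modifier to each job whose exit code is listed."""
    for job in jobs:
        name = modifiers_by_exit_code.get(job.get("exit_code"))
        if name is None:
            continue  # no modifier for this exit code; the job is retried unchanged
        MODIFIER_REGISTRY[name]().modifyJob(job)
    return jobs


jobs = [{"id": 1, "exit_code": 50660}, {"id": 2, "exit_code": 8001}]
result = select_job_modifier(jobs, {50660: "MemoryModifier"})
```

Jobs whose exit code has no entry in the mapping pass through untouched, which matches the idea that the modifier is an extra step applied before the retry.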

As for the Modifiers themselves, this PR only introduces the MemoryModifier, whose functions modify the job pkl file and the sandbox maxPSS parameter. To use it, the config file must be modified along these lines:

    config.RetryManager.modifiers = {50660: 'MemoryModifier'}
    config.RetryManager.section_('MemoryModifier')
    config.RetryManager.MemoryModifier.section_(<jobType>)
    config.RetryManager.MemoryModifier.<jobType>.settings = {'requiresModify': True, 'multiplyMemoryPerCore': 200, 'maxMemoryPerCore': 2000}
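
The description does not spell out how each settings key is applied, so the following is only a sketch under stated assumptions: plain dicts stand in for the config sections, "Processing" is an assumed job type, requiresModify is treated as an on/off gate, and maxMemoryPerCore is treated as an upper bound. How multiplyMemoryPerCore enters the calculation is not stated, so the requested value is passed in as a hypothetical argument instead.

```python
# Hypothetical sketch; plain dicts stand in for the WMCore config sections.
MEMORY_MODIFIER_SETTINGS = {
    "Processing": {  # assumed job type, used only for illustration
        "requiresModify": True,
        "multiplyMemoryPerCore": 200,
        "maxMemoryPerCore": 2000,
    },
}


def capped_memory_per_core(job_type, requested_per_core, settings_by_type):
    """Return the memory per core to use, or None if the job type is not modified."""
    settings = settings_by_type.get(job_type)
    if not settings or not settings.get("requiresModify", False):
        return None  # job type not configured, or modification switched off
    # Assumption: maxMemoryPerCore is an upper bound on the new value.
    return min(requested_per_core, settings["maxMemoryPerCore"])
```

With these settings, a request above 2000 for a "Processing" job would be clamped to 2000, and a job type with no section would be left alone.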

Is it backward compatible (if not, which system it affects?)

Maybe. I believe it is, since the only change is that the list of jobs to retry goes through an extra step before the jobs are actually retried. This patch should work on previous versions without issues.

Related PRs

This is the formal PR for the development that @germanfgv and I were working on in https://github.com/LinaresToine/WMCore/pull/3

External dependencies / deployment changes

None

LinaresToine • Mar 12 '24 05:03

Jenkins results:

  • Python3 Unit tests: failed
    • 8 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 2 warnings
    • 70 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 52 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14963/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Mar 12 '24 05:03

Jenkins results:

  • Python3 Unit tests: failed
    • 8 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 2 warnings
    • 70 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 52 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14998/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Mar 29 '24 06:03

Jenkins results:

  • Python3 Unit tests: failed
    • 9 new failures
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 2 warnings
    • 73 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 54 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14999/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Mar 29 '24 06:03

Jenkins results:

  • Python3 Unit tests: failed
    • 8 new failures
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 2 warnings
    • 73 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 54 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15000/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Mar 29 '24 06:03

Jenkins results:

  • Python3 Unit tests: failed
    • 9 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 17 warnings and errors that must be fixed
    • 2 warnings
    • 74 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 55 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15001/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Mar 30 '24 23:03

Jenkins results:

  • Python3 Unit tests: failed
    • 8 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 18 warnings and errors that must be fixed
    • 2 warnings
    • 74 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 57 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15028/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Apr 26 '24 07:04

Jenkins results:

  • Python3 Unit tests: failed
    • 8 new failures
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 29 warnings and errors that must be fixed
    • 5 warnings
    • 101 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 58 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15032/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Apr 29 '24 01:04

Jenkins results:

  • Python3 Unit tests: failed
    • 9 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 29 warnings and errors that must be fixed
    • 5 warnings
    • 101 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 58 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15098/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 04 '24 16:07

Jenkins results:

  • Python3 Unit tests: failed
    • 9 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 30 warnings and errors that must be fixed
    • 5 warnings
    • 120 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 64 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15099/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 05 '24 21:07

Jenkins results:

  • Python3 Unit tests: failed
    • 8 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 34 warnings and errors that must be fixed
    • 5 warnings
    • 129 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 75 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15100/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 07 '24 23:07

Jenkins results:

  • Python3 Unit tests: failed
    • 10 new failures
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 33 warnings and errors that must be fixed
    • 5 warnings
    • 129 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 75 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15101/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 08 '24 00:07

Jenkins results:

  • Python3 Unit tests: failed
    • 10 new failures
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 33 warnings and errors that must be fixed
    • 4 warnings
    • 109 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 79 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15103/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 09 '24 22:07

Jenkins results:

  • Python3 Unit tests: failed
    • 10 new failures
    • 1 tests no longer failing
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 35 warnings and errors that must be fixed
    • 4 warnings
    • 109 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 79 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15102/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 09 '24 23:07

Jenkins results:

  • Python3 Unit tests: failed
    • 15 new failures
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 35 warnings and errors that must be fixed
    • 5 warnings
    • 148 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 99 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15104/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 10 '24 11:07

Jenkins results:

  • Python3 Unit tests: failed
    • 6 new failures
    • 43 tests deleted
    • 572 tests no longer failing
    • 183 tests added
    • 16 changes in unstable tests
  • Python3 Pylint check: failed
    • 35 warnings and errors that must be fixed
    • 5 warnings
    • 148 comments to review
  • Pylint py3k check: failed
    • 1 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 99 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15105/artifact/artifacts/PullRequestReport.html

cmsdmwmbot • Jul 10 '24 13:07

Can one of the admins verify this patch?

cmsdmwmbot • Sep 30 '24 20:09