Change how run number is defined for harvested root files in multiRun mode
Fixes #9690
Status
not-tested
Description
In short:
- if harvesting MC data (be it in byRun or multiRun mode): run number is set to 1
- if harvesting data in byRun mode: the run number is left unchanged, so it is taken from the data being harvested
- if harvesting data in multiRun mode, force run to be 999999
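The three rules above can be sketched as a small decision function. This is only an illustration of the intended behaviour, not the code in this PR: the function name `harvesting_run_number`, its parameters, and the two constants are hypothetical names chosen for the example.

```python
# Hypothetical constants for illustration; the values come from the rules above.
MC_RUN_NUMBER = 1
MULTIRUN_DATA_RUN_NUMBER = 999999


def harvesting_run_number(is_mc, multi_run, data_run=None):
    """Return the run number a harvesting job should use.

    is_mc     -- True when harvesting MC, False when harvesting data
    multi_run -- True for multiRun mode, False for byRun mode
    data_run  -- run number taken from the harvested data (byRun data only)
    """
    if is_mc:
        # MC is always forced to run 1, in both byRun and multiRun mode
        return MC_RUN_NUMBER
    if multi_run:
        # Data harvested in multiRun mode is forced to 999999
        return MULTIRUN_DATA_RUN_NUMBER
    # Data harvested in byRun mode keeps the run number it came with
    return data_run
```

For example, `harvesting_run_number(is_mc=False, multi_run=False, data_run=278017)` leaves the run number at 278017.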
Is it backward compatible (if not, which system it affects?)
no, it cannot be applied to workflows with harvesting jobs already created
Related PRs
none
External dependencies / deployment changes
none
Jenkins results:
- Unit tests: failed
- 1 new failures
- 1 tests no longer failing
- Pylint check: failed
- 9 warnings and errors that must be fixed
- 15 comments to review
- Pycodestyle check: succeeded
- 1 comments to review
- Python3 compatibility checks: succeeded
Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10112/artifact/artifacts/PullRequestReport.html
Jenkins results:
- Unit tests: succeeded
- 1 tests no longer failing
- Pylint check: failed
- 9 warnings and errors that must be fixed
- 18 comments to review
- Pycodestyle check: succeeded
- 1 comments to review
- Python3 compatibility checks: succeeded
Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10113/artifact/artifacts/PullRequestReport.html
This will be more complicated than I initially foresaw. Run-dependent MC really has run number > 1, whereas I thought that all of that logic was internal to CMSSW when processing data... Here is part of my harvesting job:
'input_files': [{'checksums': {'adler32': '861551bd', 'cksum': '661964275'},
                 'events': 0,
                 'first_event': 0,
                 'last_event': 0,
                 'lfn': '/store/backfill/1/CMSSW_11_1_0_pre7/RelValTTbar_13UP18_RD/DQMIO/RECOPRMXUP18_PU25_RD_TC_MC_multiRun_June2020_Val_Alanv12-v11/00000/1ED13C52-B0E9-11EA-8109-D0CDE183BEEF.root',
                 'locations': set([]),
                 'merged': True,
                 'parents': set([]),
                 'runs': set([]),
                 'size': 68426312}],
'jobType': 'Harvesting',
'jobgroup': 555,
'location': None,
'mask': {'FirstEvent': None,
         'FirstLumi': None,
         'FirstRun': None,
         'LastEvent': None,
         'LastLumi': None,
         'LastRun': None,
         'inclusivemask': True,
         'runAndLumis': {315257: [[1, 36]]}},
So we need to find a systematic way to identify such run-dependent MC files.
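As a rough illustration of what the job dump shows, the run numbers can be read from the `runAndLumis` entry of the mask. The helper names below are hypothetical, and the mask is treated as a plain dict shaped like the dump above. Note that this only reveals that the run is greater than 1; on its own it cannot tell run-dependent MC apart from real data.

```python
def runs_in_mask(mask):
    """Return the sorted run numbers listed under the mask's 'runAndLumis' key."""
    return sorted(mask.get("runAndLumis", {}))


def has_run_above_one(mask):
    """True if any run in the mask is greater than 1 (data or run-dependent MC)."""
    return any(run > 1 for run in runs_in_mask(mask))


# Mask copied from the harvesting job dump above, reduced to the relevant keys
example_mask = {
    "FirstRun": None,
    "LastRun": None,
    "inclusivemask": True,
    "runAndLumis": {315257: [[1, 36]]},
}
```

With this mask, `runs_in_mask(example_mask)` gives `[315257]`, so a check based purely on "run number is 1" would not classify this MC job as MC.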
Hi @amaltaro, do you mean we don't have a way to distinguish between data and MC at harvesting time on the WM side? Thanks.
Just to be clear, please correct me if I am wrong: right now, before this PR is merged,
- MRH files, either data or MC, have a WMCore parameter "runLimits", set to "-%s-%s" % (minRun, maxRun) [1], which is used in the dataset name for DQM. I am not sure how many of these have been uploaded to the DQM GUI; I can find only one in the development GUI and none in the Offline GUI. This one: https://tinyurl.com/ycj7luc9
It has RunNumber forced to 999999 in the DQM search box, although this does not match the run number displayed in the menu of the DQM GUI (278017, the longest one in the range?). The dataset name, however, keeps the run range used in the harvesting: /NoBPTX/Run2016F-23Sep2016-v1-277932-278193/DQMIO
This would be the desired behaviour for MRH in the DQM GUI: a DQM user can trace back directly from the dataset name which runs (a range) it contains, even though the DQM search is performed with run = 999999.
I see several ALCAPROMPT datasets uploaded in this way into the Offline DQM GUI too, all of them with runNumber forced to 999999, but with a different dataset name and a different run displayed in the header of the GUI. E.g. /StreamExpress/Run2018A-PromptCalibProdSiStripGainsAAG-Express-v1-316702-316766/ALCAPROMPT https://tinyurl.com/yaz6vfyt So they can be distinguished by dataset name (run range) and even by the run number displayed in the header of the GUI, even though all of them have 999999.
- After https://github.com/dmwm/WMCore/pull/9746 is merged, we lose all the functionality described above: every time an MRH root file is registered for an existing dataset name, it is overwritten, no matter which run range was used in the harvesting.
- For single-run mode, the run number is always kept.
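To make the naming convention in the first bullet concrete, here is a minimal sketch of how a run-range suffix of the form "-%s-%s" % (minRun, maxRun) ends up in a dataset name. The function name is hypothetical, and it is an assumption (based on the example names quoted above) that the suffix is appended to the processed-dataset part; the actual WMCore code is the one linked in [1].

```python
def multirun_dataset_name(dataset, min_run, max_run):
    """Append the harvested run range to the processed-dataset part of a name.

    dataset is expected in the form /Primary/Processed/Tier.
    """
    _, primary, processed, tier = dataset.split("/")
    # Same format string as the "runLimits" parameter quoted above
    run_limits = "-%s-%s" % (min_run, max_run)
    return "/%s/%s%s/%s" % (primary, processed, run_limits, tier)
```

Two harvesting passes over different run ranges then produce different dataset names, which is what keeps them from overwriting each other even though both are searched with run = 999999.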
@ahmad3213 @emanueleusai @rvenditti please speak up, whether you agree or disagree.
Thanks
[1] https://github.com/dmwm/WMCore/pull/9746/files#diff-3c13cdc9485083bb43b4e4d3d37f7310b878d36bc137ce2a7cf8f08de4e9daf0L181-R184
Jenkins results:
- Python3 Unit tests: succeeded
- 440 tests deleted
- 19 tests no longer failing
- 13 tests added
- 3 changes in unstable tests
- Python3 Pylint check: failed
- 64 warnings and errors that must be fixed
- 5 warnings
- 343 comments to review
- Pylint py3k check: failed
- 102 errors and warnings that should be fixed
- 79 warnings
- Pycodestyle check: succeeded
- 447 comments to review
Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13176/artifact/artifacts/PullRequestReport.html
Can one of the admins verify this patch?