Handle event meta data changes in streamer files
PR description:
Moved storage of meta data needed for events into the first EventMessage sent before the subsequent events which need that meta data. This fixed a problem seen online where different HLT nodes were generating different event meta data for the same luminosity block but online the event data for one node is used.
PR validation:
All unit tests associated with streamer files pass.
please test
cms-bot internal usage
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44892/40158
-
This PR adds an extra 248KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File CalibCalorimetry/EcalLaserSorting/test/streamIn_cfg.py modified in PR(s): #44368
A new Pull Request was created by @Dr15Jones for master.
It involves the following packages:
- CalibCalorimetry/EcalLaserSorting (alca)
- DQMServices/StreamerIO (dqm)
- DataFormats/Streamer (core)
- EventFilter/Utilities (daq)
- IOPool/Streamer (core)
@saumyaphor4252, @perrotta, @nothingface0, @smuzaffar, @tjavaid, @syuvivida, @antoniovagnerini, @makortel, @smorovic, @consuegs, @rvenditti, @Dr15Jones, @emeschi can you please review it and eventually sign? Thanks. @wddgit, @rovere, @ReyerBand, @argiro, @wang0jin, @yuanchao, @makortel, @Martin-Grunewald, @rsreds, @mmusich, @missirol, @tocheng, @rchatter, @barvic, @thomreis this is something you requested to watch as well. @rappoccio, @antoniovilela, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-79e450/39218/summary.html
COMMIT: e3d9efe76dda6137dffb9d914315ce354741c653
CMSSW: CMSSW_14_1_X_2024-05-02-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44892/39218/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 4 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3331548
- DQMHistoTests: Total failures: 3
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3331525
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
Part of addressing https://github.com/cms-sw/cmssw/issues/44643
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44892/40217
-
This PR adds an extra 76KB to repository
-
There are other open Pull requests which might conflict with changes you have proposed:
- File CalibCalorimetry/EcalLaserSorting/test/streamIn_cfg.py modified in PR(s): #44368
Pull request #44892 was updated. @cmsbuild, @emeschi, @perrotta, @antoniovagnerini, @makortel, @rvenditti, @consuegs, @syuvivida, @nothingface0, @Dr15Jones, @smorovic, @tjavaid, @saumyaphor4252, @smuzaffar can you please check and sign again.
please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-79e450/39329/summary.html
COMMIT: d9b3643633eea6c6933a4f17e941698eaa28a249
CMSSW: CMSSW_14_1_X_2024-05-09-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/44892/39329/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- You potentially added 6 lines to the logs
- Reco comparison results: 6 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3332476
- DQMHistoTests: Total failures: 3
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3332453
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
+daq
+core
My comments were more or less cosmetic that can be left for a subsequent PR
+alca
- Mostly technical, for what AlCa code is concerned
@cms-sw/dqm-l2 ping
@Dr15Jones This PR depends on files which don't exist on 14_0_6 (the current production release we're using), e.g. CatStreamerFiles.cpp.
I don't think a backport to 14_0_X is easily feasible, is it, so that we can test this PR more easily?
We could try our tests with 14_1_X, but the streamers we have originate from 14_0_X, and will probably be incompatible.
Any ideas?
Dimitris (DQM)
@Dr15Jones This PR depends on files which don't exist on 14_0_6 (the current production release we're using), e.g.
CatStreamerFiles.cpp.I don't think a backport to 14_0_X is easily feasible, is it?
We could try our tests with 14_1_X, but the streamers we have originate from 14_0_X, and will probably be incompatible.
Any ideas?
In the past we did not have had backward compatibility for streamer files, at least not across major release cycles (last time there was such break was when something related to edm::Wrapper was changed, as far as I remember). If with this there is no such compatibility, when we switch to 14_1 online (or even to a release having the backport), all live consumers of streamer files would also have to switch at the same time.
when we switch to 14_1 online (or even to a release having the backport), all live consumers of streamer files would also have to switch at the same time.
Yes, of course, as far as production is concerned. I'm just asking if there is a way to test this PR from our side with CMSSW 14_0_6 and streamers produced from 14_0_6 as well.
I tried applying the diff of this PR directly to 14_0_6, but there are many commits that added files in the meantime that it doesn't apply cleanly.
From what you're telling me, it's also breaking backwards compatibility, so it wouldn't also work on "old" streamers as well.
Yes, of course, as far as production is concerned. I'm just asking if there is a way to test this PR from our side with CMSSW 14_0_6 and streamers produced from 14_0_6 as well.
From what you're telling me, it's also breaking backwards compatibility, so it wouldn't also work on "old" streamers as well.
Yes, If I understand changes correctly,
branchIDListHelper.updateFromInput(header.branchIDLists());
thinnedHelper.updateFromPrimaryInput(header.thinnedAssociationsHelper());
with removing this from the streamer source (as well as new INI format if that is changed), backwards compatibility is broken.
Streamer files can still be converted via writing out root files with 14_0 and recreating streamer files with 14_1 version (although maybe without preserving HLT path bits in streamer event headers, and I know DQM uses them for filtering in some clients). Maybe this can be also used to test the patch (with a 14_1 prerelease).
I tried applying the diff of this PR directly to 14_0_6, but there are many commits that added files in the meantime that it doesn't apply cleanly.
I was planning on doing a backport once this PR was merged (plus we were going to do 1 additional change in another PR that we'd want to add to the backport). I can do a preliminary backport today if it would help.
I was planning on doing a backport once this PR was merged (plus we were going to do 1 additional change in another PR that we'd want to add to the backport). I can do a preliminary backport today if it would help.
Actually it would help a lot, and we could test it immediately! :)
@nothingface0 see https://github.com/cms-sw/cmssw/pull/44978
test parameters:
- addpkg = DQM/Integration
@cmsbuild, please test
so, despite the bot declaring success on the unit tests, there are actual failures in the DQM/Integration unit tests that use streamer files as input, see e.g.:
the reason why the test succeed despite the failures is due to feature of the DQMStreamerReader (see discussion at https://github.com/cms-sw/cmssw/issues/43108#issuecomment-1779170062).
Now granted that "streamer files do not support schema evolution", I don't think we want to leave these tests (silently) broken.
the reason why the test succeed despite the failures is due to feature of the DQMStreamerReader
So I modified some of the DQMStreamerReader unit tests to check that all Events that were expected were actually read in order to avoid the problem with the module ignoring the exceptions.
See change to EventFilter/Utilities/test/test_dqmstream.py.
-1
Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-79e450/39393/summary.html
COMMIT: d9b3643633eea6c6933a4f17e941698eaa28a249
CMSSW: CMSSW_14_1_X_2024-05-15-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44892/39393/install.sh to create a dev area with all the needed externals and cmssw changes.
RelVals-INPUT
- 4.76
4.76_ZMuSkim2012D/step2_ZMuSkim2012D.log
Comparison Summary
Summary:
- You potentially added 2 lines to the logs
- Reco comparison results: 10 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338976
- DQMHistoTests: Total failures: 9
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3338947
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
@cmsbuild please test (it passed now in other PR..)
@mmusich
Now granted that "streamer files do not support schema evolution", I don't think we want to leave these tests (silently) broken.
What do you want done about this? A unit test that can't actually tell if there is a failure isn't really a unit test. I don't really feel it is my place to try to make it better (as the design of that test will never work in the cases we have to change the streamer format as we explicitly say only files written with the same release can be read by a release and that test breaks that).
I don't really feel it is my place to try to make it better (as the design of that test will never work in the cases we have to change the streamer format as we explicitly say only files written with the same release can be read by a release and that test breaks that).
someone (presumably @cms-sw/dqm-l2) should:
- a) reply to the proposal at https://github.com/cms-sw/cmssw/issues/43108#issuecomment-1780696627 to improve the test in order to be able to catch this sort of issue and actually implement the change;
- b) perform the procedure that you proposed at https://github.com/cms-sw/cmssw/pull/44978#issuecomment-2112955723, substitute the current files in
cms-data(that with this PR become broken) with the new format and merge the change together or shortly after this PR.
IF the breaking of streamer format becomes a routine, we might think about removing the tests completely.
Regarding:
perform the procedure that you proposed at Handle event meta data changes in streamer files [14_0] #44978 (comment), substitute the current files in
cms-data(that with this PR become broken) with the new format and merge the change together or shortly after this PR.
We're (DQM) in the process of doing it.