Improve behavior after exception in begin/end run transitions
PR description:
Improve the behavior of the Framework after exceptions in the stream and global begin/end Run transitions. This is the third in a series of PRs intended to make the behavior after exceptions more consistent across all the begin/end transitions. The first PR handled stream begin/end lumi exceptions (PR #44624). The second PR handled global begin/end lumi exceptions (PR #44840). The comments at the head of the first PR describe the design we are implementing.
The intent is that nothing in the output changes if no exceptions occur. The order of modules in begin/end stream transitions may change, but in existing releases these functions already run asynchronously, and the order can vary between repeated runs of an identical configuration in multi-threaded jobs. Anything that depends on that order is already a problem.
Another minor change in behavior is that the pre/post signals for beginStream and endStream will no longer be issued for the trigger results inserter, path status inserters, and end path status inserters. These modules don't do anything in those transitions. I examined the services watching those signals and could not see any reason for the signals to be emitted.
This work was motivated by discussions related to Issues #43831 and #42501.
The most complicated detail in this PR is that instead of each StreamSchedule having one WorkerManager, each StreamSchedule will have three: one for lumis/events, one for runs, and one for beginStream/endStream. Although I intend to address the beginStream/endStream transitions in the next PR, I went ahead and converted them to use a separate WorkerManager now because I didn't want to modify that complex part of the code twice.
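For readers unfamiliar with the layout, the split described above can be pictured roughly as follows. This is only an illustrative sketch, not the code in this PR: `Transition`, `WorkerManagerSketch`, `StreamScheduleSketch`, and `managerFor` are hypothetical names invented for the example, and the real StreamSchedule and WorkerManager classes carry far more state.

```cpp
#include <string>

// Hypothetical enumeration of the transitions a stream schedule handles.
enum class Transition { BeginStream, EndStream, BeginRun, EndRun, BeginLumi, EndLumi, Event };

// Hypothetical stand-in for a WorkerManager; only a label is kept here.
struct WorkerManagerSketch {
  std::string label;
};

// Hypothetical stand-in for a StreamSchedule that owns three managers
// instead of one, and picks the manager from the transition type.
class StreamScheduleSketch {
public:
  WorkerManagerSketch& managerFor(Transition t) {
    switch (t) {
      case Transition::BeginRun:
      case Transition::EndRun:
        return runs_;
      case Transition::BeginStream:
      case Transition::EndStream:
        return beginEndStream_;
      case Transition::BeginLumi:
      case Transition::EndLumi:
      case Transition::Event:
        return lumisAndEvents_;
    }
    return lumisAndEvents_;  // unreachable for valid enumerators
  }

private:
  WorkerManagerSketch lumisAndEvents_{"lumis/events"};
  WorkerManagerSketch runs_{"runs"};
  WorkerManagerSketch beginEndStream_{"beginStream/endStream"};
};
```

In this sketch, begin and end of the same transition type share one manager, so state recorded when a begin transition fails would be visible when the matching end transition is processed. Whether the real implementation relies on that is an assumption of the sketch.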
PR validation:
An existing unit test covering exceptions in different transitions is extended to cover the most salient cases. Many additional cases were also tested manually. Existing unit tests pass.
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45017/40305
- This PR adds an extra 168KB to repository
- There are other open Pull requests which might conflict with changes you have proposed:
  - File FWCore/Integration/plugins/ExceptionThrowingProducer.cc modified in PR(s): #44372
A new Pull Request was created by @wddgit for master.
It involves the following packages:
- FWCore/Framework (core)
- FWCore/Integration (core)
- FWCore/ServiceRegistry (core)
@cmsbuild, @Dr15Jones, @makortel, @smuzaffar can you please review it and eventually sign? Thanks. @makortel, @fwyzard, @missirol this is something you requested to watch as well. @antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here
please test
-1
Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2909e4/39476/summary.html
COMMIT: 44b20e068f0b3d7534ebce0dd8a7602d43e53f06
CMSSW: CMSSW_14_1_X_2024-05-22-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45017/39476/install.sh to create a dev area with all the needed externals and cmssw changes.
Build
I found compilation error when building:
>> Compiling src/Mixing/Base/src/PileupRandomNumberGenerator.cc
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_14_1_X_2024-05-22-1100' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_14_1_X_2024-05-22-1100' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-05-22-1100/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/cms/coral/CORAL_2_3_21-27ab7e52f21297bcbeaa636ca097acc7/include/LCG -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/pcre/8.43-e34796d17981e9b6d174328c69446455/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/boost/1.80.0-941b136a4a3be6f8bc1e903d36ddc172/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/clhep/2.4.7.1-8e40efd27b7394c1fa4e9c7e432d85cd/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/curl/7.79.0-e9aea8dd47e409f0dcfd76a7b3220112/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/gsl/2.6-5e2ce72ea2977ff21a2344bbb52daf5c/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/lcg/root/6.30.07-f3322c77db1c59847b28fde88ff7218c/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/xerces-c/3.1.3-c7b88eaa36d0408120f3c29826a04bf6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/zlib/1.2.11-1a082fc322b0051b504cc023f21df178/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/fmt/8.0.1-258b4791803c34b7e98cf43693e54d87/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/OpenBLAS/0.3.15-c877ab57fa7b04ce290093588c6c5717/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/tinyxml2/6.2.0-88fe0ec301baf763fa3c485e5b67ed91/include -O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++17 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v2 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes 
-Wno-psabi -Wno-error=unused-variable -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/Mixing/Base/src/MixingBase/PileupRandomNumberGenerator.cc.d src/Mixing/Base/src/PileupRandomNumberGenerator.cc -o tmp/el8_amd64_gcc12/src/Mixing/Base/src/MixingBase/PileupRandomNumberGenerator.cc.o
>> Compiling src/Mixing/Base/src/SecondaryEventProvider.cc
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_14_1_X_2024-05-22-1100' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_14_1_X_2024-05-22-1100' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-05-22-1100/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/cms/coral/CORAL_2_3_21-27ab7e52f21297bcbeaa636ca097acc7/include/LCG -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/pcre/8.43-e34796d17981e9b6d174328c69446455/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/boost/1.80.0-941b136a4a3be6f8bc1e903d36ddc172/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/clhep/2.4.7.1-8e40efd27b7394c1fa4e9c7e432d85cd/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/curl/7.79.0-e9aea8dd47e409f0dcfd76a7b3220112/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/gsl/2.6-5e2ce72ea2977ff21a2344bbb52daf5c/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/lcg/root/6.30.07-f3322c77db1c59847b28fde88ff7218c/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/xerces-c/3.1.3-c7b88eaa36d0408120f3c29826a04bf6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/zlib/1.2.11-1a082fc322b0051b504cc023f21df178/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/fmt/8.0.1-258b4791803c34b7e98cf43693e54d87/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/OpenBLAS/0.3.15-c877ab57fa7b04ce290093588c6c5717/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02838/el8_amd64_gcc12/external/tinyxml2/6.2.0-88fe0ec301baf763fa3c485e5b67ed91/include -O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++17 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v2 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes 
-Wno-psabi -Wno-error=unused-variable -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/Mixing/Base/src/MixingBase/SecondaryEventProvider.cc.d src/Mixing/Base/src/SecondaryEventProvider.cc -o tmp/el8_amd64_gcc12/src/Mixing/Base/src/MixingBase/SecondaryEventProvider.cc.o
src/Mixing/Base/src/SecondaryEventProvider.cc: In function 'void {anonymous}::processOneOccurrence(edm::WorkerManager&, typename T::TransitionInfoType&, edm::StreamID, const typename T::Context*, const U*, bool)':
src/Mixing/Base/src/SecondaryEventProvider.cc:40:16: error: 'addContextAndPrintException' is not a member of 'edm'
40 | edm::addContextAndPrintException("Calling SecondaryEventProvider", ex, cleaningUpAfterException);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
src/Mixing/Base/src/SecondaryEventProvider.cc:42:16: error: 'addContextAndPrintException' is not a member of 'edm'
42 | edm::addContextAndPrintException("", ex, cleaningUpAfterException);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
enable threading
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45017/40308
- This PR adds an extra 168KB to repository
- There are other open Pull requests which might conflict with changes you have proposed:
  - File FWCore/Integration/plugins/ExceptionThrowingProducer.cc modified in PR(s): #44372
Pull request #45017 was updated. @makortel, @cmsbuild, @smuzaffar, @civanch, @mdhildreth, @Dr15Jones can you please check and sign again.
please test
-1
Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2909e4/39478/summary.html
COMMIT: 66d135dff6275136960a90f5fd2827254b69d979
CMSSW: CMSSW_14_1_X_2024-05-22-1100/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45017/39478/install.sh to create a dev area with all the needed externals and cmssw changes.
Unit Tests
I found 1 errors in the following unit tests:
---> test test_TkHistoMap had ERRORS
Comparison Summary
Summary:
- You potentially added 4 lines to the logs
- Reco comparison results: 17 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338862
- DQMHistoTests: Total failures: 854
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3337988
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
All the test failures are unrelated to this PR. Except for the following, the tests are passing.
The unit test test_TkHistoMap is already failing in the IBs.
Comparison differences:
- 12834.7 seen in other previous PRs, see Issue #39803
- 312.0 seen in other previous PRs and in MessageLogger directory
- 14234.0 seen in other previous PRs
The only change in a simulation package is a single added line: a missing #include of a header file (it was previously included indirectly through another header).
please test
The unit test that was already failing in the IBs was fixed a couple of IBs ago, so let's rerun to get this one green.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2909e4/39517/summary.html
COMMIT: 66d135dff6275136960a90f5fd2827254b69d979
CMSSW: CMSSW_14_1_X_2024-05-24-1100/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45017/39517/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- You potentially added 1 lines to the logs
- Reco comparison results: 6 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338862
- DQMHistoTests: Total failures: 6
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3338836
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
+1
OK for simulation
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45017/40400
- This PR adds an extra 172KB to repository
- There are other open Pull requests which might conflict with changes you have proposed:
  - File FWCore/Integration/plugins/ExceptionThrowingProducer.cc modified in PR(s): #44372
please test
This should resolve all the comments and questions received so far. Let me know if there are any more.
Pull request #45017 was updated. @makortel, @smuzaffar, @Dr15Jones can you please check and sign again.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2909e4/39591/summary.html
COMMIT: 18ba34191655d9de053ea61d378368bb8b6d039f
CMSSW: CMSSW_14_1_X_2024-05-28-1100/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45017/39591/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- No significant changes to the logs found
- Reco comparison results: 4 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338862
- DQMHistoTests: Total failures: 3
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3338839
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45017/40418
- This PR adds an extra 172KB to repository
- There are other open Pull requests which might conflict with changes you have proposed:
  - File FWCore/Integration/plugins/ExceptionThrowingProducer.cc modified in PR(s): #44372
Pull request #45017 was updated. @smuzaffar, @makortel, @cmsbuild, @civanch, @Dr15Jones, @mdhildreth can you please check and sign again.
please test
This update includes the added comment that was requested, and the commits are squashed. It should resolve all the comments and questions received so far.
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2909e4/39609/summary.html
COMMIT: 27c49ad338d4327d9f8cbad7099cd3f7860378f7
CMSSW: CMSSW_14_1_X_2024-05-29-1100/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45017/39609/install.sh to create a dev area with all the needed externals and cmssw changes.
Comparison Summary
Summary:
- You potentially removed 2 lines from the logs
- Reco comparison results: 6 differences found in the comparisons
- DQMHistoTests: Total files compared: 48
- DQMHistoTests: Total histograms compared: 3338862
- DQMHistoTests: Total failures: 9
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 3338833
- DQMHistoTests: Total skipped: 20
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
- Checked 202 log files, 165 edm output root files, 48 DQM output files
- TriggerResults: no differences found
+1
+core
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
+1