flux-core
flux-core copied to clipboard
WIP: replace cleanup scriptlets with shutdown script
This makes some improvement to the shutdown process:
- eliminate the
flux admin cleanup-pushmethod of registering cleanup tasks in rc1, and instead have ashutdownscript similar to other rc scripts - add
flux shutdown --forcewhich causes the shutdown script to be called with the-foption - in the shutdown script, add a 10s timeout to
flux queue idlewhen-fis present - in the shutdown script, unload the perilog module before canceling jobs to skip the epilog (@grondo's idea - a good one IMHO)
Todo:
- tests need to be written
- document in man pages and admin guide
- consider running
flux shutdown --forceinsystemctl stop flux - I think flux-accounting's use of the "job" test personality is going to be a problem now that
flux adminis removed so deal with that
I thought maybe a pause at this point to get feedback would be good. This had a bigger impact on tests than I was expecting, so maybe I took a wrong turn somewhere.
Fixed some stuff
- add
shutdownscript to dist manifest (whoops) - dropped commits that remove
flux admin cleanup-push - temporarily add backwards compat code for cleanup scripts (avoids breaking flux-coral2 and flux-accounting)
I dropped the actual changes to the shutdown process from this PR (including shutdown --force) leaving only the switch from cleanup-push to a shutdown script since that change alone is a pretty big one.
I think the script is going to be far more maintainable than the "cleanup scriptlets" pushed by rc1, and it opens the possibility of letting flux shutdown pass options to the script. For example, we may want to implement an option for shutdown at a future time that gives jobs more of a soft warning so they can checkpoint. Maybe the --force option should be reserved for when we don't automatically cancel running jobs at shutdown because they can be recovered.
A downside is that this impacts the tests that set rc1,rc3, including some in other projects. (Thus cleanup-push remains as an alias until the other projects can transition)
Codecov Report
Merging #5860 (587573b) into master (a7d5bb5) will decrease coverage by
0.05%. Report is 2 commits behind head on master. The diff coverage is57.89%.
Additional details and impacted files
@@ Coverage Diff @@
## master #5860 +/- ##
==========================================
- Coverage 83.31% 83.27% -0.05%
==========================================
Files 514 514
Lines 82801 82827 +26
==========================================
- Hits 68986 68974 -12
- Misses 13815 13853 +38
| Files | Coverage Δ | |
|---|---|---|
| src/common/libflux/conf.c | 85.40% <ø> (ø) |
|
| src/broker/state_machine.c | 80.91% <50.00%> (+0.73%) |
:arrow_up: |
| src/broker/runat.c | 73.61% <62.50%> (-6.05%) |
:arrow_down: |
| src/broker/broker.c | 77.92% <56.25%> (-0.27%) |
:arrow_down: |