flux-shutdown: need option to force fast shutdown

Open • grondo opened this issue 10 months ago • 2 comments

There are several things that currently block an orderly flux shutdown, including slow epilogs that hold jobs in CLEANUP, bugs in jobtap plugins that leave jobs needing manual cleanup, etc.

It would be nice to have an option to bypass waiting for jobs in CLEANUP until Flux supports a restart with running/cleanup jobs.

Perhaps we could also add another shutdown script that automatically "fixes" any jobs in cleanup by forcing missing epilog-finish events.

grondo • Mar 29 '24 20:03

To review the current situation, in rc1 we have:

if test $RANK -eq 0; then
	if test -z "${FLUX_DISABLE_JOB_CLEANUP}"; then
		# NB: <<- strips leading tabs only, so the here-document
		# body and its EOT terminator must be tab-indented.
		flux admin cleanup-push <<-EOT
		flux queue stop --quiet --all --nocheckpoint
		flux cancel --user=all --quiet --states RUN
		flux queue idle --quiet
		EOT
	fi
fi

When shutdown begins, triggered either by a SIGTERM from systemctl stop flux or by flux shutdown, the first thing that happens is that this scriptlet is executed on rank 0. Only once it completes do we begin running rc3, starting at the TBON leaves and working up until it eventually runs on rank 0.

The flux queue idle command in the scriptlet will block until there are no jobs in RUN or CLEANUP state. Sadness results when jobs don't respond to the cancel request.

Here's a straw man proposal:

  • replace this funky "cleanup scriptlet" registered in rc1 with a full-fledged script on a par with rc1 and rc3, e.g. /etc/flux/shutdown
  • add a way to extend that script as with the rc scripts, like /etc/flux/shutdown.d
  • if the coral2 plugins are introducing problems with epilog reference counts, perhaps that package could provide a shutdown.d scriptlet to fix, until a better solution is found?
  • provide a way to pass arguments from flux shutdown to /etc/flux/shutdown (and its sub-scripts)
  • make one of those options cause flux queue idle to be run with the --timeout option
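
To make the straw man concrete, here is a rough sketch of what such a script might look like. Everything in it is an assumption made for illustration: /etc/flux/shutdown and /etc/flux/shutdown.d do not exist today, the --idle-timeout argument name is made up, and running the scriptlets between the cancel and the idle wait is just one possible ordering. The three flux commands are the ones from the rc1 scriptlet, plus the --timeout option on flux queue idle mentioned in the last bullet.

```shell
#!/bin/sh
# Hypothetical sketch of a /etc/flux/shutdown script. The script itself,
# the shutdown.d directory, and the --idle-timeout argument are assumptions
# based on the proposal above; none of them exist in flux-core.

FLUX=${FLUX:-flux}                              # overridable for dry runs
SHUTDOWN_D=${SHUTDOWN_D:-/etc/flux/shutdown.d}  # hypothetical directory

# Print the value of a (hypothetical) --idle-timeout=FSD argument, if given.
parse_idle_timeout() {
    for arg in "$@"; do
        case $arg in
            --idle-timeout=*) echo "${arg#--idle-timeout=}"; return 0 ;;
        esac
    done
}

# Run every executable file in shutdown.d, forwarding all arguments.
# A failing scriptlet is reported but does not abort the shutdown.
run_scriptlets() {
    for script in "$SHUTDOWN_D"/*; do
        test -f "$script" && test -x "$script" || continue
        "$script" "$@" || echo "shutdown: $script failed" >&2
    done
}

# Same steps as the rc1 scriptlet, except that the final wait is bounded
# when a timeout was requested.
main() {
    idle_timeout=$(parse_idle_timeout "$@")
    $FLUX queue stop --quiet --all --nocheckpoint
    $FLUX cancel --user=all --quiet --states RUN
    run_scriptlets "$@"
    if test -n "$idle_timeout"; then
        $FLUX queue idle --quiet --timeout="$idle_timeout"
    else
        $FLUX queue idle --quiet
    fi
}
```

A real script would end with main "$@". Setting FLUX=echo gives a dry run that prints the commands instead of executing them, which is handy for checking how arguments are forwarded.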

Also: I think some relief may be had once we get #5818 worked out. In that proposal, jobs transition to INACTIVE before the housekeeping script completes. If housekeeping gets hung, it doesn't prevent the instance from stopping, and when it restarts, any still running housekeeping scripts are ignored. We probably need a way to reacquire any running housekeeping tasks on restart and avoid scheduling on those nodes, but the proposed behavior is probably a step in the right direction.

garlick • Mar 30 '24 18:03

> if the coral2 plugins are introducing problems with epilog reference counts, perhaps that package could provide a shutdown.d scriptlet to fix, until a better solution is found?

FYI - I think this particular issue was fixed by flux-framework/flux-coral2#141

grondo • Apr 02 '24 14:04