flux-core

potential fallout when flux is updated before it is shut down

Open grondo opened this issue 3 months ago • 8 comments

During a recent Flux upgrade, new packages were installed while Flux was running. These packages removed rc3 as part of the modprobe transition, which later prevented an orderly shutdown when Flux was stopped.

In general, upgrading Flux before a proper shutdown seems to have a high chance of causing issues. While this case was particularly bad due to the rc3 changes (and there will be a similar issue for the 0.78.0->0.79.0 transition), there could be other, more subtle issues on other upgrades: for instance, minor behavior or protocol changes in components that are expected to be consistent within a version, or state set up during rc1 that no longer matches what the upgraded rc3 expects.

For the rc1/rc3 consistency issues, one idea would be to store the current rc3 configuration in the KVS during rc1. This would protect the instance from rc3 changes occurring due to an upgrade.

I'm not sure how to solve the other issues in general though.

grondo avatar Oct 08 '25 17:10 grondo

I may be forgetting but isn't there a way to have an RPM stop a service before updating it and start again after?

garlick avatar Oct 08 '25 17:10 garlick

Ah, Fedora has convenience scriptlets for this: https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_systemd

garlick avatar Oct 08 '25 17:10 garlick

Good thought!

grondo avatar Oct 08 '25 18:10 grondo

Ok, the following preun scriptlet has been added to the flux-core RPM:

%preun
# Stop the flux service on both removal (via the systemd_preun() macro)
# and upgrade (via systemctl directly). This prevents errors when stopping
# flux after files have been replaced due to an upgrade or removed due
# to uninstall.
#
%systemd_preun flux.service

if [ $1 -eq 1 ]; then
    /usr/bin/systemctl stop flux.service >/dev/null 2>&1 || :
fi

This should stop and disable the flux service on an uninstall before the package is removed, and only stop the service on upgrade.
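To make the two branches explicit, here is a minimal sketch of how the value of $1 selects the action. It assumes the usual rpm convention that %preun receives the number of instances of the package that will remain installed after the operation (0 on removal, 1 on upgrade). The preun_action function is a hypothetical helper for illustration only and is not part of the spec file:

```shell
# Hypothetical helper (not part of the flux-core spec): report what the
# %preun scriptlet above would do for a given value of $1.
preun_action() {
    remaining=$1
    if [ "$remaining" -eq 0 ]; then
        # Branch taken inside the %systemd_preun macro (package removal)
        echo "removal: disable and stop flux.service"
    elif [ "$remaining" -eq 1 ]; then
        # Branch taken by the explicit [ $1 -eq 1 ] check (upgrade)
        echo "upgrade: stop flux.service"
    else
        # Neither test matches, so the scriptlet does nothing
        echo "no action"
    fi
}

preun_action 0
preun_action 1
```

Only one branch runs for any given transaction, so the service is never stopped twice.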

grondo avatar Oct 09 '25 00:10 grondo

Great! My only thought: if the stop takes a long time due to processing a dump, would it be useful to have the systemctl output?

garlick avatar Oct 09 '25 01:10 garlick

That's a good thought. The invocation above was modeled after the systemd provided RPM macros, e.g.:

%systemd_preun() \
if [ $1 -eq 0 ] ; then \
        # Package removal, not upgrade \
        systemctl --no-reload disable --now %{?*} &>/dev/null || : \
fi \
%{nil}

which strongly implies that no output should be generated. However, in this special case perhaps a different approach is needed.

Aside: TIL that the bash redirect extension &>/dev/null is preferred over >/dev/null 2>&1. From bash(1):

       There are two formats for redirecting standard output and standard
       error:

              &>word
       and
              >&word

       Of the two forms, the first is preferred.  This is semantically
       equivalent to

              >word 2>&1

grondo avatar Oct 09 '25 03:10 grondo

The main reason to suppress errors and output from systemctl stop (which normally seems to be completely silent) is to avoid printing errors in the rpm or dnf output when the unit is already stopped or disabled. In light of that, how about this version:

%preun
# Stop the flux service on both removal and upgrade if active
if /usr/bin/systemctl is-active --quiet flux.service; then
    echo "Stopping Flux systemd unit due to upgrade/removal..."
    echo "For progress, check: systemctl status flux"
    /usr/bin/systemctl stop flux.service
fi

This will emit a message to the console only if the Flux service is active. Since there's no output directly from systemctl stop, the message directs the admin to check systemctl status flux for progress. Additionally, an error in systemctl stop will now cause the %preun scriptlet to fail and hopefully abort the upgrade operation so that recovery can be attempted.
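The control flow can be exercised outside of rpm with a rough sketch like the one below. It replaces /usr/bin/systemctl with a shell function stub, and the UNIT_ACTIVE and STOP_STATUS variables are hypothetical knobs that exist only in this sketch. It assumes the scriptlet's exit status is that of its last command, which here is the if statement:

```shell
# Stub standing in for /usr/bin/systemctl so the logic can run anywhere.
# UNIT_ACTIVE and STOP_STATUS are hypothetical knobs for this sketch only.
systemctl() {
    case "$1" in
        is-active) [ "$UNIT_ACTIVE" = yes ] ;;
        stop)      return "$STOP_STATUS" ;;
    esac
}

# Same control flow as the %preun scriptlet, calling the stub above.
preun() {
    if systemctl is-active --quiet flux.service; then
        echo "Stopping Flux systemd unit due to upgrade/removal..."
        echo "For progress, check: systemctl status flux"
        systemctl stop flux.service
    fi
}

# Run the scriptlet body once and print the status rpm would see.
report() {
    if preun; then
        echo "$1: scriptlet exit 0"
    else
        echo "$1: scriptlet exit $?"
    fi
}

UNIT_ACTIVE=no  STOP_STATUS=0; report "unit inactive"
UNIT_ACTIVE=yes STOP_STATUS=0; report "clean stop"
UNIT_ACTIVE=yes STOP_STATUS=1; report "failed stop"
```

With the unit inactive, nothing is printed by the scriptlet body and it exits 0. A failing stop is the only case that yields a nonzero status.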

grondo avatar Oct 09 '25 14:10 grondo

That sounds perfect.

garlick avatar Oct 09 '25 15:10 garlick