
Is there an obvious way to ignore a specific activation failure (or not activate before next boot)?

Open nagisa opened this issue 3 years ago • 6 comments

I am trying to update my systems to nixpkgs revision 40f26dfd652647880a6ec5b7d2f61acf47ff534f. Right now I activate via nix run . -- .#hostname. Alas, when activating, the following error occurs:

starting the following units: NetworkManager-wait-online.service, NetworkManager.service, audit.service, dnsmasq.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.socket, nscd.service, prometheus-node-exporter.service, resolvconf.service, rtkit-daemon.service, systemd-modules-load.service, systemd-sysctl.service, systemd-timesyncd.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, tlp.service, udisks2.service, wpa_supplicant.service, zfs-mount.service, zfs-scrub.timer, zfs-share.service, zfs-snapshot-daily.timer, zfs-snapshot-frequent.timer, zfs-snapshot-hourly.timer, zfs-snapshot-monthly.timer, zfs-snapshot-weekly.timer, zfs-zed.service, zpool-trim.timer, zram-reloader.service
the following new units were started: NetworkManager-dispatcher.service, systemd-hostnamed.service, systemd-vconsole-setup.service
warning: the following units failed: mount-pstore.service

● mount-pstore.service
     Loaded: loaded (/nix/store/0kcxap1676glqd8yy4pji5byk19skiqg-unit-mount-pstore.service/mount-pstore.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2021-05-20 15:22:35 EEST; 7s ago
    Process: 76784 ExecStart=/nix/store/63zfbgkcai37h864rn1nwjjqhwwq2v49-util-linux-2.36.2-bin/bin/mount -t pstore -o nosuid,noexec,nodev pstore /sys/fs/pstore (code=exited, status=32)
   Main PID: 76784 (code=exited, status=32)
         IP: 0B in, 0B out
        CPU: 1ms

May 20 15:22:35 haibox systemd[1]: Starting mount-pstore.service...
May 20 15:22:35 haibox mount[76784]: mount: /sys/fs/pstore: pstore already mounted on /sys/fs/pstore.
May 20 15:22:35 haibox systemd[1]: mount-pstore.service: Main process exited, code=exited, status=32/n/a
May 20 15:22:35 haibox systemd[1]: mount-pstore.service: Failed with result 'exit-code'.
May 20 15:22:35 haibox systemd[1]: Failed to start mount-pstore.service.
⭐ ⚠️ [activate] [WARN] De-activating due to error
switching from generation 132 to 131
⭐ ⚠️ [activate] [WARN] Removing generation by ID 132
removing generation 132

This activation error seems non-critical and I think things would continue working fine even if it was ignored, or if the generation activation was delayed to the next boot. Alas, there isn't an obvious way to do either, and so I'm stuck with no idea on how to proceed in a situation like this.

What's the expected way to deal with a situation like this?

nagisa, May 20 '21 12:05

You can always run deploy again with --auto-rollback false, which makes it ignore any activation errors but still roll back if a connectivity error is detected (unless you also add --magic-rollback false). Depending on what is being activated, this error might still interrupt the activation, but that is outside the control of deploy-rs and, in this case, up to NixOS.
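A minimal sketch of the two flag combinations described above, reusing the invocation from the original report. The helper name deploy_cmd is made up for illustration, and it only prints the command line rather than running it, so the output can be reviewed before anything touches a host.

```shell
#!/bin/sh
# Sketch only: assemble the deploy-rs command line from the original report
# with explicit rollback settings. deploy_cmd is an illustrative helper,
# not part of deploy-rs; it prints the command instead of executing it.
TARGET='.#hostname'

deploy_cmd() {
    # $1 = value for --auto-rollback, $2 = value for --magic-rollback
    printf 'nix run . -- %s --auto-rollback %s --magic-rollback %s\n' "$TARGET" "$1" "$2"
}

# Ignore activation errors, keep the connectivity check:
deploy_cmd false true
# Ignore activation errors and skip the connectivity check too:
deploy_cmd false false
```

With --auto-rollback false the failed generation is kept rather than removed, which is what lets it take effect on the next boot.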

notgne2, May 21 '21 22:05

I think an option is needed to not switch the configuration live and only enable it for the next boot. I am getting activation failures (and rollback failures too) when I change something that can't be modified while the system is running (changing kernels, changing the root filesystem, ...). An option that would enable it on next boot, and maybe even reboot automatically, would be great.

12Boti, May 25 '21 18:05

That sounds like an upstream NixOS issue if it's attempting to perform the impossible upon activation and failing. --auto-rollback false should work in your case too (it will fail, but without a rollback the generation should persist and thus activate on next boot), but I think it would be nice to support it anyway.

Currently the NixOS activation profile runs $PROFILE/bin/switch-to-configuration switch, which of course tries to switch live, but I guess we could implement a flag similar to --dry-activate that sets an environment variable to indicate whether it should run switch or boot (or the equivalent for other activatable profiles).
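A rough sketch of how such an environment variable could select the action. This is not deploy-rs code: the variable name DEPLOY_BOOT and the helper choose_action are invented for illustration, and the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical sketch, not deploy-rs code: choose the switch-to-configuration
# action from an environment variable. DEPLOY_BOOT is an invented name.
choose_action() {
    if [ "${DEPLOY_BOOT:-0}" = "1" ]; then
        echo boot     # only make the generation the default for the next boot
    else
        echo switch   # activate the new generation on the running system
    fi
}

# Fall back to the usual NixOS system profile path if PROFILE is unset.
PROFILE="${PROFILE:-/nix/var/nix/profiles/system}"

# Print the command an activation script would run.
echo "$PROFILE/bin/switch-to-configuration $(choose_action)"
```

Defaulting to switch keeps today's behaviour unless the variable is set explicitly.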

notgne2, May 26 '21 06:05

This would probably remove half my usage of --auto-rollback false, usually over network unit rebuild and similar. Having a flag for boot sounds like a good idea.

mkaito, Jul 08 '21 15:07

I second this. On 90% of my updates I have to use --auto-rollback false --magic-rollback false, even if I plan to reboot anyway.

kuetemeier, Sep 05 '21 11:09

Can we have an interactive prompt that asks the user whether or not to roll back on activation failure? It'd also be helpful to be given an opportunity to run journalctl -xeu and the like to inspect the new state before deciding whether to roll back.

srid, Apr 12 '23 15:04