deploy-rs
Is there an obvious way to ignore a specific activation failure (or not activate before next boot?)
I am trying to update my systems to nixpkgs revision 40f26dfd652647880a6ec5b7d2f61acf47ff534f. Right now I activate via nix run . -- .#hostname
Alas when activating the following error occurs:
starting the following units: NetworkManager-wait-online.service, NetworkManager.service, audit.service, dnsmasq.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.socket, nscd.service, prometheus-node-exporter.service, resolvconf.service, rtkit-daemon.service, systemd-modules-load.service, systemd-sysctl.service, systemd-timesyncd.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, tlp.service, udisks2.service, wpa_supplicant.service, zfs-mount.service, zfs-scrub.timer, zfs-share.service, zfs-snapshot-daily.timer, zfs-snapshot-frequent.timer, zfs-snapshot-hourly.timer, zfs-snapshot-monthly.timer, zfs-snapshot-weekly.timer, zfs-zed.service, zpool-trim.timer, zram-reloader.service
the following new units were started: NetworkManager-dispatcher.service, systemd-hostnamed.service, systemd-vconsole-setup.service
warning: the following units failed: mount-pstore.service
● mount-pstore.service
Loaded: loaded (/nix/store/0kcxap1676glqd8yy4pji5byk19skiqg-unit-mount-pstore.service/mount-pstore.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-05-20 15:22:35 EEST; 7s ago
Process: 76784 ExecStart=/nix/store/63zfbgkcai37h864rn1nwjjqhwwq2v49-util-linux-2.36.2-bin/bin/mount -t pstore -o nosuid,noexec,nodev pstore /sys/fs/pstore (code=exited, status=32)
Main PID: 76784 (code=exited, status=32)
IP: 0B in, 0B out
CPU: 1ms
May 20 15:22:35 haibox systemd[1]: Starting mount-pstore.service...
May 20 15:22:35 haibox mount[76784]: mount: /sys/fs/pstore: pstore already mounted on /sys/fs/pstore.
May 20 15:22:35 haibox systemd[1]: mount-pstore.service: Main process exited, code=exited, status=32/n/a
May 20 15:22:35 haibox systemd[1]: mount-pstore.service: Failed with result 'exit-code'.
May 20 15:22:35 haibox systemd[1]: Failed to start mount-pstore.service.
⭐ ⚠️ [activate] [WARN] De-activating due to error
switching from generation 132 to 131
⭐ ⚠️ [activate] [WARN] Removing generation by ID 132
removing generation 132
This activation error seems non-critical and I think things would continue working fine even if it was ignored, or if the generation activation was delayed to the next boot. Alas, there isn't an obvious way to do either, and so I'm stuck with no idea on how to proceed in a situation like this.
What's the expected way to deal with a situation like this?
You can always run deploy again with --auto-rollback false, which makes it ignore any activation errors but still roll back if a connectivity error is detected (unless you also add --magic-rollback false). Depending on what is being activated, the error might still interrupt the activation, but that is outside the control of deploy-rs and, in this case, up to NixOS.
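A minimal sketch of the re-run described above, using the flake target from the original post. The helper name redeploy_cmd is made up for illustration; it only prints the command so it can be reviewed before it is pasted and run.

```shell
#!/bin/sh
# Sketch only: print (rather than run) the redeploy command.
# redeploy_cmd is a hypothetical helper, not part of deploy-rs.
# $1: flake target, e.g. ".#hostname" from the original post.
# $2: "keep-magic" (default) or "no-magic" to also disable magic rollback.
redeploy_cmd() {
    target=$1
    mode=${2:-keep-magic}
    # Ignore activation errors, but keep the connectivity check by default.
    cmd="nix run . -- $target --auto-rollback false"
    if [ "$mode" = "no-magic" ]; then
        # Also skip the rollback on connectivity errors.
        cmd="$cmd --magic-rollback false"
    fi
    printf '%s\n' "$cmd"
}

redeploy_cmd ".#hostname"
redeploy_cmd ".#hostname" no-magic
```

Leaving magic rollback enabled is probably the safer default for remote machines, since a broken network configuration would otherwise lock you out.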
I think an option is needed to not switch configuration, only enable it for next boot. I am getting activation failures (and rollback failures too) when I change something that can't be modified while the system is running (changing kernels, changing the root filesystem, ...). An option that would enable it on next boot, and maybe even reboots automatically, would be great.
That sounds like an upstream NixOS issue if it's attempting to perform the impossible upon activation and failing. --auto-rollback false should work in your case too: activation will still fail, but without the rollback the generation should persist and thus activate on next boot. Still, I think it would be nice to support this directly.
Currently the NixOS activation profile will run $PROFILE/bin/switch-to-configuration switch, which of course tries to switch live, but I guess we could implement a flag similar to --dry-activate which will set an environment variable to indicate if it should run switch or boot (or the equivalent for other activatable profiles).
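A rough sketch of how that could look inside an activation script. DEPLOY_BOOT is a hypothetical variable name chosen for illustration, not something deploy-rs currently sets; switch and boot are the existing switch-to-configuration actions.

```shell
#!/bin/sh
# Sketch only: choose the switch-to-configuration action from an
# environment variable. DEPLOY_BOOT is a hypothetical name.
activation_action() {
    if [ "${DEPLOY_BOOT:-0}" = "1" ]; then
        # Only make the new generation the boot default.
        echo boot
    else
        # Current behaviour: switch the running system live.
        echo switch
    fi
}

# An activation script could then call:
#   "$PROFILE/bin/switch-to-configuration" "$(activation_action)"
# Here the command is only printed, not executed.
echo "would run: \$PROFILE/bin/switch-to-configuration $(activation_action)"
```

With boot nothing is restarted on the running system, so unit failures like the mount-pstore.service one above would not occur during the deploy itself.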
This would probably remove half my usage of --auto-rollback false, which I usually need because of network unit rebuilds and similar. Having a flag for boot sounds like a good idea.
I second this. On 90% of my updates I have to use --auto-rollback false --magic-rollback false even if I plan to reboot anyway.
Can we have an interactive prompt that asks the user whether to roll back or not on activation failure? It'd also be helpful to be given an opportunity to run journalctl -xeu and the like to inspect the new state before deciding whether to roll back.