Support `systemctl kexec` reboots
I know you are playing with the idea of a systemd-boot implementation of the boot chain instead of GRUB, so I am filing this feature request to help inform the design!
What I'd like: We are running on bare metal, and as such there is a very lengthy firmware boot up time (several minutes). It would be beneficial if Bottlerocket supported a kexec reboot into the new root.
This would be a bit of a niche use case, as `systemctl kexec` is not widely used. It should also be noted that kexec reboots currently don't understand UKIs (which would generally be a great choice), although a patch to support them has been submitted (GH, kernel.org mailing list).
Any alternatives you've considered: Suffer through the painfully long UEFI boot times.
Thanks for bringing this up @mikn - it is a cool feature to consider! Is the intention to boot into an updated kernel on metal?
Just wanted to confirm that understanding, since it seems something like systemd's soft-reboot wouldn't give you what you are looking for, right?
Correct! Essentially I would like to do an A/B partition transition, so a full system update.
Do you need anything beyond enabling kexec in the kernel? Bottlerocket does enable the syscall today in all three kernels.
It may require you to pursue or carry the kernel patch I linked to support kexec'ing into a signed UKI, if that's how you plan to ensure Secure Boot integrity of the kernel command line. I would also like it to be configurable to use a kexec reboot instead of `systemctl reboot` when BRUPOP triggers a reboot. I guess this could be either a user data setting or a flag to the apiclient.
I don't think systemd-boot supports any other way than specifically UKIs to secure the kernel command line, which may also be relevant to keep in mind when designing this.
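To make the "configurable reboot method" idea concrete, here is a minimal sketch of how a setting could map to the commands that get run. The function name, the `method` values, and the kernel path are hypothetical and not part of Bottlerocket or apiclient; only `systemctl reboot`, `systemctl kexec`, and the kexec-tools flags (`--load`, `--initrd`, `--command-line`, `--reuse-cmdline`, `--kexec-file-syscall`) are real. The code only prints the commands and does not execute them.

```python
import shlex

def reboot_commands(method, kernel="/boot/vmlinuz", initrd=None, cmdline=None):
    """Return the argv lists a (hypothetical) reboot setting would run."""
    if method == "reboot":
        # Full firmware reboot: the default path today.
        return [["systemctl", "reboot"]]
    if method != "kexec":
        raise ValueError(f"unknown reboot method: {method!r}")

    # Stage the new kernel via kexec_file_load(2) so the running kernel
    # can apply its signature checks to the image.
    load = ["kexec", "--load", kernel, "--kexec-file-syscall"]
    if initrd is not None:
        load.append(f"--initrd={initrd}")
    if cmdline is None:
        # Assumption: reuse the current command line when none is given.
        load.append("--reuse-cmdline")
    else:
        load.append(f"--command-line={cmdline}")

    # systemctl kexec shuts services down cleanly, then jumps to the staged kernel.
    return [load, ["systemctl", "kexec"]]

for argv in reboot_commands("kexec"):
    print(shlex.join(argv))
```

Going through `systemctl kexec` rather than `kexec --exec` keeps the normal service shutdown ordering, which is the part a BRUPOP-triggered reboot would presumably rely on.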
kexec relies on kernel drivers to implement a PCI device shutdown hook correctly in order to work reliably. As one example, in-flight DMA requests need to complete before the kexec, or else they can corrupt memory in the new kernel.
In my experience, working mostly within a bounded set of server-class hardware, driver-related issues are uncommon but not unheard of. I wouldn't advise using kexec with uncontrolled hardware where it hasn't specifically been qualified.
I also regard kexec as largely incompatible with Secure Boot. Much ado about SBAT goes into many of the factors, but it's very difficult to reason about the security of the kernel after it starts running. Loading an unsigned kernel module, mounting an unverified filesystem, or running a userspace process with full capabilities could all potentially compromise the kernel.
Whether or not kexec supports UKIs, or zboot images, it can't extend the chain of trust from Secure Boot to a new Linux kernel and userspace, unless perhaps invoked very early, before any of those other events could have happened.
You would need a different threat model, like "Secure Kexec", that established how the new Linux environment could trust the old Linux environment had not been compromised, and that it was free of implementation defects like the driver issues that might cause damage. TPM-backed secrets might or might not be available under the weaker guarantees of this threat model.
Thanks @bcressey for spending the time to share your analysis and insight here. You raise important concerns about the fundamental trust model with kexec that I hadn't really considered before.
I understand your point that kexec inherently requires trusting the running kernel to load the next one, which creates a different security model than a full firmware reboot. Even with `kexec_file_load()` checking signatures, a compromised kernel could subvert this process.
Regarding the TPM/attestation issues - you're right that without a firmware reset, PCRs would just be extended rather than reset, making it impossible to distinguish a clean boot from a kexec. This breaks both remote attestation and TPM-sealed secrets.
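As a rough illustration of the PCR point, here is a toy model of the TPM extend operation, where the new value is the hash of the old value concatenated with the measurement digest. It is a sketch and not real TPM code: the stage names are made up, a single SHA-256 register stands in for the whole PCR bank, and it assumes the kexec'd kernel gets measured at all (for example by IMA).

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR

def pcr_reset() -> bytes:
    """Value of a PCR after a firmware reset: all zeros."""
    return bytes(PCR_SIZE)

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """Extend: new = SHA-256(old || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()

def measure_chain(pcr: bytes, stages) -> bytes:
    """Extend the register once per boot stage, in order."""
    for stage in stages:
        pcr = pcr_extend(pcr, stage)
    return pcr

# Clean firmware boot straight into the new kernel (hypothetical stage names).
clean_boot = measure_chain(pcr_reset(), [b"firmware", b"bootloader", b"kernel-B"])

# Boot the old kernel, then kexec into the new one: no reset in between,
# so the new measurement lands on top of the old chain.
old_boot = measure_chain(pcr_reset(), [b"firmware", b"bootloader", b"kernel-A"])
after_kexec = pcr_extend(old_boot, b"kernel-B")

print("matches clean boot:", after_kexec == clean_boot)
```

Because extend can only fold new data into the existing value, the register after a kexec still carries the old kernel's history, so a secret sealed to the clean-boot value of the new kernel would not unseal.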
I have read the "Much ado about SBAT" article before and it is very interesting, but you put it in a new light for me in the context of maintaining security guarantees in a running system.
Given Bottlerocket's security-first approach and the fundamental differences in trust model between firmware reboot and kexec, I can see why you'd consider them incompatible. The faster boot times don't seem worth compromising the security guarantees that Bottlerocket provides.
I am fine with closing this issue as a "Won't Fix" given this conversation.
There are some things that I think could still make it interesting, such as:
- in commodity hardware, the firmware is more often the weakest link than the running kernel - the security of the firmware is often not great
- running the kernel in `integrity` mode and with a limited set of file systems makes it very difficult to compromise the running kernel
But we are looking at running some relatively niche filesystems in our main kernel, so I do actually think we would like to do a firmware-anchored reboot of our nodes relatively often until we can do a full and proper audit.