Flatcar
Support rebooting (e.g. for updates) via kexec
Current situation
In some situations, rebooting machines can take a long time, which adds up across a large fleet of machines (e.g. bare-metal) and extends the maintenance window.
Ideal future situation
Flatcar could use kexec to reboot into the updated system, reducing the reboot time of each machine.
The logic that alters the A/B partition counters, chooses which partition to boot, and generates the new kernel command line arguments is implemented in GRUB and would have to be reimplemented in userspace. Other than that, it's a question of how reliable kexec is on certain hardware.
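As a rough illustration of what "reimplemented in userspace" could mean for the A/B choice, here is a minimal sketch. It assumes ChromeOS-style GPT attributes (`priority`, `tries`, `successful`) and a "highest priority among bootable slots, consume one try if not yet successful" rule; the `Slot` type and `choose_slot` function are hypothetical names, and reading or writing the real attributes on disk is deliberately left out.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class Slot:
    """Hypothetical in-memory view of one USR partition's GPT attributes."""
    label: str        # e.g. "USR-A" or "USR-B"
    priority: int     # higher wins; 0 means "never boot"
    tries: int        # remaining boot attempts for an unconfirmed slot
    successful: bool  # set once the slot has booted successfully


def choose_slot(slots: Sequence[Slot]) -> Optional[Slot]:
    """Pick the slot to boot next and return it with its counter updated.

    Assumed rule (not taken from the GRUB code): a slot is bootable if its
    priority is non-zero and it is either marked successful or has tries
    left. The highest-priority bootable slot wins. An unconfirmed slot has
    one try consumed, so a failing update eventually falls back.
    """
    bootable = [
        s for s in slots
        if s.priority > 0 and (s.successful or s.tries > 0)
    ]
    if not bootable:
        return None
    chosen = max(bootable, key=lambda s: s.priority)
    if not chosen.successful:
        # The caller would have to persist this change to the GPT
        # before calling kexec, as the bootloader does today.
        chosen = replace(chosen, tries=chosen.tries - 1)
    return chosen
```

Keeping the decision a pure function makes it easy to test against whatever GRUB does today before any code touches the partition table.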
I don't think this is a good use case of `kexec` in general - regular production systems' state can become very complex (think driver states, DMA I/O queues, etc.). Merely `kexec`ing into a new kernel without going through a reboot to reset the underlying hardware/devices is likely to introduce inconsistent driver/hardware state.
Hmm, I used `kexec` on Ubuntu for a few years for kernel upgrades on a bunch of bare-metal servers, and I never encountered any issues related to it.
> I don't think this is a good use case of `kexec` in general - regular production systems' state can become very complex (think driver states, DMA I/O queues, etc.). Merely `kexec`ing into a new kernel without going through a reboot for resetting the underlying hardware / devices is likely to introduce inconsistent drivers/hardware state.
I'm not sure this would be an issue. In the Equinix Metal production environment, we use `kexec` for many customer OS installations across many hardware types with a wide array of components, firmware, and target kernels to initialize to.
So two use cases, one is the reboot and the other is after installation from PXE. Would it work to kexec into GRUB?
Actually, it may help to work on https://github.com/flatcar-linux/Flatcar/issues/624 first; then we wouldn't have to care so much about GRUB and kernel parameters.
Some Flatcar and kexec references spotted in the wild: https://twitter.com/joonas_fi/status/1523751306489307136
Let's come up with a helper script that assembles the right kernel cmdline. The tasks are detecting first boot and OEM, evaluating/setting GPT attributes, and fetching the dmverity hash.
If desired, we could even move some more logic into the initrd. For example, first boot and OEM detection don't actually need to be done in GRUB, and we could supply the dmverity hash in a secondary initrd appended to the first cpio instead of our current hack of overwriting some kernel bytes at an arch-specific offset.
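To make the helper idea above more concrete, here is a minimal sketch of just the assembly step. It assumes the detection work (first boot, OEM, chosen partition, dmverity hash) has already produced plain values. The parameter names `mount.usr`, `verity.usrhash`, `flatcar.first_boot` and `flatcar.oem.id` are my assumption of what the GRUB config passes today and should be checked against it; `build_cmdline` and `kexec_load_argv` are hypothetical helper names. The `--load`, `--command-line=` and `--initrd=` options are the ones documented for the kexec-tools CLI.

```python
from typing import List, Optional, Sequence


def build_cmdline(
    usr_partuuid: str,
    verity_hash: Optional[str] = None,
    first_boot: bool = False,
    oem_id: Optional[str] = None,
    extra: Sequence[str] = (),
) -> str:
    """Assemble a kernel command line from already-detected values.

    The parameter names below are assumptions about the current GRUB
    config, not a verified list.
    """
    args = [f"mount.usr=PARTUUID={usr_partuuid}"]
    if verity_hash:
        args.append(f"verity.usrhash={verity_hash}")
    if first_boot:
        args.append("flatcar.first_boot=detected")
    if oem_id:
        args.append(f"flatcar.oem.id={oem_id}")
    args.extend(extra)
    return " ".join(args)


def kexec_load_argv(
    kernel: str, cmdline: str, initrd: Optional[str] = None
) -> List[str]:
    """Build the argv for loading a kernel with kexec-tools.

    Only constructs the argument list; running it (and the later
    `systemctl kexec` or `kexec -e`) is left to the caller.
    """
    argv = ["kexec", "--load", kernel, f"--command-line={cmdline}"]
    if initrd:
        argv.append(f"--initrd={initrd}")
    return argv
```

Returning an argv list instead of executing anything keeps the sketch free of side effects; a real helper would pass the list to `subprocess.run` only after the GPT attributes have been updated.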
@pothos Hey, is anyone currently tackling this? If not, I'd be happy to take ownership of this issue if that works for everyone.