bootc
Add `bootc apply-live`
This is probably mainly draining the logic from rpm-ostree into ostree-ext, then re-using it here.
But...a whole lot of suddenly OS-specific issues come to the fore. For example, should we try to distinguish between "new content" and "changes"?
This one also relates to https://github.com/containers/bootc/issues/165
I know at least some people have also been asking for an "apply live by default" mode. Note that there's a super tricky detail there: if there are kernel changes in the new root, we must in general do something like keep /usr/lib/modules/$kver mounted/copied into the running root.
Could we bind mount /usr/lib/modules somewhere else like /run/booted-kernel-modules during the initial boot, and then have the kernel also look for modules there? Then we don't have to mess with trying to union the new and old modules dirs?
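To make the kernel-modules concern above concrete, here is a minimal sketch of the check that would have to run before a live apply: does the new root still ship modules for the kernel that is currently running? The function names (`modules_dir`, `needs_module_preservation`) are made up for illustration and are not part of bootc or ostree; the only thing taken from the thread is the `/usr/lib/modules/$kver` layout.

```rust
use std::path::{Path, PathBuf};

/// Directory where a given root filesystem keeps the modules for kernel `kver`.
/// Hypothetical helper; only the usr/lib/modules/$kver layout comes from the thread.
fn modules_dir(root: &Path, kver: &str) -> PathBuf {
    root.join("usr/lib/modules").join(kver)
}

/// Returns true when the new root does NOT contain modules for the running
/// kernel. In that case a live apply would need to keep the old modules
/// directory mounted or copied somewhere the running kernel can still find it.
fn needs_module_preservation(new_root: &Path, running_kver: &str) -> bool {
    !modules_dir(new_root, running_kver).is_dir()
}
```

On a real system `running_kver` would come from something like `uname -r`, and `new_root` would be the checked-out deployment. What to do when the check returns true (bind mount, copy, or refuse) is exactly the open question here.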
Could we use the same logic as the DNF needs-restarting plugin? This gives hints whether processes, services or the whole system needs a restart to account for the changes. And this would keep us close to DNF.
It might also be nice to have a --dry-run option to know before applying a change whether it will require a process / service / system restart. It would also be interesting to have the ability to use the same logic to compare two images out of band, e.g. in a CI/CD pipeline.
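A rough sketch of what such a dry run could compute, given only a list of changed paths between two images. This is not the algorithm the DNF needs-restarting plugin uses; the path prefixes and the `RestartHint` / `classify_path` / `dry_run` names are assumptions picked purely for illustration.

```rust
/// Coarse restart levels, ordered from least to most disruptive so that
/// `max()` picks the strongest requirement across all changed paths.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum RestartHint {
    Nothing,
    Service,
    Reboot,
}

/// Map one changed path to a hint. The prefixes are illustrative assumptions,
/// not an agreed policy.
fn classify_path(path: &str) -> RestartHint {
    let path = path.trim_start_matches('/');
    if path.starts_with("usr/lib/modules/") {
        // Kernel or module content changed: cannot be picked up without a new kernel.
        RestartHint::Reboot
    } else if path.starts_with("usr/bin/") || path.starts_with("usr/lib/systemd/system/") {
        // Binaries and unit files: running services keep using the old content.
        RestartHint::Service
    } else {
        RestartHint::Nothing
    }
}

/// Dry run: report the strongest hint for a set of changed paths without
/// touching the system.
fn dry_run(changed: &[&str]) -> RestartHint {
    changed
        .iter()
        .map(|p| classify_path(p))
        .max()
        .unwrap_or(RestartHint::Nothing)
}
```

Because it only needs a path list, the same function could run out of band against a diff of two images, e.g. in a CI/CD pipeline.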
The intent behind apply-live is to avoid reboots. Where a kernel update is actually required, a more natural fit would be a kexec flow that switches directly to the new kernel, or deferring the kernel change until the next actual reboot is requested.
Having bootc apply-live ever trigger a whole system restart would make it unusable for some use cases, i.e. performing software updates when the workload cannot be stopped or moved to a different host, for example if you have VMs or other workloads on the system in a cloud computing environment.
A normal user may be okay with downtime, but in a data centre context this would be very problematic: it would be necessary to live migrate all workloads from a given host to a different one just to update the version of ping or some other package on the host because of a CVE.
So the ability to apply changes live without a reboot is required for CVE patching and other use cases.
I hope this issue is prioritized at some point, as I still consider it very important for bootc to be adopted in data centres.
Just to cross-reference this more explicitly: in OCP we're going to be implementing something similar to this RFE to be able to boot from a disk image which doesn't have e.g. the kubelet and overlay the target node image, which does have it. One success criterion for this RFE should ideally be that we can simplify that logic to use bootc apply-live instead. For example, it should ideally work in live systems as well.
I know that kexec support doesn't get this issue to completion, but does the completion of kexec support (https://github.com/ostreedev/ostree/issues/435 solved by https://github.com/ostreedev/ostree/pull/3362) in ostree get bootc closer to at least supporting kexec?
I was hoping to figure out a series of commands that would allow me to leverage this without a code change but haven't been able to get around refspec issues.
EDIT: Looking deeper into this, it seems like a pretty small modification to lib/src/cli.rs and lib/src/deploy.rs might make this possible. However, when I tried to get the ostree bindings updated I ran into issues with shadowing. It also seems like the ostree kexec contributor @mstrodl is still one step ahead of me 😆 as they have also been trying to get the ostree bindings updated, ran into the same issue, and put up a fix: https://github.com/gtk-rs/gir/pull/1627
Yeah, last time I tried to update the ostree rust bindings I ran into various issues...it needs some love.
That said, the kexec logic is not really complex and we could also reimplement it in Rust directly here.
@cgwalters if you're open to a contribution adding separate kexec logic directly into bootc I'll see if I can find time in the next couple weeks to give it a try. We'd love to stop waiting for bare metal DDR training/firmware initialization every time we run bootc upgrade.
Yeah, when you are ready to work on it feel free to ping in this issue or on the matrix chat, happy to help
> Yeah, last time I tried to update the ostree rust bindings I ran into various issues...it needs some love.
> That said, the kexec logic is not really complex and we could also reimplement it in Rust directly here.
My patch I posted there should fix it enough to let you regenerate the bindings
PR merged. I have a tree on my machine with new bindings, I'll see if I can get them out today.
Good news! Updated ostree bindings are available (v0.20.0) thanks to @cgwalters :) https://github.com/ostreedev/ostree/pull/3376 https://github.com/ostreedev/ostree/pull/3378
Yep and https://github.com/containers/bootc/pull/1069 started using them here
One thing related to this we've been talking about on the MCO side is the idea of lowering Node Disruption Policies down into RHEL, e.g. in bootc. This would work in tandem with apply-live I think, where e.g. bootc apply-live would by default only apply live changes if it respects the policies, or error out.
And/or maybe have a bootc upgrade --apply-or-reboot which would either apply live changes if all changes respect node disruption policies or reboot the machine.
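To sketch how those two behaviours could share one decision function: the policy here is modelled as a plain list of path prefixes that are allowed to change live, which is an assumption for illustration and not how Node Disruption Policies are actually expressed. `Outcome` and `decide` are hypothetical names, and `--apply-or-reboot` is only the flag floated above, not an existing option.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// Every change is covered by the policy: apply it to the running system.
    ApplyLive,
    /// Some change is not covered and a reboot fallback was requested
    /// (the proposed `bootc upgrade --apply-or-reboot` behaviour).
    Reboot,
    /// Some change is not covered and no fallback was requested
    /// (the proposed default for `bootc apply-live`: error out).
    Refuse,
}

/// Decide what to do with a set of changed paths.
/// `live_ok_prefixes` stands in for the policy; it is an illustrative
/// simplification, not the real policy format.
fn decide(live_ok_prefixes: &[&str], changed: &[&str], reboot_fallback: bool) -> Outcome {
    let all_permitted = changed.iter().all(|path| {
        live_ok_prefixes
            .iter()
            .any(|prefix| path.starts_with(*prefix))
    });
    if all_permitted {
        Outcome::ApplyLive
    } else if reboot_fallback {
        Outcome::Reboot
    } else {
        Outcome::Refuse
    }
}
```

With this shape, the only difference between the two commands is the value of `reboot_fallback`.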
OTOH I can also see NDPs integrating with #22 instead, since /etc files are probably 95% of what you'd use them for, and it's much easier to live apply config changes since those are already distinct from the host image and actually overlaid live.
But on the other other hand, having the concept of NDPs be generic enough so it applies to both OS content and config overlays seems like it'd be useful.