ostree
add greenboot-like functionality integrated here
Having some discussions on https://pagure.io/fedora-iot/greenboot and I think we should actually fold this functionality into ostree proper. As is, the hooks it makes into the "control loop" of OS upgrades are extremely complicated.
In particular I'd like ostree admin status to have information about things like boot success etc.
I think we can take a new approach here:
- Write the core logic in Rust in https://github.com/ostreedev/ostree-rs-ext/ (maybe, or we could keep it in C here)
- UEFI only, using BootNext?
- A very minimal systemd unit which runs on the next boot (in fact we already have one, xref https://github.com/ostreedev/ostree/pull/2589 ) that marks a successful state
- Health checks based on e.g. systemd unit startup success can just be implemented by ordering against that success unit as a defined API; user logic chooses to then e.g. reboot back into the previous deployment
One part of this is that OSTree-based distros' default target should probably be systemd's boot-complete.target going forward (instead of e.g. the currently mostly used multi-user.target).
This was also touched upon in the recently held Image-based Linux summit, see: https://github.com/uapi-group/docs/blob/main/minutes/2022-10-05__Image-based-linux-summit.md#prior-art and more generally https://uapi-group.org/
This part of the boot specification might be relevant too to avoid creating two different mechanisms: https://github.com/uapi-group/specifications/blob/main/specs/boot_loader_specification.md#boot-counting
The boot counting mechanism described in the spec and used by sd-boot (i.e. putting the counter in the boot loader entry file) was deemed not implementable with grub, so greenboot implemented its own via a grub snippet and grub env vars. That's what's used by Fedora IoT / RHEL for Edge today. So we already have two different mechanisms unfortunately.
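For readers who haven't looked at the spec's boot counting: as I read it, the counter lives in the entry file name as `+LEFT` or `+LEFT-DONE` right before `.conf`, and the boot loader decrements "tries left" and increments "tries done" before each attempt. A minimal sketch of that naming scheme follows; the type and function names (`BootCount`, `parse_entry_name`, `after_attempt`) are made up for illustration and are not part of ostree, greenboot, or sd-boot.

```rust
/// Boot counters encoded in a boot loader entry file name
/// (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct BootCount {
    tries_left: u32,
    tries_done: u32,
}

/// Parse `NAME+LEFT.conf` or `NAME+LEFT-DONE.conf`.
/// Returns the base name and the counters, or None if the name carries no counter.
fn parse_entry_name(file_name: &str) -> Option<(&str, BootCount)> {
    let stem = file_name.strip_suffix(".conf")?;
    let (base, counter) = stem.rsplit_once('+')?;
    let (tries_left, tries_done) = match counter.split_once('-') {
        Some((left, done)) => (left.parse::<u32>().ok()?, done.parse::<u32>().ok()?),
        // A single number means "tries left"; "tries done" is taken as 0.
        None => (counter.parse::<u32>().ok()?, 0),
    };
    Some((base, BootCount { tries_left, tries_done }))
}

/// File name after one more boot attempt: one fewer try left, one more try done.
fn after_attempt(base: &str, count: BootCount) -> String {
    format!(
        "{}+{}-{}.conf",
        base,
        count.tries_left.saturating_sub(1),
        count.tries_done + 1
    )
}

fn main() {
    let (base, count) = parse_entry_name("fedora+3.conf").expect("name has a counter");
    println!("{}", after_attempt(base, count));
}
```

The grub-based approach used by greenboot keeps the equivalent state in grub env vars instead, so this only illustrates the spec side of the two mechanisms.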
I am working on a rewrite of greenboot in Rust, and I am looking for options of how to differentiate a regular reboot vs reboot post upgrade/rollback. @cgwalters, as a starter in ostree, I would like to understand how
we should actually fold this functionality into ostree proper
And how different it is from existing greenboot.
I am looking for options of how to differentiate a regular reboot vs reboot post upgrade/rollback.
I'd say here we should use the ostree= kernel argument as a source of truth. Today there are multiple things that log this that we could find in the journal:
[root@cosa-devsh ~]# journalctl --grep=ostree= -o cat |cat
Command line: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ignition.platform.id=qemu console=tty0 console=ttyS0,115200n8 ignition.firstboot ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
Kernel command line: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ignition.platform.id=qemu console=tty0 console=ttyS0,115200n8 ignition.firstboot ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
Unknown kernel command line parameters "BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0", will be passed to user space.
ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
Using kernel command line parameters: ip=auto BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ignition.platform.id=qemu console=tty0 console=ttyS0,115200n8 ignition.firstboot ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
[root@cosa-devsh ~]# journalctl --grep=ostree= | more
Mar 20 14:12:19 localhost kernel: Command line: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ignition.platform.id=qemu console=tty0 console=ttyS0,115200n8 ignition.firstboot ostree=/ostre
e/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
Mar 20 14:12:19 localhost kernel: Kernel command line: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ignition.platform.id=qemu console=tty0 console=ttyS0,115200n8 ignition.firstboot ostree
=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
Mar 20 14:12:19 localhost kernel: Unknown kernel command line parameters "BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8
ba72cbcfd865b42b77396813/0", will be passed to user space.
Mar 20 14:12:19 localhost kernel: ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
Mar 20 14:12:19 localhost dracut-cmdline[413]: Using kernel command line parameters: ip=auto BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/vmlinuz-5.14.0-282.el9.x86_64 ignition.platform.id=qemu console=tty0 console
=ttyS0,115200n8 ignition.firstboot ostree=/ostree/boot.1/rhcos/8bb3298191b10a91e3d87a8f67872865cb6d42a8ba72cbcfd865b42b77396813/0
There's also journalctl -u ostree-prepare-root, which logs:
Mar 20 14:12:22 localhost ostree-prepare-root[1132]: Resolved OSTree target to: /sysroot/ostree/deploy/rhcos/deploy/350495a02a76b33ab9436d5eeca7328417683292184d9e1829fb4268ff78c7cc.0
We could adjust that one to log this as structured data.
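Until that message is structured, a consumer would have to parse the text. A rough sketch of what that could look like, assuming the message format stays exactly as in the log line above and that the deployment directory is named `<checksum>.<serial>`; `resolved_target` and `deployment_id` are hypothetical helper names, not existing ostree APIs.

```rust
/// Message prefix as it appears in the ostree-prepare-root log line above.
const PREFIX: &str = "Resolved OSTree target to: ";

/// Return the deployment path from a journal message, if the message matches.
fn resolved_target(message: &str) -> Option<&str> {
    let (_, path) = message.split_once(PREFIX)?;
    Some(path.trim())
}

/// Split the final path component `<checksum>.<serial>` into its two parts.
fn deployment_id(path: &str) -> Option<(&str, u32)> {
    let name = path.rsplit('/').next()?;
    let (checksum, serial) = name.rsplit_once('.')?;
    Some((checksum, serial.parse::<u32>().ok()?))
}

fn main() {
    let line = "ostree-prepare-root[1132]: Resolved OSTree target to: \
                /sysroot/ostree/deploy/rhcos/deploy/350495a02a76b33ab9436d5eeca7328417683292184d9e1829fb4268ff78c7cc.0";
    if let Some((checksum, serial)) = resolved_target(line).and_then(deployment_id) {
        println!("checksum={checksum} serial={serial}");
    }
}
```

Text matching like this breaks as soon as the wording changes, which is the argument for logging structured fields instead.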
Now, not every system will have a persistent journal. I could imagine that we record a "last booted" field somewhere in ostree associated with a deployment. That'd be cheaper and more reliable to parse.
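To make the idea concrete, here is a minimal sketch of how a tool could classify a boot by comparing the current ostree= argument with a recorded "last booted" value. Everything here is an assumption for illustration: `ostree_karg`, `classify_boot` and `BootKind` are hypothetical names, and where the "last booted" value would be stored is not defined anywhere yet.

```rust
/// Extract the value of the `ostree=` argument from a kernel command line.
fn ostree_karg(cmdline: &str) -> Option<&str> {
    cmdline
        .split_ascii_whitespace()
        .find_map(|arg| arg.strip_prefix("ostree="))
}

/// Result of comparing this boot with the recorded previous one (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum BootKind {
    /// The `ostree=` value matches the recorded one: a regular reboot.
    SameDeployment,
    /// The `ostree=` value differs: an upgrade or a rollback happened.
    DeploymentChanged,
    /// No `ostree=` argument or no recorded value to compare against.
    Unknown,
}

/// Naive classification by string comparison of the `ostree=` values.
fn classify_boot(cmdline: &str, last_booted: Option<&str>) -> BootKind {
    match (ostree_karg(cmdline), last_booted) {
        (Some(current), Some(previous)) if current == previous => BootKind::SameDeployment,
        (Some(_), Some(_)) => BootKind::DeploymentChanged,
        _ => BootKind::Unknown,
    }
}

fn main() {
    // /proc/cmdline holds the command line of the running kernel.
    let cmdline = std::fs::read_to_string("/proc/cmdline").unwrap_or_default();
    // Assumption: the previous value would come from wherever ostree records it.
    let last_booted: Option<&str> = None;
    println!("{:?}", classify_boot(&cmdline, last_booted));
}
```

A plain string comparison is only a starting point; it cannot tell an upgrade from a rollback, which would need the deployment ordering that ostree itself knows about.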
And how different it is from existing greenboot.
I think this is covered by the initial comment, right?
Having some discussions on https://pagure.io/fedora-iot/greenboot and I think we should actually fold this functionality into ostree proper. As is the hooks it makes into the "control loop" of OS upgrades are extremely complicated.
In particular I'd like ostree admin status to have information about things like boot success etc.