bootc composefs-native backend
The composefs-rs project is a Rust implementation of composefs that is capable of generating composefs images from container images.
We should integrate it in bootc as an alternative to the ostree backend. This would help make progress on phasing out ostree, UKI support and unified storage:
- https://github.com/bootc-dev/bootc/issues/806
- https://github.com/bootc-dev/bootc/issues/20
To be able to do that, we need to make bootc capable of handling both repository formats and have it handle the transition from ostree to pure composefs.
A potential layout for this is discussed in https://github.com/containers/composefs-rs/issues/38.
Here are suggested steps for creating a first proof of concept implementation:
- Add an option to bootc switch: `bootc switch --composefs quay.io://foo:bar`
- bootc will import the container image using the composefs-rs library in a dedicated composefs repo
- bootc will set up the repo as needed
- bootc will create a new "deployment" for this image
  - Do the three way merge for /etc, comparing previous image, new image, current changes
  - Or use overlayfs instead to do that for /etc
  - Setup /var so that it's shared with ostree deployments
- bootc will setup the new deployment for the next boot
  - UKI case:
    - GRUB: Generate GRUB config snippet to boot the UKIs in order
    - systemd-boot: Install the UKI in /boot/efi/EFI/Linux (order handling to be confirmed)
  - Non-UKI case:
    - GRUB: Install the kernel & initrd in /boot and setup the BLS config
    - systemd-boot: Install the kernel & initrd in /boot/efi and setup the BLS config
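To make the /etc step concrete, here is a minimal Rust sketch of the per-entry decision in a three way merge. It assumes a "local modifications win" policy, which is an assumption for illustration and not a decision recorded in this issue. `Merged` and `merge_entry` are hypothetical names, and a real implementation would compare file metadata and content rather than a generic `T`.

```rust
/// Outcome of merging a single /etc entry (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Merged<T> {
    /// Keep this content in the new deployment's /etc.
    Keep(T),
    /// The entry must not exist in the new deployment's /etc.
    Absent,
}

/// Three way merge of one entry; `None` means "the entry does not exist".
/// - `old`: what the previous image shipped
/// - `new`: what the new image ships
/// - `current`: what is in the running system's /etc
pub fn merge_entry<T: PartialEq + Clone>(
    old: Option<&T>,
    new: Option<&T>,
    current: Option<&T>,
) -> Merged<T> {
    // Assumed policy: if the current state equals the previous image, nothing
    // was changed locally, so follow the new image. Otherwise the local state
    // (including a local deletion) is preserved.
    let chosen = if current == old { new } else { current };
    match chosen {
        Some(value) => Merged::Keep(value.clone()),
        None => Merged::Absent,
    }
}
```

For example, an untouched file follows the new image (`merge_entry(Some(&"a"), Some(&"b"), Some(&"a"))` yields `Merged::Keep("b")`), while a locally edited file is kept as-is.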
Tracker issues:
- [ ] @Johan-Liebert1 GC
- [ ] @Johan-Liebert1 Making update/switch idempotent (in-progress)
- [x] @Johan-Liebert1 rebase and merge branch!
- [x] @cgwalters testing improvements, land https://github.com/bootc-dev/bootc/pull/1607 and also https://github.com/cgwalters/bootc-kit work
- [x] @cgwalters factor out issue for what gates merging the composefs branch to main
- [ ] @cgwalters test(s); as with the rest of our tests we need some coverage via TMT but can also test with other frameworks
- [x] Fix `default_t` after install - https://github.com/containers/composefs-rs/pull/169
- [x] @cgwalters fix lost commits since https://github.com/bootc-dev/bootc/commit/c89efab8ba14683682093f6f730a696b05491783 ( https://github.com/bootc-dev/bootc/pull/1502 )
- [x] @cgwalters Refactor branch to use composefs repo from storage
- [x] Related to above (sha256 vs sha512) relates to things like https://github.com/containers/composefs-rs/issues/107 - DECISION sha512
- [ ] Enable more of the tmt tests with composefs backend (this will block on either `copy-to-storage` working or doing more builds on the host) @ckyrouac
- [ ] @cgwalters Add `bootc storage init-composefs` to initialize a repo from existing system
- [x] Service similar to `ostree-finalize`, say `composefs-native-finalize` which will be responsible for atomically swapping staged and current boot entries among other things.
- [x] GRUB+UKI: If we want this right now we need to generate GRUB fragment because it doesn't do BLS configs properly
- [x] @Johan-Liebert1 add tests for `/etc` merge: https://github.com/bootc-dev/bootc/issues/1469 ( https://github.com/bootc-dev/bootc/pull/1485 )
- [x] @Johan-Liebert1 Copy initramfs code (mount) from composefs-rs into bootc initramfs
- [ ] CI with other distro e.g. Debian
- [x] @jeckersb kernel cmdline cleanups
- [x] @cgwalters Detect UKIs and automatically enable composefs backend
- [ ] Storing metadata (manifest+config); ostree backend serializes into synthetic commit. Note this is not covered by the UKI. This relates to https://github.com/bootc-dev/bootc/issues/20 - then it'd always be saved how container-libs saves it. Another option here is for composefs-rs to have first class support for storing "adjacent" metadata attached to an image.
- [ ] @jeckersb https://github.com/bootc-dev/bootc/issues/1498
- [ ] @jeckersb Document + examples + CI setting up UKIs from current fedora/centos bootc (ref https://github.com/containers/composefs-rs/pull/143 )
Install UX
Right now we're adding a new `--composefs-native` flag.
Proposal: Sealed images default to requiring sealed setup
i.e. secure boot w/fsverity on target or erroring by default
(less agreement on: can opt in to degrading w/ bootc install <image> --disable-sealing )
TODO: Create a spec for detecting sealed images
- Detect via looking at layer structure
- Detect by parsing UKI
Proposal: Automatically detect in initramfs if Secure Boot is disabled
?
Proposal from @travier: Remove verity optional composefs flag
Some confusion about use cases
I added the area/composefs label to this, there's a few related things there, but especially https://github.com/bootc-dev/bootc/pull/935 will step towards this from the other direction and I'd like to try to land that relatively early on.
> progress on phasing out ostree

And yes, while this is definitely a longer term plan there's A LOT of stuff there; for example, we document configuring ostree-prepare-root today, and so we'll need to continue honoring configs for /usr/lib/ostree/prepare-root.conf into the foreseeable future.
https://bootc-dev.github.io/bootc/filesystem.html#enabling-state-overlays is another one.
~BTW https://github.com/bootc-dev/bootc/pull/935 is updated and working now at least.~
#935 merged. Still TODO:
- Support for upgrades (we should check post fetching, before reboot, check if new image enables fsverity, if so do it before rebooting)
- `bootc internals enable-fsverity` or so
> `bootc switch --composefs quay.io://foo:bar`

Hmm, that raises the bar quite high for initial experimentation here. I think it'd probably be a lot easier to start with bootc install actually because trying to make it part of switch implies that how etc and var work is fully compatible.
As far as how it's configured, we can definitely do something like bootc install --experimental-composefs-backend or so, but actually the other option is to just default to this for the "bare UKI" in the image (no bootloader); which rolls a bit into the question of configuring the bootloader. Maybe easiest for now is to automatically do this for images that have a UKI and don't include bootupd.
Sub-thread on the build side, we need to bikeshed how we expose this part:
https://github.com/containers/composefs-rs/commit/dfdb1b1ee55352deed304f46c0af75a9def4efa5#diff-5731dcec97e8f5f570af734466fee700490739eda2a59b738e4417c7ee31c9ccR32
> As far as how it's configured, we can definitely do something like `bootc install --experimental-composefs-backend`

👍🏻

> but actually the other option is to just default to this for the "bare UKI" in the image (no bootloader); which rolls a bit into the question of configuring the bootloader. Maybe easiest for now is to automatically do this for images that have a UKI and don't include bootupd.

I don't think we should have images without bootupd. Ideally we would have a single image with the UKI and degrade it into a kernel + initramfs + command line setup on demand in bootc/composefs-rs when requested for setups that don't support UKI or fs-verity. It's also likely that we will ship both the kernel & UKI in the image at the beginning.
Chatting with Pragyan, one option that may be simplest is to focus on the direct-UKI case w/o a bootloader. That would help prove out the installation path without involving the complexities of bootloaders. But I definitely see that most use cases would want one (especially one with boot counting).
I don't think the use case without a bootloader is simpler or smaller in terms of code. No bootloader means either doing EFI variable management or a special ESP file handling logic. Neither of those will be reliable or easy to test compared to the current approach.
OK sorry, I don't have a really strong opinion and am fine with either path. Whatever is easier to demonstrate progress is again fine by me.
One tangential thing, I was rereading https://0pointer.net/blog/fitting-everything-together.html and thinking about the architecture here, and it does seem clear to me that in the general case we need to also encourage the dm-crypt+(dm-integrity|btrfs-with-full-checksum) for the stateful root. This definitely has implications on how we think of the installation path (which is necessarily distinct from the "dd or boot raw disk image" flow).
In case this is interesting for you as well:
For automotive, the plan is to drop grub for booting in favor of https://gitlab.com/CentOS/automotive/src/ukiboot which is a UEFI based implementation of an android-boot lookalike using UKIs (systemd-ukify). This basically has two partitions (ukiboot_a and ukiboot_b) which have UKIs dumped raw into them, and a small metadata partition (ukibootctl) that has boot counter and active boot slot information. A super tiny EFI app in the ESP picks the right partition and loads the UKI from it.
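To illustrate the A/B idea, here is a minimal Rust sketch of slot selection with a boot counter. It is not taken from ukiboot: the `BootMeta` fields, the `confirmed` flag and `pick_slot` are hypothetical names, and the real on-disk metadata format and fallback rules may well differ.

```rust
/// The two UKI partitions described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slot {
    A,
    B,
}

impl Slot {
    pub fn other(self) -> Slot {
        match self {
            Slot::A => Slot::B,
            Slot::B => Slot::A,
        }
    }
}

/// Hypothetical view of what the metadata partition stores.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BootMeta {
    pub active: Slot,
    /// Boot attempts left for the active slot before falling back.
    pub tries_left: u8,
    /// Set once the OS has reported a successful boot of the active slot.
    pub confirmed: bool,
}

/// Decide which slot to boot and return the metadata to write back.
pub fn pick_slot(meta: BootMeta) -> (Slot, BootMeta) {
    if meta.confirmed {
        // Known-good slot: boot it and leave the metadata untouched.
        return (meta.active, meta);
    }
    if meta.tries_left > 0 {
        // Unconfirmed slot: spend one attempt before booting it.
        let next = BootMeta { tries_left: meta.tries_left - 1, ..meta };
        return (meta.active, next);
    }
    // Attempts exhausted without confirmation: roll back to the other slot.
    let fallback = meta.active.other();
    (fallback, BootMeta { active: fallback, tries_left: 0, confirmed: true })
}
```

An update would write the new UKI to the inactive partition, then flip `active` with a fresh `tries_left` and `confirmed: false`; the loader only ever reads and rewrites this small record.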
It looks like this is re-implementing a lot of the features already implemented in systemd-boot. Could you clarify why you decided to not use systemd-boot?
@travier I don't really think it is re-implementing much of systemd-boot actually. We are reusing the UKI stub from systemd-boot, and the reimplemented part is only 514 lines of uefi C code.
However, I understand the question, and I'll try to give our reasoning.
First of all, we already have hardware that we must support that is using android boot. So, no matter what we do on UEFI systems we must already support that kind of layout, and include it in our tooling, testing, documentation, etc. Having a story on UEFI that is very similar in behavior, implementation, ostree integration, testing, etc is very beneficial to the project in general.
Secondly, for our use case (fixed hardware, no console access) the android boot mechanism is actually a great fit. It has an A/B boot system that is very reliable (a non-filesystem, atomic A/B switching mechanism w/ boot counters and automatic rollback) and simple. Whereas systemd-boot has lots of stuff we don't need: complex file formats, filesystem use, two different boot file formats, multiple boot targets, interactive UI, etc.
I understand the need for systemd-boot, and I would never have ukiboot installed on a laptop or generic PC. However, for some embedded piece of hardware, ukiboot just is a better fit.
Thanks for the details!
Status update on https://github.com/bootc-dev/bootc/pull/1314 (with examples in https://github.com/containers/composefs-rs/pull/143):
- Added an option to `bootc install-to-disk` to use the composefs native backend: `bootc install-to-disk --composefs-native ...`
  - slightly tweaked disk setup (using DPS for the root disk UUID)
- bootc imports the container image using the composefs-rs library in a dedicated composefs repo
- bootc creates a new "deployment" for this image and sets up a state directory:
  - using overlayfs for `/etc`
  - using a normal bind mount for `/var`
- bootc will setup the new deployment for the next boot
  - Non-UKI case (called BLS):
    - GRUB: Install the kernel & initrd in `/sysroot/boot` and setup BLS config
  - UKI case:
    - GRUB: Install the UKI in `/sysroot/boot/EFI/Linux`
    - GRUB: Generate a GRUB config snippet to boot the UKI and write it to `/sysroot/boot/grub2/user.cfg`
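As an illustration of the overlayfs-for-`/etc` part, here is a small Rust sketch that derives the three overlay directories from a state directory and builds the overlayfs mount option string. The `lowerdir=`/`upperdir=`/`workdir=` options are standard overlayfs; the `EtcOverlay` type and the `etc/upper` and `etc/work` layout under the state directory are assumptions for illustration, not the layout used in the PR.

```rust
use std::path::{Path, PathBuf};

/// Directories backing an overlayfs mount for /etc (hypothetical type).
pub struct EtcOverlay {
    /// Read-only /etc as shipped in the image.
    pub lower: PathBuf,
    /// Writable layer holding local changes.
    pub upper: PathBuf,
    /// Scratch directory required by overlayfs.
    pub work: PathBuf,
}

impl EtcOverlay {
    /// Assumed layout: the image's /etc is the lower layer, while upper and
    /// work directories live inside the per-deployment state directory.
    pub fn new(image_root: &Path, state_dir: &Path) -> Self {
        EtcOverlay {
            lower: image_root.join("etc"),
            upper: state_dir.join("etc/upper"),
            work: state_dir.join("etc/work"),
        }
    }

    /// Option string to pass as mount data for an `overlay` filesystem.
    pub fn mount_options(&self) -> String {
        format!(
            "lowerdir={},upperdir={},workdir={}",
            self.lower.display(),
            self.upper.display(),
            self.work.display()
        )
    }
}
```

Note that overlayfs requires `upperdir` and `workdir` to be on the same filesystem, which is one reason to keep both under the same state directory.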
Next steps
Suggested order of things to implement/fix to be able to consider this ready to merge in bootc:
- [x] `bootc status` (only current deployments, no staged ones yet)
- [x] add an option for skopeo/podman unshare issue: https://github.com/containers/composefs-rs/pull/117
- [x] Install UKI in the ESP instead of `/sysroot/boot`
- [x] `bootc update`/`bootc switch` (from composefs-native only)
- [x] `bootc rollback` support for UKI & BLS setups
- [ ] `bootc install-to-filesystem`
- [ ] Increase size of the ESP to 1GB for the UKI installation case:
  - https://fedoraproject.org/wiki/Changes/BiggerESP
  - https://bugzilla.redhat.com/show_bug.cgi?id=2208181
  - https://github.com/rhinstaller/anaconda/pull/5081
- [ ] three way merge for `/etc` instead of overlayfs
- [ ] include `systemd-gpt-auto-generator` in the initrd to not have to set the root UUID in the kernel command line
- [ ] cleanup error handling in composefs-rs: https://github.com/containers/composefs-rs/issues/110
Future work
- Support UKI only images (no vmlinuz & initrd.img) and degrade those to a BLS setup
- Support `bootc switch` from ostree storage
bootupd
- Support for systemd-boot in bootupd (issue to be filed)
Notes on composefs-native deployment staging & bootc status:
- In the UKI case, we should install the UKI in `EFI/Linux/foo.efi.staged` (see: https://uapi-group.org/specifications/specs/boot_loader_specification/#type-2-efi-unified-kernel-images) so that it is ignored by systemd-boot until we're ready to make it available for booting. `bootc` can thus look at the composefs-native EROFS images available in the repo to list the "deployments" and look at the UKIs in `EFI/Linux` to figure out their state (ready or staged).
- In the BLS case, we should install the vmlinuz & initrd in `/boot/composefs/<composefs-hash>/...` and use `/boot/loader/entries.staged` to write the new boot entries. Similarly, `bootc` will have to look at the BLS entries to figure out if a deployment is staged or not.
- In both cases, we need a new (or adapted) `finalized-stage` service unit that runs on shutdown and does the rename of the UKI or the entries folder.
- This finalize stage will also do the `/etc` 3-way merge before renaming the UKI/entries folder.
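The `.staged` naming scheme from these notes can be sketched in a few lines of Rust. This is only an illustration: `finalized_path` and `finalize` are hypothetical helper names, and the sketch assumes promotion is a plain rename of a path whose final target does not exist yet.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

const STAGED_SUFFIX: &str = ".staged";

/// Map `foo.efi.staged` to `foo.efi` (or `entries.staged` to `entries`).
/// Returns `None` if the path is not a staged artifact.
pub fn finalized_path(staged: &Path) -> Option<PathBuf> {
    let name = staged.file_name()?.to_str()?;
    let final_name = name.strip_suffix(STAGED_SUFFIX)?;
    if final_name.is_empty() {
        return None;
    }
    Some(staged.with_file_name(final_name))
}

/// Promote a staged UKI or entries directory by renaming it in place.
pub fn finalize(staged: &Path) -> io::Result<PathBuf> {
    let target = finalized_path(staged)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "not a staged path"))?;
    // Assumption: the target does not exist yet. Replacing an existing,
    // non-empty `entries` directory needs more than a plain rename.
    fs::rename(staged, &target)?;
    Ok(target)
}
```

The same `finalized_path` check is enough for `bootc status` to classify what it finds in `EFI/Linux` or `/boot/loader`: anything it maps to `Some(..)` is staged, everything else is ready.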
Commenting here so this is tracked.
Currently with https://github.com/bootc-dev/bootc/pull/1314, I'm always testing with SELinux set to permissive mode, because of the following error.
Starting initrd-switch-root.service - Switch Root...
[ 3.585540] systemd-journald[299]: Received SIGTERM from PID 1 (systemd).
[ 3.683826] audit: type=1404 audit(1753180342.819:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=selinux res=1
[ 3.732071] SELinux: Permission nlmsg in class netlink_route_socket not defined in policy.
[ 3.733155] SELinux: Permission nlmsg in class netlink_tcpdiag_socket not defined in policy.
[ 3.734466] SELinux: Permission nlmsg in class netlink_xfrm_socket not defined in policy.
[ 3.735515] SELinux: Permission nlmsg in class netlink_audit_socket not defined in policy.
[ 3.736582] SELinux: the above unknown classes and permissions will be allowed
[ 3.741403] SELinux: policy capability network_peer_controls=1
[ 3.742329] SELinux: policy capability open_perms=1
[ 3.742992] SELinux: policy capability extended_socket_class=1
[ 3.743730] SELinux: policy capability always_check_network=0
[ 3.744455] SELinux: policy capability cgroup_seclabel=1
[ 3.745125] SELinux: policy capability nnp_nosuid_transition=1
[ 3.745933] SELinux: policy capability genfs_seclabel_symlinks=1
[ 3.746710] SELinux: policy capability ioctl_skip_cloexec=0
[ 3.747430] SELinux: policy capability userspace_initial_context=0
[ 3.748213] SELinux: policy capability netlink_xperm=0
[ 3.802166] audit: type=1403 audit(1753180342.937:3): auid=4294967295 ses=4294967295 lsm=selinux res=1
[ 3.803770] systemd[1]: Successfully loaded SELinux policy in 120.948ms.
[ 3.806197] systemd[1]: Failed to initialize SELinux labeling handle: Permission denied
[ 3.806206] audit: type=1400 audit(1753180342.942:4): avc: denied { read } for pid=1 comm="systemd" name="file_contexts.subs_dist" dev="overlay" ino=15709 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:default_t:s0 tclass=file permissive=0
[!!!!!!] Failed to initialize MAC support.
[ 3.812042] systemd[1]: Freezing execution.
I'm looking at a rebase of the branch, it's a conflict fest as expected. There's two things here:
continue to split out independent PRs
[ ] https://github.com/bootc-dev/bootc/pull/1480
squash commits
I think most of what's there on the branch could be reasonably squashed actually offhand
re: https://github.com/bootc-dev/bootc/pull/1507#pullrequestreview-3117121484
Not sure it makes sense to keep the grub feature, since trying to build with it disabled doesn't work and trying to gate everything behind feature flags (with help from claude) pretty quickly spirals out of control.
- [ ] Do https://github.com/bootc-dev/bootc/pull/1541 only for the systemd-boot case (BLS configs & UKI) as it will not work for GRUB right now
- Add logic to bootc install to select bootloader to install (will be passed to bootupd install)
- Add bootloader type to origin file to let bootc know what bootloader config to write and where
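One possible shape for recording the bootloader type, as a rough Rust sketch. Everything specific here is an assumption: the `bootloader` key name, the `grub`/`systemd-boot` values and the simple `key = value` parsing are made up for illustration and are not the format bootc uses for its origin file.

```rust
use std::fmt;
use std::str::FromStr;

/// Bootloaders discussed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bootloader {
    Grub,
    SystemdBoot,
}

impl fmt::Display for Bootloader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Hypothetical serialized values.
        f.write_str(match self {
            Bootloader::Grub => "grub",
            Bootloader::SystemdBoot => "systemd-boot",
        })
    }
}

impl FromStr for Bootloader {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "grub" => Ok(Bootloader::Grub),
            "systemd-boot" => Ok(Bootloader::SystemdBoot),
            other => Err(format!("unknown bootloader: {other}")),
        }
    }
}

/// Find a `bootloader = <value>` line in keyfile-style origin text.
/// The key name is an assumption; section headers are ignored.
pub fn bootloader_from_origin(origin: &str) -> Option<Bootloader> {
    origin.lines().find_map(|line| {
        let (key, value) = line.split_once('=')?;
        if key.trim() == "bootloader" {
            value.trim().parse().ok()
        } else {
            None
        }
    })
}
```

With something like this, the code that writes boot configuration can `match` on the recorded `Bootloader` instead of probing the system on every update.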
Currently planning to merge the branch during community meeting tomorrow! Immense amount to do of course as followups.
Also came up in the meeting: I'd like to take this direction
diff --git i/crates/lib/src/cli.rs w/crates/lib/src/cli.rs
index c955be7c..d1b50c53 100644
--- i/crates/lib/src/cli.rs
+++ w/crates/lib/src/cli.rs
@@ -39,6 +39,7 @@ use crate::lints;
use crate::progress_jsonl::{ProgressWriter, RawProgressFd};
use crate::spec::Host;
use crate::spec::ImageReference;
+use crate::store::Storage;
use crate::utils::sigpolicy_from_opt;
/// Shared progress options
@@ -923,8 +924,7 @@ fn prepare_for_write() -> Result<()> {
/// Implementation of the `bootc upgrade` CLI command.
#[context("Upgrading")]
-async fn upgrade(opts: UpgradeOpts) -> Result<()> {
- let sysroot = &get_storage().await?;
+async fn upgrade(opts: UpgradeOpts, storage: &Storage) -> Result<()> {
let ostree = sysroot.get_ostree()?;
let repo = &ostree.repo();
let (booted_deployment, _deployments, host) = crate::status::get_status_require_booted(ostree)?;
@@ -1303,15 +1303,8 @@ async fn run_from_opt(opt: Opt) -> Result<()> {
let root = &Dir::open_ambient_dir("/", cap_std::ambient_authority())?;
match opt {
Opt::Upgrade(opts) => {
- #[cfg(feature = "composefs-backend")]
- if composefs_booted()?.is_some() {
- upgrade_composefs(opts).await
- } else {
- upgrade(opts).await
- }
-
- #[cfg(not(feature = "composefs-backend"))]
- upgrade(opts).await
+ let sysroot = &get_storage().await?;
+ upgrade(opts, sysroot).await
}
Opt::Switch(opts) => {
#[cfg(feature = "composefs-backend")]
diff --git i/crates/lib/src/store/mod.rs w/crates/lib/src/store/mod.rs
index cab4167e..736aee9a 100644
--- i/crates/lib/src/store/mod.rs
+++ w/crates/lib/src/store/mod.rs
@@ -49,6 +49,11 @@ pub const COMPOSEFS_MODE: Mode = Mode::from_raw_mode(0o700);
/// system root
pub(crate) const BOOTC_ROOT: &str = "ostree/bootc";
+pub enum Primary {
+ Ostree(SysrootLock),
+ Composefs(Arc<ComposefsRepository>)
+}
+
/// A reference to a physical filesystem root, plus
/// accessors for the different types of container storage.
pub(crate) struct Storage {