install: Run outside of a container
We're working on having Anaconda indirectly run bootc install to-filesystem when ostreecontainer is in use. The current plan is to require bootc be present on the install ISO, which will happen automatically as part of https://src.fedoraproject.org/rpms/rpm-ostree/c/e31f656abc0d451f0ffdd2a3afd60944796c2246?branch=rawhide propagating.
Now...here's the thing, there's a grab bag of stuff we've accumulated in bootc install that assumes its source root == target root. For example:
- Detecting whether SELinux is enabled
- Kernel arguments
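To make the first bullet concrete, here's a minimal sketch of what "answering for the target root, not the host" looks like. The helper name and the choice to read /etc/selinux/config are ours for illustration, not bootc's actual logic; a fake root is built so the check is runnable.

```shell
#!/bin/bash
# Hypothetical sketch: check SELinux state against a given root rather than
# the booted host. Reading the host's /sys/fs/selinux would answer the
# question for the wrong root when source root != target root.
selinux_enabled_in_root() {
    local root="$1"
    # A root that ships an SELinux policy has /etc/selinux/config.
    if [ -f "${root}/etc/selinux/config" ]; then
        grep -q '^SELINUX=enforcing\|^SELINUX=permissive' "${root}/etc/selinux/config"
    else
        return 1
    fi
}

# Build a fake target root so this runs anywhere.
mkdir -p /tmp/fakeroot/etc/selinux
echo 'SELINUX=enforcing' > /tmp/fakeroot/etc/selinux/config
selinux_enabled_in_root /tmp/fakeroot && echo "selinux: enabled in target"
```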
The core issue today is that there's some circular dependencies forced by ostree because ostree container image deploy is doing two things:
- Fetching the target container image
- Setting up a bootloader entry for it (including kernel arguments that are input to that)
We will need to more carefully split up "fetch" from "deploy". Also, today ostree does not directly offer an opinionated way to temporarily "mount" the merged root; we don't make a composefs, for example, until the deploy phase. But we can bypass that by invoking e.g. ostree checkout --composefs and mounting that, say.
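The checkout-then-mount idea could look something like the following. Command flags, mount options, the ref name, and paths are all illustrative and version-dependent (composefs support in ostree is still evolving), so the sketch prints the commands as a dry run rather than executing them.

```shell
#!/bin/bash
# Dry-run sketch of getting a temporary merged root before the deploy phase.
# run() echoes instead of executing, so this is readable documentation, not
# a tested invocation of ostree/composefs.
run() { echo "+ $*"; }

REPO=/ostree/repo
COMMIT=exampleos/x86_64/stable   # hypothetical ref
CFS=/var/tmp/target-root.cfs
MNT=/var/tmp/target-root

# 1. Write a composefs image for the commit (per the idea in the text above).
run ostree --repo="$REPO" checkout --composefs "$COMMIT" "$CFS"
# 2. Mount it, backed by the repo's object store.
run mount -t composefs -o basedir="$REPO/objects" "$CFS" "$MNT"
```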
In general this is going to take some careful auditing of the code, but the good part is that if we do this right, we can just treat "invoked from source container" as a special case of "fetch source rootfs".
Background:
anaconda and bootc
REF docs https://docs.fedoraproject.org/en-US/bootc/bare-metal/
Original suggestion:
- Build a base image with LBIs for testing use (could be anything)
- Hack on a patch to https://github.com/ostreedev/ostree-rs-ext/ that checks if the input container image has the containers.bootc label (see also https://github.com/ostreedev/ostree-rs-ext/pull/673 which extended our usage of that), and if present and /usr/bin/bootc is present, then we switch to doing bootc install to-filesystem instead.
It's actually quite possible, though, that we need to make bootc install to-filesystem not care that it's not being run in a container image, and instead just pick up ambient host tooling.
Restate problem set A: PXE boot scenario
Stock ISO does not have customer container content; today the ostreecontainer verb is pointed at the generic filesystem (and block storage) that Anaconda created
Prerequisite to distinguish
Detect containers.bootc label in target image OR do this unconditionally and backtrack if we actually break something.
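The label detection above could be a simple check over the image metadata. In practice the JSON would come from something like `skopeo inspect docker://<image>`; here a canned sample stands in for it so the branching logic is runnable without network access, and the grep is deliberately crude.

```shell
#!/bin/bash
# Sketch: pick the install path based on the containers.bootc label.
# inspect_json is a canned stand-in for real `skopeo inspect` output.
inspect_json='{"Labels":{"containers.bootc":"1"}}'

if echo "$inspect_json" | grep -q '"containers.bootc"'; then
    echo "bootc-labeled image: use bootc install to-filesystem"
else
    echo "plain ostree image: use ostree container image deploy"
fi
```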
Prerequisite: https://github.com/containers/bootc/pull/860
Need to teach bootc to fetch LBIs.
Path A:
(Instruct customer to...)?
%pre
podman pull quay.io/example/someos:latest
%end
Then ostree container image deploy could redirect to podman run <image> bootc install to-filesystem.
Path B:
Require bootc be part of the ISO (through rpm-ostree dependency) (or worst case, curl/rpm -Uvh in %pre).
In this flow, we need bootc install to-filesystem to have something like --pull, which it doesn't do by default now.
Problem: bootc install to-filesystem expects to be run from container
Unlike ostree container image deploy. Now we do have --source-imgref, which I think turns off some of the container checks.
Solution: Change bootc to detect when it's not in a container and unconditionally pull the target image. We will need to document how authfiles work in this path etc.
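On the authfile point: the containers/image stack that skopeo and podman share honors the containers-auth.json format and the REGISTRY_AUTH_FILE environment variable, so one documented path could look like the sketch below. The file path and credentials are made up; the final skopeo command is shown only as a comment.

```shell
#!/bin/bash
# Sketch: supplying registry auth when pulling outside a container.
# containers-auth.json(5) format; credentials here are fake.
AUTHFILE=/tmp/demo-auth.json
cat > "$AUTHFILE" <<'EOF'
{
  "auths": {
    "quay.io": { "auth": "ZXhhbXBsZTpzZWNyZXQ=" }
  }
}
EOF
export REGISTRY_AUTH_FILE="$AUTHFILE"
# A real pull would then be e.g.:
#   skopeo inspect docker://quay.io/example/someos:latest
grep -q '"auths"' "$AUTHFILE" && echo "authfile ready at $AUTHFILE"
```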
I was thinking about this and looking at the code a little bit more. One issue we have is the install config; we need to change things so that when we're run outside of a container we don't look in the host root (since that generally doesn't make sense).
If we go down this path, one thing I'd like to investigate is ensuring that we always execute the bootc binary from the container image for at least the "second half" of the install - for example, it should be the skopeo/podman binary in the target container, not the one that happens to be on the host, that pulls LBIs.
And in general it will help us retain "agility" if we try hard to minimize what happens in "host bootc". To give just one example I was also looking at fsverity stuff; and we couldn't rely on that being in the "host bootc". But if we exec the target bootc as a container after download, then we can where necessary do fixups on any initial state.
@travier and I were talking about a related topic last week (although in a slightly different context). We might imagine that, in kind of a "install new image" scenario, happening during install, or on a running system, or a filesystem image build on some build server somewhere, or many other possible situations, the "host" system could look very different to the system inside of the container.
I personally feel like it should not be required to have a container runtime in order to install an update to a running bootc system. That means that host bootc and container bootc need to be equivalent on all points that are relevant to installing the image.
One solution to that problem could be to try to be strict about what is tolerated as a difference between a particular image and an image that's installed as an "update" to that image. I feel like this is a losing path to go down, though, since there's going to be long-lived deployment scenarios where transitions need to eventually occur at some point, and getting into this situation of "you need to update your system in order to be able to install this next update" kinda sucks.
So that sort of drives us into this situation where we either need to:
- run the container when we install it; or
- don't.
It might make sense to come up with different answers here for bootc vs normal container installs. Our idea about the security model for composefs where we allow untrusted users to request them (ie: don't trust an erofs image we download from the wire) sort of implies that we don't want to run the container image as part of that process. Performing an operation that adds a kernel image to /boot and modifies the bootloader config is sort of another world, by comparison. An entire OS install is even further to that extreme end.
All the same, I feel like we might like to draw a line in the sand and say that we want to support embedded systems with bootc where we do not require a container runtime in order to install updates.
I personally feel like it should not be required to have a container runtime in order to install an update to a running bootc system.
Yes...that's already the case in theory since the start in the way we use skopeo + ostree, neither of which are "container runtimes". Although the reality is pretty much everyone shipping a bootc system is going to be including a container runtime I think...it's kind of hard not to. Also, this relates to https://github.com/containers/bootc/issues/640 that I am pretty sure we'll want to do at some point.
But as of recently we started depending on podman for LBIs. Maybe for consistency it'd make sense to try switching that to skopeo too, and it would help make things more pluggable in theory.
During the call today we discussed the idea of something like "container lite" for running the code that needs to run on updates. Colin doesn't like chroot (because of the need to manually handle API-like filesystems like /dev, /proc, /sys, etc.). bwrap was mentioned as an option...
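For reference, the bwrap option might look roughly like this: bind the target root as / and let bwrap synthesize the API filesystems that a bare chroot would miss. The target path and the bootc invocation are illustrative, and the sketch only prints the assembled command line (a real run assumes bwrap and a prepared target root).

```shell
#!/bin/bash
# Sketch of "container lite" via bwrap. We build the argument list and echo
# it instead of executing, since the target root here is hypothetical.
TARGET=/var/mnt

args=(
  --bind "$TARGET" /       # target rootfs becomes /
  --dev /dev               # fresh minimal /dev, no manual bind needed
  --proc /proc             # fresh procfs
  --ro-bind /sys /sys      # host sysfs, read-only
  --unshare-pid
  /usr/bin/bootc --help    # stand-in for the real update/install step
)
echo "bwrap ${args[*]}"
```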
a little update on my progress from today. I created this script to speed up testing (run as root):
#!/bin/bash
#deps
#dnf install -y bootupd
mkdir -p /var/mnt
MOUNT=/var/mnt
set -x
umount "$MOUNT"
set -e
rm -rf disk.img
truncate -s 10G disk.img
DEVICE=$(losetup -f --show disk.img)
parted "$DEVICE" --script mklabel gpt
parted "$DEVICE" --script mkpart primary ext4 0% 100%
partprobe "${DEVICE}"
mkfs.ext4 "${DEVICE}p1"
mount "${DEVICE}p1" "$MOUNT"
mkdir "${MOUNT}/boot"
./target/release/bootc install to-filesystem --source-imgref docker://quay.io/centos-bootc/centos-bootc:stream9 "$MOUNT"
# podman run --pid=host --network=host --privileged --security-opt label=type:unconfined_t -v /dev:/dev -v /var/mnt:/foo localhost/bootc bootc install to-filesystem /foo
this fails with:
Running bootupctl to install bootloader
> bootupctl -vvvvv backend install --write-uuid --update-firmware --auto --device /dev/loop35 /var/mnt
[TRACE bootupd] executing cli
[INFO bootupd::bootupd] System boot method: EFI
[DEBUG bootupd::efi] Unmounting
error: boot data installation failed: installing component EFI: No update metadata for component EFI found
ERROR Installing to filesystem: Installing bootloader: Task Running bootupctl to install bootloader failed: ExitStatus(unix_wait_status(256))
I'm guessing bootupctl is looking at the host to determine how to install the bootloader. Haven't dug too far into bootupd yet to figure out how to fix this. Or, maybe bootupctl just needs to be run from within the target image, since at this point we've already pulled the image and set up the ostree root?
I'm guessing bootupctl is looking at the host
From what I've seen it doesn't seem to be the case, bootupctl knows how to look in a system other than the host (bootc points bootupctl at /dev/loop5 and /var/mnt: "bootupctl" "backend" "install" "--write-uuid" "--update-firmware" "--auto" "--device" "/dev/loop5" "/var/mnt").
The issue seems to be that it's looking in /var/mnt/usr/lib/bootupd/updates/EFI.json, but the file can only be found in /var/mnt/ostree/deploy/default/deploy/8df160d8fb0a54d3968d36822782c9a2b249339e3262810269b66eca46998d1d.0/usr/lib/bootupd/updates/EFI.json.
It's as if it's expecting the ostree deployment to actually be deployed / mounted, but it's not for some reason.
ah thanks. I think the "/dev/loop5" and "/var/mnt" options are where the bootloader is installed. From bootupctl backend install --help:
--auto
Automatically choose components based on booted host state.
For example on x86_64, if the host system is booted via EFI, then only enable
installation to the ESP.
so, when running via podman the host is the bootc container and bootupctl can load the correct components. My machine doesn't have any bootupd config so it fails. I'll try running from a chroot in the ostree dir.
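The chroot-into-the-deployment idea sketched here hinges on resolving the deployment root first, since (per the paths above) bootupd's metadata lives under ostree/deploy/<stateroot>/deploy/<checksum>.0, not at the top of the target filesystem. The stateroot name and checksum below are fabricated, and a fake tree is built so the glob itself is runnable.

```shell
#!/bin/bash
# Sketch: locate the deployment root under the target mount, where
# usr/lib/bootupd/updates/EFI.json actually lives. Fake tree for demo.
MOUNT=/tmp/demo-target
mkdir -p "$MOUNT/ostree/deploy/default/deploy/abc123.0/usr/lib/bootupd/updates"
touch "$MOUNT/ostree/deploy/default/deploy/abc123.0/usr/lib/bootupd/updates/EFI.json"

# Resolve the (single) deployment root; bootupctl could then be run
# chrooted into it, e.g.: chroot "$deployroot" bootupctl backend install ...
deployroot=$(echo "$MOUNT"/ostree/deploy/*/deploy/*.0)
ls "$deployroot/usr/lib/bootupd/updates"
```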
a little update on my progress from today. I created this script to speed up testing (run as root):
Take a look at systemd-repart. Here's an example:
- https://github.com/containers/composefs-rs/blob/main/examples/uki/make-image
- https://github.com/containers/composefs-rs/tree/main/examples/uki/repart.d
Random thought, and I know this is basically the status quo today, but since we're talking about this more explicitly now:
Having the container image involved in installing itself effectively gives a de-facto "install hook" mechanism to any image that wants one. People are going to start using this for things we don't agree with. We need to decide if this is something we really want to do.
People are going to start using this for things we don't agree with
Can you elaborate on what you're thinking of as an example there? Ultimately we're giving people the ability to execute arbitrary code as their OS in arbitrary ways at a pretty low level, but really that's been true forever before containers even, install Debian/Fedora and you can run arbitrary code as root...
The fix (and challenge) I think is to drive people towards well-tested patterns and have good documentation etc.
I think a lot more about the converse problem, there are things people want to do that we don't support well yet but should (most strongly related to the install path is probably something like https://github.com/containers/bootc/pull/267 or better https://github.com/containers/bootc/pull/100 for unified day 2).
I am sorry for going back and forth on this. I think this one overall may still make sense, but https://github.com/containers/bootc/pull/915 will mostly solve the Anaconda case without deeper changes.
I am going to close this one for now. We already have so many ways to instantiate systems, and this would add a whole new one. Now that #915 landed, the immediate pressure is off.
Sorry again for all the back and forth on this. The direction to take I think for Anaconda again (and related installers) is tracked in https://github.com/rhinstaller/anaconda/discussions/5197 and I think that's what we should still pursue.
We'll need to pick this one back up as Anaconda is going to continue to use the bootc binary from the install environment.
We have a lot of stuff like https://github.com/bootc-dev/bootc/blob/df2da1adaf17ad0ec5309ca5dc22118b414676e8/crates/lib/src/install.rs#L1222 that we should NOT be doing if we detect we're not in a container already.