WIP: Add support for `--replace-mode=alongside` for ostree target
Ironically, our support for `--replace-mode=alongside` breaks when we're targeting an already extant ostree host: when we blow away the /boot directory, the ostree stack loses its knowledge that we're in a booted deployment and will attempt to GC it...
https://github.com/ostreedev/ostree-rs-ext/pull/550/commits/8fa019bfa821303cfb7a7f069ae2320f4c3800fa is a key part of the fix for that.
However, a notable improvement we could make here is to grow this whole thing into a real "factory reset" mode, which would be a compelling answer to https://github.com/coreos/fedora-coreos-tracker/issues/399
To implement that, though, we need to support configuring the stateroot rather than hardcoding `default`.
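As a sketch, that could be as small as a new option on the install options struct (assuming the clap-based CLI; the exact name and placement here are illustrative, not the final interface):

```rust
use clap::Parser;

/// Illustrative subset of the install options; only `stateroot` is new,
/// everything else is elided.
#[derive(Debug, Parser)]
pub struct InstallOpts {
    /// The ostree stateroot (osname) to install into, instead of
    /// always using "default".
    #[clap(long, default_value = "default")]
    pub stateroot: String,
}
```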
Sorry @cgwalters, I accidentally pushed the rebase to your fork instead of mine
~EDIT: undid it, continuing my rebase efforts on https://github.com/omertuc/bootc/tree/137clone~
EDIT2: Continuing here
I think you can just take over this PR too if you want, or open a new PR from your fork - either way.
Rebased. Without any changes, I'm hitting an issue: on an ostree system, the mounted / on the host is an overlay (-v mounted into /:/target), so the findmnt source for it is overlay rather than a /dev/... device, which trips up lsblk later on.
I'll see how I can tweak it so that it finds the right device.
With the additional mounts -v /sysroot:/target -v /sysroot:/target/sysroot instead of -v /:/target, plus --stateroot foo, this seems to work.
@cgwalters thoughts on the above mounts? Do we want to require them for install on ostree targets, or should I figure out a way to make this work without them, using just the already-documented install mounts (i.e. /:/target)?
and so the findmnt source for it is overlay rather than a /dev/... and so it trips up lsblk later on
We should learn how to peel that. This is really the same thing as https://bugzilla.redhat.com/show_bug.cgi?id=2308594 and https://github.com/ostreedev/ostree/issues/3198 and https://github.com/containers/composefs/issues/280
Short term, the simplest approach is the same logic as the grub patch: detect overlayfs for /, check whether /sysroot exists and is mounted, and if so use that.
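Roughly like this, as a sketch (the function is hypothetical, and a real version should verify /sysroot is actually a mountpoint rather than just a directory):

```rust
use std::path::{Path, PathBuf};

use anyhow::{bail, Result};

/// If the target root is an overlay (as on a booted ostree host),
/// peel it by falling back to its /sysroot, which should be the real
/// block-device-backed mount.
fn peel_overlay(target: &Path, fstype: &str) -> Result<PathBuf> {
    if fstype != "overlay" {
        return Ok(target.to_owned());
    }
    let sysroot = target.join("sysroot");
    if sysroot.is_dir() {
        Ok(sysroot)
    } else {
        bail!("target is an overlay but {} does not exist", sysroot.display())
    }
}
```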
OK. Changed it so that when the target rootfs is an overlay, we'll implicitly try targeting <original_target>/sysroot instead.
It wasn't working at first, and it was a bit of a headache to debug: apparently if you mount /:/target, then inside the container /target/sysroot is read-only by default, so ensure_dir_labeled was failing; whereas when you mount /sysroot:/target directly, it's not read-only. It took me a while to track that down while chasing red herrings, and I'm still not sure who's responsible for this behavior (kernel? podman?), but once I realized it I simply moved ensure_dir_labeled to run only after your added let _ = crate::utils::open_dir_remount_rw..., and then the rest just worked.
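In other words, the ordering is now roughly the following (the helpers here are stand-ins so the sketch is self-contained; only the order is the point):

```rust
use std::path::Path;

use anyhow::{Context, Result};

fn prepare_target(root: &Path) -> Result<()> {
    // 1. Remount the target read-write first (stand-in for the
    //    crate::utils::open_dir_remount_rw call added in this PR);
    //    with `-v /:/target`, /target/sysroot is otherwise read-only.
    remount_rw(root).context("remounting target rw")?;
    // 2. Only then do the SELinux labeling (stand-in for
    //    ensure_dir_labeled), which performs the first writes.
    label_dir(root).context("labeling target")?;
    Ok(())
}

fn remount_rw(_root: &Path) -> Result<()> {
    Ok(()) // stand-in
}

fn label_dir(_root: &Path) -> Result<()> {
    Ok(()) // stand-in
}
```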
The current code might need a bit of touch-up, but do you think the direction of the code in its current state is good? Should I clean it up and undraft?
It wasn't working at first, and it was a bit of a headache to debug: apparently if you mount /:/target, then inside the container /target/sysroot is read-only by default, so ensure_dir_labeled was failing; whereas when you mount /sysroot:/target directly, it's not read-only.
I think that's possibly because it's bootc that's special-casing mounting /sysroot read-write - that's how we do it outside of a container, at least.
The current experience is:
echo "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOTVytyhnSfX20smAsNKYG5Zpz6vSzDZu22S8PCDJ2Iw omer" > authkeys
sudo podman run --rm --privileged -v $PWD/authkeys:/authkeys -v /dev:/dev -v /var/lib/containers:/var/lib/containers -v /:/target --pid=host --security-opt label=type:unconfined_t -e RUST_LOG=trace quay.io/otuchfel/bootc:latest bootc install to-existing-root --replace alongside --acknowledge-destructive --stateroot foobar --root-ssh-authorized-keys /authkeys
Following that, when I reset, it immediately boots into the new foobar stateroot; there are no grub boot entries for the original one.
I assume this is not our desired experience, and I should look into having the original boot entry preserved?
It's a great question. I think for install --replace=alongside, that is indeed the expectation by default.
Let's then target merging this as is?
I think then what we should look at is weaving this functionality into https://github.com/containers/bootc/issues/404, right?
That said, we may end up wanting to expose this as something like bootc install --retain? At least in the ostree case it's trivial to do; we just don't blow away /boot :smile:
The super messy thing is interoperating with non-ostree-ready bootloader setups so --retain would probably have to just error out in that case for now.
IOW:
- Let's merge as is
- Look at --retain as a followup?
sgtm
OK, so we should have CI coverage here... I think it's actually as easy as doing another install after an initial one, without the manual "wipe ostree" stuff we have in one of the install tests, right?
Not sure I agree; ideally we should actually boot into the installed ostree system first? But getting reboots to work with the current tests-integration/src/install.rs harness will be tricky.
But anyway, it can't hurt to have a test like you suggested, even without a reboot.
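Shape-wise I'm imagining something like this (sketch only; the function is made up and the real test would go through the existing tests-integration harness rather than raw Command):

```rust
use std::process::Command;

use anyhow::{ensure, Result};

/// Run `install to-existing-root` twice, the second time on top of the
/// ostree layout left behind by the first, with no manual "wipe ostree"
/// step in between.
fn reinstall_over_ostree() -> Result<()> {
    for stateroot in ["default", "second"] {
        let status = Command::new("bootc")
            .args(["install", "to-existing-root", "--acknowledge-destructive"])
            .args(["--stateroot", stateroot])
            .status()?;
        ensure!(status.success(), "install into stateroot {stateroot} failed");
    }
    Ok(())
}
```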
Yes, testing with reboots is going to require some more infrastructure here.
~I've added a test but as suspected, the test passes even on the main branch, so it's not very helpful for verifying this PR without a proper reboot. Should we keep it anyway?~
~scratch that I'm silly and forgot to remove the call to reset_root~
Unscratch that: even after removing the call to reset_root, the test still passes on the main branch :face_with_diagonal_mouth:
We talked about this and realized that while the new test passes, it would already pass today, because it's actually `install --stateroot` that makes it work.
The main fix we need here is preserving existing deployments when we detect we're booted via ostree.
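Conceptually that guard is tiny - a sketch, assuming we detect this via the extant ostree repo on the target (the function name is illustrative):

```rust
use std::path::Path;

/// If the target already carries an ostree repo, we're (re)installing
/// over an ostree host and should preserve the existing deployments
/// and boot entries instead of wiping /boot.
fn target_has_ostree(target_root: &Path) -> bool {
    target_root.join("ostree/repo").is_dir()
}
```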
So actually a way we could test this is via our tmt tests instead.
But in the end, again, I'm good to merge as is. Since I wrote this PR I can't approve it; you (or someone else) needs to do so.
Branching this to a separate issue #847
Up until now I've tested the PR on a random RHEL AI (bootc) system I had lying around, and it worked, but now I'm trying it on Silverblue and it fails. This seems to be because / (/target inside the bootc container) on Silverblue, despite being a btrfs rw mount, cannot be written to, so our rw remount is futile.
Specifically, the call that fails is fchmod, but only because it's the first write operation we perform on the target.
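For reference, the failure boils down to roughly this (a repro sketch using the rustix crate; the path and mode are illustrative):

```rust
use std::fs::File;

use anyhow::{Context, Result};
use rustix::fs::{fchmod, Mode};

/// Open the target directory and fchmod it - the first write-ish
/// syscall we issue. On the Silverblue target this fails with EPERM
/// even though findmnt reports the mount as rw.
fn repro(path: &str) -> Result<()> {
    let dir = File::open(path).with_context(|| format!("opening {path}"))?;
    fchmod(&dir, Mode::from_raw_mode(0o755)).context("fchmod")?;
    Ok(())
}
```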
silverblue, despite being a btrfs rw mount, cannot be written to
What error do we get? -EROFS? What does findmnt look like when our container is executed? I wonder if we're messing up a remount?
```
Reusing extant ostree layout
DEBUG Loaded SELinux policy: b06a119416deb696a80dbb67248b66d3829780c9e72d376a5089097ace6de5f8
DEBUG Target . is a mountpoint, remounting rw
DEBUG Labeling .
TRACE Label for . is Some("system_u:object_r:root_t:s0")
ERROR Installing to filesystem: Creating ostree deployment: fchmod: Operation not permitted (os error 1)
```
findmnt inside the container:
```
bash-5.1# findmnt -J -v --output=SOURCE,TARGET,MAJ:MIN,FSTYPE,OPTIONS,UUID --mountpoint /target
{
   "filesystems": [
      {
         "source": "/dev/vda3",
         "target": "/target",
         "maj:min": "0:32",
         "fstype": "btrfs",
         "options": "rw,relatime,seclabel,compress=zstd:1,discard=async,space_cache=v2,subvolid=258,subvol=/root",
         "uuid": "41f3f4ae-1242-4ef1-bcd8-fd8706f51738"
      }
   ]
}
```
ERROR Installing to filesystem: Creating ostree deployment: fchmod: Operation not permitted (os error 1)
Is there maybe an SELinux denial here? Failing on fchmod is really odd.
Also, hmm, we must be missing some .context() here... are we failing in ensure_dir_labeled()? I bet that's it... but it'd be good to know for sure.
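Something along these lines at the call site (hypothetical, just to show the shape):

```rust
use std::path::Path;

use anyhow::{Context, Result};

fn label_with_context(root: &Path) -> Result<()> {
    // Attach the path so a failure reports *what* we were labeling
    // instead of a bare "fchmod: Operation not permitted".
    ensure_dir_labeled(root).with_context(|| format!("labeling {}", root.display()))
}

// Stand-in for the real helper, so the sketch is self-contained.
fn ensure_dir_labeled(_root: &Path) -> Result<()> {
    Ok(())
}
```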
@omertuc I think we need to get more of your PRs merged, you're doing good work! This one specifically I guess I'm uncertain if we should block on the silverblue-reinstall case; I suspect it's btrfs specific.
But I may look at it today...
Yeah I was downloading Fedora CoreOS (as opposed to Silverblue) today to see if we hit similar issues, I'll report with what I find
OK, yeah, there have been a ton of changes in the main branch since this PR was last rebased; it's a conflict-fest. I took a stab at it and pushed my update to https://github.com/cgwalters/bootc/tree/install-existing-ostree2
One big conflict was around how the install phase got split up; threading has_ostree through between those parts would have been annoying.
I did https://github.com/containers/bootc/pull/872 instead which is just dead code to start, but moves the "we detected an extant repo" into the "big bag of state" we already had in RootSetup.
stop stop stop :laughing: you're working on your old stale branch; I've already solved all these conflicts a while back
Force-pushed now to resolve a small new conflict on import lines; other than that, there are no conflicts vs main.
Force-pushed again because of a duplicate import.
I suspect it's btrfs specific.
I just tried Fedora CoreOS and I can confirm the same issue happens even with xfs, so it's not a btrfs issue