bootc icon indicating copy to clipboard operation
bootc copied to clipboard

uid/gid drift possible when switching between different host builds

Open ggiguash opened this issue 1 year ago • 9 comments
trafficstars

When upgrading from an ostree to a bootc image, the SSH key file permissions may get invalidated in the /etc/ssh directory. This is causing SSH service to fail its startup, effectively preventing the user from remotely accessing the host following the upgrade.

image

(edit by @cgwalters) Originally: https://issues.redhat.com/browse/USHIFT-3749

ggiguash avatar Jul 11 '24 07:07 ggiguash

The following work around can be implemented in a bootc container file to address this issue.

# Implement workarounds necessary for the successful upgrade from an ostree to a bootc image.
# - The permissions of SSH key files at /etc/ssh directory need to be fixed
#   so that they are owned by the 'ssh_keys' group.
RUN mkdir -p /etc/systemd/system/sshd.service.d && \
    printf "\
[Service]\n\
ExecStartPre=/bin/sh -c 'chgrp ssh_keys /etc/ssh/ssh*key*'\n" > "/etc/systemd/system/sshd.service.d/ssh-key-permissions.conf"

ggiguash avatar Jul 11 '24 10:07 ggiguash

You will likely hit the following once you rebase to a CentOS Stream/RHEL version with this openssh change:

  • https://fedoraproject.org/wiki/Changes/SSHKeySignSuidBit
  • https://github.com/coreos/fedora-coreos-tracker/issues/1394
  • https://bugzilla.redhat.com/show_bug.cgi?id=2172956

travier avatar Jul 11 '24 13:07 travier

( I've added a link to https://issues.redhat.com/browse/USHIFT-3749 where this was discussed originally to avoid duplicate discussion where relevant)

But again, this issue is not about "ostree vs bootc", it's specially about "ostree commits as built by RHEL for Edge" vs "fedora/c9s/rhel derived bootc" which happen to allocate different uids/gids due to buildsystem drift.

It could also occur when switching between bootc images that have different defined ssh_key gids, and it could also occur when switching between ostree images (again if the buildsystem generating those hasn't ensured they're the same).

Because of this, I don't think this is really a bootc issue at a technical level. It's more a higher level buildsystem issue.

I would accept though that in practice, it may make sense for us to at least track the overall issue here, but...if we were to do something like that ssh key chown hack, I would probably say the code would logically be owned by e.g. https://gitlab.com/fedora/bootc/base-images

Except for the fact that the world does not need more load bearing bash scripts run as unrestricted root, and maybe we actually implement it in Rust here as an opt-in...similarly to https://github.com/coreos/rpm-ostree/pull/4913 in many ways.

cgwalters avatar Jul 11 '24 15:07 cgwalters

It could also occur when switching between bootc images that have different defined ssh_key gids, and it could also occur when switching between ostree images (again if the buildsystem generating those hasn't ensured they're the same).

The problem can certainly occur when switching between images of the same type, but only in the case when different image parents are used for the base image vs update image. To mitigate this issue, the way we usually build our image hierarchies is to have a base (parent) image and then derive all the updates from it.

The ostree-to-bootc upgrade is a special case because we're guaranteed to have different parents, so it's a 100% failure. Thus, I thought to emphasize the need to provide a solution for it.

While I prefer this to be handled by the base images we produce, the fallback would be to document all the known pitfalls, which goes back to the https://github.com/containers/bootc/issues/671 issue.

P.S. The title of this issue was changed to be more generic and we may want to go back to the specific problem of ostree-to-bootc upgrade because the generic one is a "known" issue.

ggiguash avatar Jul 11 '24 15:07 ggiguash

The ostree-to-bootc upgrade is a special case because we're guaranteed to have different parents

No - it's equally possible to build a custom base image using the same buildsystem that produces ostree commits today. In fact, there's significant tooling to streamline exactly that process, such as e.g. rpm-ostree compose container-encapsulate. It's exactly how Fedora CoreOS works today...an ostree commit is generated, and then turned into a container image.

cgwalters avatar Jul 11 '24 16:07 cgwalters

it's equally possible to build a custom base image using the same buildsystem that produces ostree commits today.

I did not know this tooling was in place. I will investigate this direction asap. Based on this discussion, I agree that we are facing a generic uid/gid mismatch issue, thank you for clarifying it.

ggiguash avatar Jul 11 '24 19:07 ggiguash

Further investigation on installing bootc images on top of an unrelated ostree commit led me to a dead end. The problem with the uid/gid is serious. In my case, I got openvswitch and hugetlbfs users/groups overwritten after deploying a bootc image. This makes quite a few services fail on boot and it practically renders the system unusable with no easy way of remediation.

Next, I followed up on using the rpm-ostree compose container-encapsulate command to create a base container image that can be used for bootc container builds and deployment. I completed a full cycle of creating ostree-base-container, derived bootc-container and installing the latter on top of the ostree system. This worked very well - no surprises.

I did encounter a few small issues that might be worth mentioning in the upgrade instructions:

  • The base ostree image should include bootc, dnf and subscription-manager RPM packages
    • bootc is required for switching the image on an ostree system using bootc toolchain
    • dnf is required because many container build procedures use it and it does not come packaged by default with ostree image
    • subscription-manager is required to query/configure RHSM during the container builds or in runtime
  • The following workaround was required in the derived bootc container image build to enable host subscriptions and create missing /opt directory
# Work around the missing ostree-container functionality:
# - Enable librhsm which enables host subscriptions to work in containers
# - Image has /opt symlink pointing to a non-existent /var/opt, so create the latter
RUN ln -sr /run/secrets/etc-pki-entitlement /etc/pki/entitlement-host && \
    ln -sr /run/secrets/rhsm /etc/rhsm-host && \
    mkdir -p /var/opt

ggiguash avatar Jul 15 '24 16:07 ggiguash

@cgwalters , based on my recent experience described in the previous comment, I wonder if we should be stricter in our instructions for ostree-to-bootc upgrade? I mean, should we state that the upgrade is likely to fail unless the bootc container image is derived from the base ostree commit?

ggiguash avatar Jul 15 '24 16:07 ggiguash

The problem with the uid/gid is serious. In my case, I got openvswitch and hugetlbfs users/groups overwritten after deploying a bootc image.

Yeah, it is a really annoying problem. See also https://github.com/openshift/os/pull/1318 where we worked around this just in RHCOS...and that code isn't inherited elsewhere right.

IMO openvswitch especially is a picture perfect use case for systemd DynamicUser=yes too...

cgwalters avatar Jul 15 '24 21:07 cgwalters

Since the ssh_key group is by far the most visible instance of this, it's worth noting that for fedora-bootc its definition is here: https://gitlab.com/fedora/bootc/base-images/-/blame/main/tier-0/group?ref_type=heads#L26 (Which honestly, is just a fork of FCOS, which is itself a fork of Atomic Host...)

But what travier said is also really important to emphasize - in current Fedora and C10S I think we can just drop the ssh_keys group from the base image.

cgwalters avatar Nov 22 '24 14:11 cgwalters

Let's again narrow in on openvswitch specifically, which has a floating user. Now, it does today have a sysusers.d fragment, but when installed in a container layer, its RPM script ends up creating the user.

However, openvswitch also has a specific problem in that /etc/openvswitch is owned by that uid+gid - so we cannot create it via sysusers because that's "static" content in the image today.

The most direct hack for the openvswitch case today is (at build time) is basically adding a systemd unit that does ExecStartPre=chown -R openvswitch:openvswitch /etc/openvswitch.

But what would be better here...is to not have the user at build time! We should have tooling to "flush" /etc/passwd and /etc/group - basically for each user/group that has a sysusers.d fragment, we undo those changes. This tooling needs to error out if there's any files in the image owned by that uid/gid.

However, even better again than all of that for openvswitch would be DynamicUser=yes - the default config file for openvswitch /etc/openvswitch/default.conf could be owned by root. It looks to me like the rest of the stuff could live in StateDirectory=.

cgwalters avatar Nov 22 '24 14:11 cgwalters

Related to this, cockpit just switched to DynamicUser=yes for similar reasons.

cgwalters avatar Nov 22 '24 14:11 cgwalters

I filed https://issues.redhat.com/browse/RHEL-68655 to engage the openvswitch maintainers.

cgwalters avatar Nov 22 '24 14:11 cgwalters

in current Fedora and C10S I think we can just drop the ssh_keys group from the base image.

@cgwalters to go ahead with this, at least from the ssh pov, should we just remove the group on the base image and rebuild it then?

UPDATE: see https://gitlab.com/fedora/bootc/base-images/-/merge_requests/66

~~For openvswitch maybe we should engage with them on the topic of using DynamicUser~~

runcom avatar Nov 25 '24 11:11 runcom

Hitting this also in the context of https://github.com/openshift/enhancements/pull/1637, where I'm converting the OCP node image to become a layered image on top of an RHCOS base (instead of being part of the initial compose).

What's interesting here is that drift doesn't just affect users/groups defined in the layered portion, but any dynamically allocated uids/gids in the base compose. Look at e.g. the diff for /usr/lib/group:

diff --git a/unlayered/usr/lib/group b/layered/usr/lib/group
index ac34a57..7db8e7b 100644
--- a/unlayered/usr/lib/group
+++ b/layered/usr/lib/group
@@ -43,14 +43,11 @@ sshd:x:74:
 chrony:x:992:
-clevis:x:793:
+clevis:x:795:
-containers:x:792:
-dnsmasq:x:791:
+dnsmasq:x:794:
-gluster:x:794:
+gluster:x:796:
 hugetlbfs:x:801:openvswitch
-input:x:798:
 kvm:x:36:
 openvswitch:x:800:
-printadmin:x:795:
+printadmin:x:797:
-render:x:797:
+render:x:799:
-systemd-coredump:x:796:
+systemd-coredump:x:798:
-systemd-journal-remote:x:790:
+systemd-journal-remote:x:793:
-unbound:x:799:

E.g. clevis, dnsmasq, etc... these are all from base packages (which yeah, /usr/lib/group is populated by rpm-ostree and would normally not be changed during layering operations). But because OCP packages are no longer part of the base compose, the way the GIDs are distributed are different even in the base image. This I think is a surprising gotcha.

We basically need a way to (1) at base compose time, freeze those GIDs to what they were in previous composes, and (2) at layering time, force the GIDs of layered packages to match those they previously had when they were part of the base compose. Similarly for UIDs.

jlebon avatar Nov 26 '24 20:11 jlebon

Can you link the code for this?

cgwalters avatar Nov 26 '24 21:11 cgwalters

Not much code. Just https://github.com/openshift/os/blob/master/Containerfile, which runs https://github.com/openshift/os/blob/master/scripts/apply-manifest, which applies https://github.com/openshift/os/blob/master/packages-openshift.yaml. Basically just a wrapper around rpm-ostree install (though locally I've already switched that over to dnf install) and running the postprocess scripts.

jlebon avatar Nov 26 '24 22:11 jlebon

Hmmm, https://github.com/openshift/os/blob/ce4da959eb3fc9477a0a96b69e6efd084a69b9ac/packages-openshift.yaml#L61 is in the mix here too.

cgwalters avatar Nov 27 '24 13:11 cgwalters

I have PR: https://gitlab.com/fedora/bootc/base-images/-/merge_requests/67
to propose the idea of tackling uid/gid shift post-upgrade. Wanted to share this quickly here to get some early feedback and if this approach makes sense.

say-paul avatar Nov 27 '24 14:11 say-paul

Re: https://gitlab.com/fedora/bootc/base-images/-/merge_requests/67 PR

This approach makes sense. Can you explain how it addresses the case when the users / groups "disappear" in the new commit? I think we might want to extend the script to check for users / groups in 2 places: current commit and new commit.

ggiguash avatar Nov 28 '24 06:11 ggiguash

Can you explain how it addresses the case when the users / groups "disappear" in the new commit?

I considered only the shift in uids/gids, but as per the disappearance use-case- I think that while scanning through all the files we can create them when we encounter files owned by user/group not present in the list.

extend the script to check for users/groups in 2 places: current commit and new commit.

Comparing the files from two deployments and setting merge logic can work too, I guess it needs a bit of thought on how to implement rollback. I will try that out, will keep posted.

say-paul avatar Nov 28 '24 14:11 say-paul

Ended up just doing openshift/os@f202927 (#1661) for the RHCOS/OCP case.

jlebon avatar Nov 29 '24 18:11 jlebon

@ggiguash I have modified the approach which can restore deleted user/groups. I have identified 3 usecases that needs to be handled, if there are other-cases in you mind please let us know.

say-paul avatar Dec 02 '24 16:12 say-paul

For me the long term solution for this is https://github.com/coreos/fedora-coreos-tracker/issues/1599 with the transition path in https://github.com/coreos/fedora-coreos-tracker/issues/155#issuecomment-1688447446.

In the meantime, I think what Jonathan did is the best approach: freeze the IDs in the image to a "known state".

travier avatar Dec 02 '24 18:12 travier

For me the long term solution for this is coreos/fedora-coreos-tracker#1599 with the transition path in coreos/fedora-coreos-tracker#155 (comment).

In the meantime, I think what Jonathan did is the best approach: freeze the IDs in the image to a "known state".

@travier , do the above proposals address the issue of upgrading existing systems?

ggiguash avatar Dec 02 '24 20:12 ggiguash

It does in https://github.com/coreos/fedora-coreos-tracker/issues/155#issuecomment-1688447446.

travier avatar Dec 03 '24 09:12 travier

Just to follow up here, https://github.com/containers/bootc/pull/1111 will warn about the cases of having groups that are entirely missing sysusers entries.

However, there's a lot of next steps, because e.g. we need to make it more of a fatal error when a floating uid/gid owns files in the image for example (though detecting "floating" is messy).

And I think we need to encourage dropping things out of /etc/passwd that have sysusers entries.

Combined, these will at least make it more obvious when things will fail.

cgwalters avatar Feb 26 '25 14:02 cgwalters

Let's again narrow in on openvswitch specifically, which has a floating user. Now, it does today have a sysusers.d fragment, but when installed in a container layer, its RPM script ends up creating the user.

However, openvswitch also has a specific problem in that /etc/openvswitch is owned by that uid+gid - so we cannot create it via sysusers because that's "static" content in the image today.

The most direct hack for the openvswitch case today is (at build time) is basically adding a systemd unit that does ExecStartPre=chown -R openvswitch:openvswitch /etc/openvswitch.

But what would be better here...is to not have the user at build time! We should have tooling to "flush" /etc/passwd and /etc/group - basically for each user/group that has a sysusers.d fragment, we undo those changes. This tooling needs to error out if there's any files in the image owned by that uid/gid.

However, even better again than all of that for openvswitch would be DynamicUser=yes - the default config file for openvswitch /etc/openvswitch/default.conf could be owned by root. It looks to me like the rest of the stuff could live in StateDirectory=.

OVS has quite a bit of reliance on a network of DAC files. For example, we need the hugetlbfs group shared with libvirt for DPDK related sockets, and those live under different directories. I don't see how doing dynamic assignments works here, but I'll admit I only have a shallow understanding of the way systemd is managing user / group details. Also, OVS is shared among lots of different OS types, and we shouldn't make changes that will break, for example freebsd or non-systemd systems to solve a bootc use case. I'm not sure how much rework ovs-ctl would require, and where that might break posix compliance.

apconole avatar Mar 24 '25 13:03 apconole

I don't see how doing dynamic assignments works here, but I'll admit I only have a shallow understanding of the way systemd is managing user / group details.

There's more in e.g. https://0pointer.net/blog/dynamic-users-with-systemd.html

However, some work would likely need to be done for /etc/openvswitch.

Also, OVS is shared among lots of different OS types, and we shouldn't make changes that will break, for example freebsd or non-systemd systems to solve a bootc use case.

Many image-based operating systems have similar issues, it's not specific to bootc. But yes, needing to support non-systemd cases increases the complexity here.

In the end for openvswitch today, it's in that case where it uses sysusers (good!) but until we move the files owned by the user out of the RPM, one will need to do the ExecStartPre=chown -R openvswitch:openvswitch /etc/openvswitch type hacks or forcibly statically assign it in the container build.

cgwalters avatar Mar 24 '25 16:03 cgwalters

Like almost all of our issues related to users/groups, this issue is wide-ranging. For now, I'm going to mark it as a duplicate of two things

  • https://gitlab.com/fedora/bootc/tracker/-/issues/50 (drives work needed from the OS side)
  • And I filed https://github.com/bootc-dev/bootc/issues/1263 with a potentially powerful targeted fix we can do bootc side

cgwalters avatar Apr 10 '25 12:04 cgwalters