using bootc install-to-filesystem
This relates to https://github.com/osbuild/osbuild-deploy-container/issues/4
- We had some general agreement to support `bootc install-to-filesystem`; this will help long term with things like https://github.com/containers/bootc/issues/128
- `bootc install-to-filesystem` should also grow support for being provided the base container image externally (e.g. cached in osbuild); we know this is needed for offline ISO installs too. This ties with the above for the lifecycle bound app/infra containers
- We can't drop the osbuild/ostree stages because not every case will use bootc in the near future
- Agreement that for the ISO/installer case any customization (embedded kickstarts, but also which installer) would likely live external to the container (blueprint or equivalent)
`bootc install-to-filesystem` should also grow support for being provided the base container image externally
Digging in, this is messier than I thought. Still possible, but @ondrejbudai can you state more precisely the concern you had with having bootc install from the running container?
ISTM that in general going forward we'll want to support running images cached in the infrastructure, which will drive us towards using containers-storage most likely, as opposed to e.g. the dir transport. And if we do that, ISTM it's just simpler to keep bootc doing exactly what it's doing today in fetching from the underlying store as opposed to having something else push content in, right?
Just to clarify, because there are two ideas here that sound very similar but are probably unrelated:
- The issue you're talking about is with the idea of having `bootc` run from outside the base container image when running `install-to-filesystem`. So the idea of having it do `bootc install-to-filesystem <container ref> <filesystem path>` (or `bootc install-to-filesystem oci-archive:/path/to/container.tar /path/to/tree`, for example) from a host machine would require too much work.
- This is as opposed to running `podman run -v/path/to/tree:/target <container ref> bootc install-to-filesystem /target` from the host, which is how it currently works.
- This issue is not related to running `bootc` from a container that is in an "offline" storage format like an archive, right? So we can still do `podman run -v/path/to/tree:/target oci-archive:/path/to/container.tar bootc install-to-filesystem /target`? Which will probably work fine in osbuild. My concern, as we discussed yesterday, is that we're putting a few too many layers of containers/namespaces here which make it hard to predict some details, but might be okay. I think it's time we actually tried this and see what we get.
If I'm understanding everything correctly (and if I'm remembering everything from yesterday's conversation), @ondrejbudai's idea to mount the container and run it in bwrap is the alternative to this, but like you said, bootc won't like that as it makes some container-specific assumptions.
I would actually combine #1 with mounting the container.
- Mount the container and chroot into it (in osbuild terms, construct a buildroot by "exploding" the container, and use this as the build pipeline for the following steps)
- Partition a disk file using tools from inside the container
- Mount the disk file to /target
- Somehow get the container image in the oci format to e.g. /source/container.tar
- Run `bootc install-to-filesystem --source oci-archive:/source/container.tar --target /target`
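A minimal shell sketch of that sequence, assuming the `--source`/`--target` arguments proposed in this thread (not current bootc flags); the image ref, sizes, and paths are purely illustrative:

```bash
img=quay.io/centos-bootc/fedora-bootc:eln

# 1. explode the container into a buildroot
mnt=$(podman image mount "$img")
mkdir -p /var/tmp/buildroot
cp -a "$mnt"/. /var/tmp/buildroot/

# 2./3. partition a disk file and mount it at /target inside the buildroot
truncate -s 10G /var/tmp/buildroot/disk.img
dev=$(losetup -P --find --show /var/tmp/buildroot/disk.img)
# ... sfdisk/mkfs "$dev", then mount the filesystems under /var/tmp/buildroot/target ...

# 4. provide the same image in OCI format inside the buildroot
mkdir -p /var/tmp/buildroot/source
skopeo copy "docker://$img" oci-archive:/var/tmp/buildroot/source/container.tar

# 5. run the proposed install from inside the buildroot
chroot /var/tmp/buildroot \
    bootc install-to-filesystem --source oci-archive:/source/container.tar --target /target
```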
Note that I do have a slight preference for passing a whole container storage instead of an oci archive.
Just to level set, this today is sufficient to generate a disk image:
$ truncate -s 20G /var/tmp/foo.disk
$ losetup -P -f /var/tmp/foo.disk
$ podman run --rm --privileged --pid=host --security-opt label=type:unconfined_t quay.io/centos-bootc/fedora-bootc:eln bootc install --target-no-signature-verification /dev/loop0
$ losetup -d /dev/loop0
Backing up to a higher level, I think there are basically two important cases:
- Generating a disk image from a container image stored in `containers-storage`: notably this is the most obvious flow in podman-desktop on Mac/Windows. Copying that into a `dir` or `oci-archive` is just an unnecessary performance hit.
- Generating a disk image from a container in a remote registry: this will happen in many production build flows. It seems simplest then if we try to unify this with the first case by always pulling into `containers-storage`, right?
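For the second case, the unification could be as simple as pulling the remote image into the local store before the build starts; a sketch with an illustrative image ref:

```bash
# pull the remote image into containers-storage, then build from the local store
skopeo copy docker://quay.io/centos-bootc/fedora-bootc:eln \
    containers-storage:quay.io/centos-bootc/fedora-bootc:eln
```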
Also https://github.com/containers/bootc/pull/215 can't work until bootc-image-builder starts using bootc.
Backing up to a higher level, I think there are basically two important cases:
* Generating a disk image from a container image stored in `containers-storage`: notably this is the most obvious flow in podman-desktop on Mac/Windows. Copying that into a `dir` or `oci-archive` is just an unnecessary performance hit.
Which phase of the build is this referring to? If it's about having the stage in osbuild use the host containers-storage directly, I think the performance hit isn't entirely unnecessary but gives us the caching and reproducibility guarantees that we get with osbuild. These aren't directly relevant to the current use case (running it all in an ephemeral container), but I'm also thinking about the whole disk-image-build use case more generally (using the same code and flow in the service).
Or is this just about having the osbuild containers cache be itself a containers-storage? That's definitely an idea I'd like to explore.
If we're talking about having a convenient way of using the host's containers-storage in the bootc-image-builder container, I think that's a lot simpler.
* Generating a disk image from a container in a remote registry: this will happen in many production build flows. It seems simplest then if we try to unify this with the first case by always pulling into `containers-storage`, right?
Generalising any solution to both cases would be preferable, I agree.
the caching and reproducibility guarantees that we get with osbuild
Thinking about this a bit more, I realise my hesitation is mostly around modifying the caching model substantially but now I'm thinking there's a good way to do this with a new, different kind of source. A containers-storage source could use the host container storage as its backend directly and pass it through to the stage.
The one "unusual" side effect would be that osbuild would then have to pull a container into the host machine's containers-storage, which I guess is fine (?). But what happens if osbuild, running as root, needs to access the user's storage? What if it writes to it?
But what happens if osbuild, running as root, needs to access the user's storage? What if it writes to it?
One thing that can occur here is that a user might be doing their container builds with rootless podman; so when they want to go make a disk image from it we'd need to copy it to the root storage. Things would seem to get messy to have a root process with even read access to a user storage because there's locking involved at least.
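As a sketch of what that copy could look like (the graph-root path is the rootless default for a hypothetical `builder` user, and the image name is illustrative; the locking concern above still applies to the read side):

```bash
# run as root: read the user's rootless store and copy the image into the root store
skopeo copy \
    "containers-storage:[overlay@/home/builder/.local/share/containers/storage]quay.io/example/app:latest" \
    containers-storage:quay.io/example/app:latest
```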
so when they want to go make a disk image from it we'd need to copy it to the root storage
I think this makes sense. I'd want to make it explicit somehow that osbuild is doing this. It's one thing to write stuff to a system's cache when building images with osbuild (or any of IB-related projects), it's another thing to discover that your root container store now has a dozen images in it from a tool that some might think of as unrelated to "container stuff".
Pinging @kingsleyzissou here since he's working on this.
Which phase of the build is this referring to? If it's about having the stage in osbuild use the host containers-storage directly, I think the performance hit isn't entirely unnecessary but gives us the caching and reproducibility guarantees that we get with osbuild.
I'm not quite parsing this (maybe we should do another realtime sync?) - are you saying using containers-storage is OK or not?
Backing up to a higher level, I think everyone understands this but I do want to state clearly the high level tension here because we're coming from a place where osbuild/IB was "The Build System" to one where it's a component of a larger system and where containers are a major source of input.
I understand the reasons why osbuild does the things it does, but at the same time if those things are a serious impediment to us operating on and executing containers (as intended via podman) then I think it's worth reconsidering the architecture.
These aren't directly relevant to the current use case (running it all in an ephemeral container), but I'm also thinking about the whole disk-image-build use case more generally (using the same code and flow in the service).
It's not totally clear to me that in a service flow there'd be significant advantage to doing something different here; I'd expect as far as "cache" fetching images from the remote registry each time wouldn't be seriously problematic. For any cases where it matters one can use a "pull-through registry cache" model.
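For reference, the "pull-through registry cache" model is something a stock registry container can already provide; a sketch (the upstream URL and port are illustrative):

```bash
# local registry acting as a pull-through cache for an upstream registry
podman run -d --name registry-cache -p 5000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    docker.io/library/registry:2
```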
Or is this just about having the osbuild containers cache be itself a containers-storage? That's definitely an idea I'd like to explore.
That seems related but I wouldn't try to scope that in as a requirement here. Tangentially related, I happened to come across https://earthly.dev/ recently, which leans deeply into that idea. At first the "Makefile and Dockerfile had a baby" framing was kind of "eek", but OTOH, digging in more, I get it.
Backing up to a higher level, I think there are basically two important cases:
* Generating a disk image from a container image stored in `containers-storage`: notably this is the most obvious flow in podman-desktop on Mac/Windows. Copying that into a `dir` or `oci-archive` is just an unnecessary performance hit.
* Generating a disk image from a container in a remote registry: this will happen in many production build flows. It seems simplest then if we try to unify this with the first case by always pulling into `containers-storage`, right?
Coming from the OpenShift/OKD side, I think ideally the tool for ostree container to disk image conversion can be run independently of osbuild, i.e. it can also be wrapped by other pipeline frameworks such as prow, tekton, argo workflows, and even jenkins for any kind of CI/CD or production build.
Agreeing on keeping the container images in containers-storage everywhere seems fine to me.
@achilleas-k it sounds with using an alternative root for the ostree container storage (with https://github.com/containers/bootc/pull/215) your concerns regarding all the images getting pulled into the machine's main container-storage might be addressed? IIUC, the ostree container-storage could be kept completely separate and e.g. live on a volume that gets mounted during the pipelinerun.
Sounds like a good solution yes.
Which phase of the build is this referring to? If it's about having the stage in osbuild use the host containers-storage directly, I think the performance hit isn't entirely unnecessary but gives us the caching and reproducibility guarantees that we get with osbuild.
I'm not quite parsing this (maybe we should do another realtime sync?) - are you saying using containers-storage is OK or not?
Well, at the time when I wrote this I was thinking it might be a problem but in my follow-up message (admittedly, just 5 minutes later) I thought about it a bit more and changed my mind.
Backing up to a higher level, I think everyone understands this but I do want to state clearly the high level tension here because we're coming from a place where osbuild/IB was "The Build System" to one where it's a component of a larger system and where containers are a major source of input.
I agree that this tension exists and it's definitely good to be explicit about it. I don't think the containers being a source of input is that big of an issue though. The containers-store conversation aside (which I now think is probably a non-issue), I think a lot of the tension comes from osbuild making certain decisions and assumptions about its runtime environment that are now changing. There was an explicit choice to isolate/containerise stages that are (mostly) wrappers around system utilities. Now we need to use utilities (podman, bootc) that need to do the same and it's not straightforward to just wrap one in the other. For example, right now, our tool is started from (1) podman, to call osbuild which runs (2) bwrap to run rpm-ostree container image deploy .... Replacing that with bootc requires starting from (1) podman to call osbuild which will run (2) bwrap to call (3) podman to run (4) bootc, and bootc will need to "take over" a filesystem and environment that is running outside of (3) podman.
I understand the reasons why osbuild does the things it does, but at the same time if those things are a serious impediment to us operating on and executing containers (as intended via podman) then I think it's worth reconsidering the architecture.
At the end of the day we can do whatever's necessary. The architecture is the way it is for reasons but those reasons change or get superseded. I think a big part of the tension is coming from me (personally) trying to find the balance between "change everything in osbuild" and "change everything else to fit into osbuild" (and usually leaning towards the latter because of personal experience and biases). Practically, though, the calculation I'm trying to make is which point between those two gets us to a good solution faster.
This is all to say, the source of the containers in my mind is a smaller issue to the (potentially necessary) rearchitecting of some of the layers I described above. We already discussed (and prototyped) part of this layer-shaving for another issue, and I think this is where we might end up going now (essentially dropping the (2) bwrap boundary).
These aren't directly relevant to the current use case (running it all in an ephemeral container), but I'm also thinking about the whole disk-image-build use case more generally (using the same code and flow in the service).
It's not totally clear to me that in a service flow there'd be significant advantage to doing something different here; I'd expect as far as "cache" fetching images from the remote registry each time wouldn't be seriously problematic. For any cases where it matters one can use a "pull-through registry cache" model.
I wasn't trying to suggest we wouldn't cache in the service. I just meant to say that, if we tightly couple this particular build scenario to having a container store, we'd also have to think about how that works with our current service setup. But I might be overthinking it.
Or is this just about having the osbuild containers cache be itself a containers-storage? That's definitely an idea I'd like to explore.
That seems related but I wouldn't try to scope that in as a requirement here.
Given the comments that came later in this thread, I think I have a much clearer picture of what a good solution looks like here.
I'm working on https://github.com/ostreedev/ostree/pull/3114 and technically for the feature to work it requires the ostree binary performing an installation to be updated. With the current osbuild model, that requires updating the ostree inside this container image in addition to being in the target image. With bootc install-to-filesystem, it only requires updating the target container.
@ondrejbudai and I (mostly Ondrej) made a lot of progress on this today. There's a lot of cleaning up needed and we need to look into some edge cases, but we should have something to show (and talk about) on Monday.
podman run --rm --privileged --pid=host --security-opt label=type:unconfined_t quay.io/centos-bootc/fedora-bootc:eln bootc install --target-no-signature-verification /dev/loop0
While running this command in osbuild should be possible, it means that we have a container inside a container, which seems needlessly complex. Thus, we tried to decouple bootc from podman. The result is in this branch: https://github.com/containers/bootc/compare/main...ondrejbudai:bootc:source
I was afraid that it would be hard, but it actually ended up being quite simple and straightforward. We also have a PoC with required changes to osbuild, new stages and a manifest. Note that this also needs https://github.com/osbuild/osbuild/pull/1501, otherwise bootupd fails on grub2-install.
The most important thing that this branch does is that it adds a --source CONTAINER_IMAGE_REF argument. When this argument is used, bootc no longer assumes that it runs inside a podman container. Instead, it uses the given reference to fetch the container image. It's important to note that bootc still needs to run inside a container created from the given image, however that's super-simple to achieve in osbuild.
If we decide to go this way, using bootc install-to-filesystem in bootc-image-builder seems quite straightforward. We are happy to work on cleaning-up the changes required in bootc and adding some tests to the bootc's CI in order to ensure that --source doesn't break in the future.
We think the method above is acceptable for osbuild. However, it's a bit weird, because all the existing osbuild manifests build images in these steps:
- Prepare the file tree
- Create a partitioned disk
- Mount it
- Copy the file tree into the disk
- Install the bootloader
Whereas with bootc install-to-filesystem --source, it becomes:
- Create a partitioned disk
- Mount it
- Install everything
This has pros and cons: there's less I/O involved (you don't need to do the copy step), but the copy stage doesn't actually take much time compared with the other steps anyway. The disadvantage is that you cannot easily inspect the file tree, because osbuild outputs just the finished image. This hurts our developer experience, because when debugging an image, you usually want to see the file tree, which osbuild can easily output if we use the former flow.
Upon inspecting bootc, it might not be that hard to split `bootc install-to-filesystem` into two commands, roughly a `bootc prepare-tree` step and a `bootc finish-disk` step.
Then the osbuild flow might just become:
- Call `bootc prepare-tree`
- Create a partitioned disk
- Mount it
- Copy the file tree into the disk
- Call `bootc finish-disk`
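A minimal sketch of that flow, with the caveat that `bootc prepare-tree` and `bootc finish-disk` are hypothetical subcommands proposed here (they do not exist in bootc today) and their arguments are guesses:

```bash
# hypothetical split flow; paths and arguments are illustrative
bootc prepare-tree --source oci-archive:/source/container.tar /var/tmp/tree
truncate -s 10G /var/tmp/disk.img
dev=$(losetup -P --find --show /var/tmp/disk.img)
# ... partition/mkfs "$dev", then mount the filesystems under /mnt/target ...
cp -a /var/tmp/tree/. /mnt/target/
bootc finish-disk /mnt/target
```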
This would probably mean some extra code in bootc, but it might be worth just doing that instead of paying the price in osbuild and harming its usability. Note that nothing changes about the way bootc is currently used in the wild.
@cgwalters wdyt?
Note that this also needs osbuild/osbuild#1501, otherwise `bootupd` fails on `grub2-install`.
glad I could help, and at the right time too :)
Fwiw, I am working on extracting the "container as buildroot" parts of https://github.com/osbuild/osbuild/compare/main...ondrejbudai:osbuild:bootc in https://github.com/osbuild/images/compare/main...mvo5:add-container-buildroot-support?expand=1 so that it can be used in bootc-image-builder (still a bit rough in there). It would also fix the issue that we cannot build stream9 images right now (which is the main intention of this work, but it's nice to see that it seems to be generally useful).
The result is in this branch: https://github.com/containers/bootc/compare/main...ondrejbudai:bootc:source
First patch is an orthogonal cleanup, mind doing a PR with just that to start?
Then another PR with the rest?
This hurts our developer experience, because when debugging an image, you usually want to see the file tree, which osbuild can easily output if we use the former flow.
But...the file tree is already a container which you can inspect with podman run etc. right?
bootc install-to-filesystem --source
BTW just a note, this approach will require https://github.com/ostreedev/ostree/pull/3094 in the future because we already have problems with the fact that ostree (and in the future, bootc) really want to own the real filesystem writes and osbuild is today not propagating fsverity.
re https://github.com/containers/bootc/commit/a3c559300a2b7e30681fc05e4edfe2b064c6947b I wrote https://github.com/containers/bootc/pull/225 (totally not tested though) that I think will be a cleaner fix here.
I've been thinking about this more and in the end, I am definitely not opposed to the approach proposed - the changes would probably indeed be maintainable.
And I agree that it's very important to make the systems we design "introspectable/debuggable/visualizable/cacheable" etc. - and ultimately "filesystem trees" and their properties make up a lot of that.
However...I hope everyone would agree that for what we're doing here, 95% of the content comes from the container image, which we already have tooling to do all those things with. But yes, for injecting other filesystem-level state (whether that's users, secrets, etc.) it is important to be able to introspect and debug it.
Here's a counter proposal which basically builds on top of https://github.com/containers/bootc/issues/190 - bootc-image-builder accepts things like blueprints as input etc. (maybe in the future kickstarts, whatever) and ultimately the result of that operation is always a "layer".
(Hmm incidentally it'd be a really good idea to be sure we treat the semantics of blueprint execution in the same manner as we do for the host system, i.e. disallow writes to /usr for example; I'm not sure we do that today?)
So it would actually make sense, I think, to implement things the same way container stacks do: using overlayfs and serializing the result of that (in a clearly distinct fashion from the base image) - then the connection with the above bootc proposal is that I can choose to push that filesystem tree (layer) to a registry too - versioning, mirroring, managing, and signing it the same way I do other container content - and moving the "blueprint" -> "filesystem layer" step to more of a build step.
Also @ondrejbudai based on that code I've invited you to be a bootc committer fwiw :smile:
But...the file tree is already a container which you can inspect with `podman run` etc. right?
Well, if the tree inside the bootable image was the same as in the container image, we would just need to run cp -a instead of bootc. :upside_down_face:
bootc install-to-filesystem --source
BTW just a note, this approach will require ostreedev/ostree#3094 in the future because we already have problems with the fact that ostree (and in the future, bootc) really want to own the real filesystem writes and osbuild is today not propagating fsverity.
Haven't seen this one before. I agree that this is slightly annoying in osbuild, but it can be solved by the postprocess step that Alex implemented.
Here's a counter proposal which basically builds on top of containers/bootc#190 - bootc-image-builder accepts things like blueprints as input etc. (maybe in the future kickstarts, whatever) and ultimately the result of that operation is always a "layer".
Is it a counter proposal? I have a feeling that these proposals support each other, but I might be misinterpreting your proposal.
Btw, I'm not fully opposed to just dropping tree-level customizations (=adding users, files, enabling services, ...) from bootc-image-builder. However, I definitely see a great value in them. The ability to take a random bootable container image, inject a user using bootc-image-builder, boot the image and be immediately able to log in and tinker is very nice. All other methods (ignition/overlays/extra layer) AFAIK require an additional step.
(Hmm incidentally it'd be a really good idea to be sure we treat the semantics of blueprint execution in the same manner as we do for the host system, i.e. disallow writes to `/usr` for example; I'm not sure we do that today?)
Yup! :)
So it would actually make sense, I think, to implement things the same way container stacks do: using overlayfs and serializing the result of that (in a clearly distinct fashion from the base image) - then the connection with the above bootc proposal is that I can choose to push that filesystem tree (layer) to a registry too - versioning, mirroring, managing, and signing it the same way I do other container content - and moving the "blueprint" -> "filesystem layer" step to more of a build step.
I need your help understanding this paragraph. My final proposal was this one:
- Call bootc prepare-tree
- Create a partitioned disk
- Mount it
- Copy the file tree into the disk
- Call bootc finish-disk
Do you want bootc-image-builder to be able to push the result of step 1 as a single-layer OCI image? And if customizations are involved, this would become:
- Call bootc prepare-tree
- Create an overlayfs over the tree
- Perform any customization from a blueprint
- Push this tree as two layers
I'm happy to implement this, but I'm not sure about the use cases for this workflow. Is this mainly about debugging? It has the potential to introduce more complexity. If I get it right, this would be completely optional, but still - it's kinda hard to explain what the resulting artifact is. I guess we can solve this by explicitly marking this artifact as useful for debugging only.
Do you expect bootc-image-builder to be able to consume such an artifact as an input? Basically:
- Pull the container image
- Create a partitioned disk
- Mount it
- Copy the content of the container image into the disk
- Call bootc finish-disk
This means that bootc finish-disk needs to do one final round of selinux relabeling, because AFAIK selinux labels aren't available in OCI images. Not a big deal I think, just something we must not forget.
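(For concreteness, that final relabeling pass would look roughly like the following, assuming the target is mounted and using the policy shipped in the image; paths are illustrative:)

```bash
# relabel the installed tree against the file contexts shipped in the image
setfiles -r /mnt/target \
    /mnt/target/etc/selinux/targeted/contexts/files/file_contexts /mnt/target
```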
Anyway, I might have completely misunderstood your idea, so feel free to correct me on everything I'm wrong on. :)
Well, if the tree inside the bootable image was the same as in the container image, we would just need to run cp -a instead of bootc. 🙃
True. However, I hope you'd agree that this is again a corner case; < 5% of debugging cases would need to dig into this distinction - the ostree stuff is a background thing. It's a very similar thing to looking at a container image versus how containers/storage represents it on disk in /var/lib/containers.
(But yes in the bootc/ostree case there are some interesting things there like how we set up the /boot filesystem and kernel arguments)
Haven't seen this one before. I agree that this is slightly annoying in osbuild, but it can be solved by the postprocess step that Alex implemented.
(This is somewhat tangential but) another case I just realized will break with this is reflinks; ostree uses them today for /etc (if available, as a minor optimization) but we've talked about just using them (if available) across the board as a "resilience against accidental mutation" for the deployment root. But cp -a doesn't "preserve" reflinks in this way - if the source is on a separate filesystem then nothing will be linked, and if it's on the same filesystem we'll get two independent files reflinked to the source, not to each other, and hence not shared after the cache is deleted.
To be clear this isn't a serious problem today because for the /etc case it will just fix itself on the first upgrade (as ostree takes over and performs the writes) and the sizes are small. But if we did reflinks for the deployment root, that wouldn't be true today (unless we also fix up that in the post-copy bit).
Also backing up closer to the topic here it's notable there isn't a way to represent reflinks in OCI - because they're not represented in tarballs, and tarballs are a "lowest common denominator" thing.
Btw, I'm not fully opposed to just dropping tree-level customizations (=adding users, files, enabling services, ...) from bootc-image-builder.
I'm not saying that at all! I think everyone agrees that we need functionality like this.
But, that bootc issue is also arguing for supporting that step at the bootc install phase, which is orthogonal to generating a disk image. To elaborate on this: if we supported that in addition (not arguing for dropping the ability to inject files at disk image generation time!) then it'd also work the same way in anaconda.
This means that bootc finish-disk needs to do one final round of selinux relabeling, because AFAIK selinux labels aren't available in OCI images.
This is a messy topic...a lot of related discussion in https://github.com/containers/storage/pull/1608 - and I may have been wrong there actually and we could just write the labels into the OCI archive. I was perhaps too chicken to be sure that'd work across the ecosystem.
Anyways though...hmmm...I would say that "materialize intermediate steps as OCI" is potentially interesting just for introspection/debugging but we shouldn't try to support "push them to a registry" unless the use case becomes obvious. (That said I linked this in a different discussion but I came across https://earthly.dev/ recently which leans heavily into the idea of caching general build artifacts in OCI)
I need your help understanding this paragraph. My final proposal was this one:
1. Call bootc prepare-tree
2. Create a partitioned disk
3. Mount it
4. Copy the file tree into the disk
5. Call bootc finish-disk
(Mechanically let's prefix this with bootc install as this is all sub-functionality of that...hmm, if we're going to grow more here it'd look better as bootc install to-disk and bootc install to-filesystem and then we have bootc install prepare-tree too.)
Hmmmm. So I'd say short term I am not opposed to this proposal and I think we can get your patches in. However...let me try to re-describe how I'm thinking of things.
Actually here's the key bit in what I'm proposing: the file content that osbuild injects is built as a container layer, using the input container as a base image. So the flow would look like:
- Take blueprint (or kickstart, or whatever) high level description of extra system state and "build" it. Let's start with a very crude implementation:
FROM <input container image>
COPY osbuild-render-blueprint /tmp
COPY blueprint.json /tmp/
RUN /tmp/osbuild-render-blueprint /tmp/blueprint.json && rm /tmp/* -rf
We can then take just the final layer from this process (i.e. a tarball) and save it as image-overlay.oci (i.e. just wrap that final layer tarball as its own OCI "image"). This filesystem tree would include things like a modified /etc/passwd and /var/home/someuser/.ssh/authorized_keys.
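A rough way to prototype that, assuming the crude Containerfile above is saved as `Containerfile.blueprint` (name illustrative); extracting only the top layer from the saved archive would need a bit of extra tooling:

```bash
# build the customization layer on top of the input image and export it
podman build -t localhost/blueprint-overlay -f Containerfile.blueprint .
podman save --format oci-archive -o /var/tmp/blueprint-full.oci localhost/blueprint-overlay
# a real implementation would repackage just the final layer of this archive
# as image-overlay.oci
```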
Now this bit could even happen in parallel
- Inspect container image to fetch partitioning information (or use default embedded in the image, or use externally specified partitioning in e.g. anaconda cases)
- Create partitioned disk
- Mount it
Then finally, we put things together:
- `bootc install-to-filesystem` (and crucially, if available we also pass `--with-overlay=oci:/path/to/cache/image-overlay.oci`)
- Clean things up by unmounting, closing up the loopback device if appropriate, etc.
- Perform all final transformations on the target disk image (e.g. convert to VMDK, etc.)
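A sketch of that final assembly, assuming the `--source`/`--with-overlay` arguments discussed in this thread (not current bootc flags) and illustrative paths:

```bash
bootc install-to-filesystem \
    --source oci-archive:/cache/base-container.tar \
    --with-overlay oci:/path/to/cache/image-overlay.oci \
    /target
umount -R /target
losetup -d /dev/loop0
# example of a final transformation of the raw disk image
qemu-img convert -f raw -O vmdk /var/tmp/disk.img /var/tmp/disk.vmdk
```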
This means that `bootc finish-disk` needs to do one final round of selinux relabeling, because AFAIK selinux labels aren't available in OCI images. Not a big deal I think, just something we must not forget.
So in my proposal (bootc install --with-overlay), it'd probably be bootc which does the SELinux labeling on the final filesystem tree, as it does for the container image it takes as input.
BTW I just put up https://github.com/containers/bootc/pull/226 which will clear the way for having other bootc install sub-commands.
bootc vol.2
Thanks, Colin, this makes much more sense than my interpretation.
I think that your idea might actually play very well with both bootc and osbuild. Let me present how I understand the flow in a pseudo-bash script. In the end, this would be a single osbuild manifest, but let's make it high-level for now.
SOURCE=quay.io/centos-bootc
# "Deploy" the container
container=$(podman image mount $SOURCE)
cp -a $container/. /tmp/container
# Build the overlay OCI image
mkdir /tmp/osbuild-customizations /tmp/osbuild-work /tmp/merged
mount -t overlay overlay -o lowerdir=/tmp/container,upperdir=/tmp/osbuild-customizations,workdir=/tmp/osbuild-work /tmp/merged
osbuild-apply-customizations /tmp/blueprint.toml /tmp/merged
umount /tmp/merged
osbuild-create-a-single-layer-oci-archive /tmp/osbuild-customizations /tmp/container/overlay.tar
# Partition a disk
truncate -s 10G /tmp/disk
fdisk [...] /tmp/disk
losetup -Pf /tmp/disk
mkfs.* /dev/loop0p{1,2,3}
mount /dev/loop0p{1,2,3} /tmp/container/tree{/,/boot,/boot/efi}
# Fetch the container image
skopeo copy docker://$SOURCE oci-archive:/tmp/container/container.tar
# Bootc install (this is of course bubblewrap in osbuild, but let's keep it simple)
chroot /tmp/container \
    bootc install to-filesystem \
    --source oci-archive:/container.tar \
    --with-overlay oci-archive:/overlay.tar \
    --generic-image --[...] \
    /tree
# Umount and unloop everything
umount -R /tmp/container/tree
losetup -d /dev/loop0
# Convert to qcow2
qemu-img convert -f raw -O qcow2 /tmp/disk /tmp/final-image.qcow2
I think we need the following things for this to happen:
1. bootc gains the `--source` argument (and some minor compatibility patches, see my branch)
2. bootc gains the `--with-overlay` argument
3. osbuild gains the `podman image mount` capabilities
4. osbuild gains the capabilities to work with overlayfs
5. osbuild gains the capabilities to run `bootc install to-filesystem`
6. bootc gets bootc-image-builder running in its CI (rev-dep tests)
Our team does 3, 4 and 5; @cgwalters does 2; 1 and 6 are a shared effort. I think we can write the code.
Does this sound plausible? Can we commit to this?
Note that you also wrote about the capability of splitting the build process into two steps: firstly an overlay is built and pushed into a registry; then someone else pulls the overlay and applies it. I think that's absolutely doable, but I would focus on the proposed flow first, because it's just a single build and thus simpler. I'm definitely happy to work on splitting the thing (optionally) afterwards.