
Partitioning Support

Open mripard opened this issue 11 months ago • 18 comments

Hi,

I've discussed this with @cgwalters and on Matrix before, but I've been trying to leverage containers to create bootable images for ARM, and more generally, any platform that doesn't have a standardized boot setup.

I wrote a blog post for the larger context here.

As a prototype, I wrote ocibootstrap, which now supports most ARM board setups I came across, and actually supports systems from five different vendors (ARM, Amlogic, NXP, Rockchip, TI).

The basic idea is that the container would hold everything the system needs to boot: kernel, bootloader, user-space, etc. plus some metadata that describes how the partitioning should work. ocibootstrap then uses that metadata to create a partition table and copies the container content into a disk image.

I believe it's something that should be useful for bootc too, and there's a lot of overlap anyway so it would at least make sense to share our effort.

In addition to defining partitions and their mount-points, the additional requirements I came across are:

  • MBR Support
    • Ability to set the payload, and with an offset (Amlogic)
    • Ability to define partitions with a raw content, and at a given offset (Amlogic)
  • GPT Support
    • Ability to change the maximum number of partitions (Allwinner)
    • Ability to define partitions with a raw content, and at a given offset (NXP, Rockchip)
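
To make the requirements above a bit more concrete, here is a minimal sketch of what such a layout description could look like in code. The class and field names are hypothetical and this is not ocibootstrap's actual schema; it only illustrates the table type, a configurable GPT entry count, and raw content placed at a fixed offset.

```python
from dataclasses import dataclass, field
from typing import BinaryIO, List, Optional


@dataclass
class Partition:
    """One entry of a hypothetical layout description."""
    name: str
    offset: int                          # byte offset from the start of the disk
    size: int                            # size in bytes
    mountpoint: Optional[str] = None     # None for raw, non-filesystem partitions
    raw_content: Optional[bytes] = None  # e.g. a bootloader blob copied verbatim


@dataclass
class Layout:
    table: str                           # "mbr" or "gpt"
    partitions: List[Partition] = field(default_factory=list)
    max_partitions: int = 128            # usual GPT default; only used for "gpt"

    def validate(self) -> None:
        if self.table not in ("mbr", "gpt"):
            raise ValueError(f"unknown partition table type: {self.table!r}")
        # An MBR holds four primary entries (extended partitions are ignored here).
        limit = 4 if self.table == "mbr" else self.max_partitions
        if len(self.partitions) > limit:
            raise ValueError(f"too many partitions for {self.table}")
        previous_end = 0
        for part in sorted(self.partitions, key=lambda p: p.offset):
            if part.offset < previous_end:
                raise ValueError(f"partition {part.name!r} overlaps the previous one")
            if part.raw_content is not None and len(part.raw_content) > part.size:
                raise ValueError(f"raw content does not fit in {part.name!r}")
            previous_end = part.offset + part.size


def write_raw_partitions(image: BinaryIO, layout: Layout) -> None:
    """Copy every raw payload to its offset in an already-sized disk image."""
    layout.validate()
    for part in layout.partitions:
        if part.raw_content is not None:
            image.seek(part.offset)
            image.write(part.raw_content)
```

Writing the partition table itself and populating the filesystems are deliberately left out; the point is only that offsets and raw payloads have to be part of the description.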

ocibootstrap has a set of metadata and the code to deal with all this. It has a set of design goals that might or might not be compatible with bootc though:

  • Fire-and-Forget thing: everything should be contained in the container, the tool itself should work by default, without any particular option.
  • Cross-platform support: we need to be able to create an image for, say, an ARM platform from an x86 machine.

At the very least, I think it would be worthwhile to share our efforts on a common partitioning layout definition. Let me know what you think.

mripard avatar Dec 02 '24 16:12 mripard

Heya!

There's definitely a ton of overlap between ocibootstrap and bootc-image-builder (bib) that I'm working on. BIB is already capable of doing cross-arch builds and distributed as a container image.

With regards to partitioning: bootc-image-builder just merged a new partitioning schema that should be much more expandable than the previous one. We now have full support for LVM, btrfs and swap partitions. The next goal is to support MBR vs. GPT and custom partition GUIDs/types in order to fix Raspberry Pi. What's also nice is that the same code is also used to produce package mode CentOS/RHEL images, so the code is well tested. :)

So I think that we can definitely collaborate on adding support for the extra features that other ARM boards need.

Also, reading the blueprint (our configuration format) from the container image itself is something that is also on our long-term plan.

ondrejbudai avatar Dec 02 '24 16:12 ondrejbudai

bootc itself is pretty unopinionated on the partition layouts; the primary target we maintain here is bootc install to-filesystem which supports writing to an extant root filesystem.

However there is also bootc install to-disk which contains a very simple hardcoded partitioning default; but it's intended more as a "demo".

Also, reading the blueprint (our configuration format) from the container image itself is something that is also on our long-term plan.

Right. However, there is still a very important thing looming over this, which is whether an install is done by "dd raw disk image" or is "run a container image" (or is "run a live environment like an ISO which can run code to do an install"). In some cases the latter is very desirable...and takes it currently out of what bootc-image-builder is scoped to do today.

One thing I'd like to do to enable the latter is making it easier to "hook" or fully replace what bootc install to-disk does. That could be backed by whatever (from shell script in the image to partitioning pulled from manifest annotations or embedded blueprints/kickstarts/systemd-repart or whatever). But, that's in the end just a slightly more obvious way to ship a custom installer embedded as part of the container that ends up running bootc install to-filesystem as a middle or end phase.

(Quoting the blog post now)

We found that the partition layout is platform-specific, and that we expect to have a container for each platform or board. This means that we can encode the partition table along with the container.

In your conception of ocibootstrap, are updates thereafter done by e.g. "apt/dnf update", i.e. the container image is a one-time mechanism?

That would make it quite different from bootc today, but indeed we should ideally aim to share code and ideas I think.

I'm not too familiar with the details of some of the smaller ARM devices you mention in the blog; one thing we do depend on here is bootupd - see specifically https://github.com/coreos/bootupd/issues/432 which touches on the generic boot loader bits. I'd also like to make that part more hookable/customizable.

Overall if my understanding of ocibootstrap being a "one time use" thing is correct, then that's by far the biggest axis that splits it versus bootc. In bootc (and bootupd) it's all about making sure that "day 1" and "day 2" are as symmetric as possible to support in-place transactional updates and rollbacks.

cgwalters avatar Dec 02 '24 21:12 cgwalters

With regards to partitioning: bootc-image-builder just merged a new partitioning schema that should be much more expandable than the previous one. We now have full support for LVM, btrfs and swap partitions. The next goal is to support MBR vs. GPT and custom partition GUIDs/types in order to fix Raspberry Pi.

Note that, as far as partitioning is concerned, the RaspberryPi is pretty simple. So it's not really the end of the road, but merely the beginning :)

So I think that we can definitely collaborate on adding support for the extra features that other ARM boards need.

Also, reading the blueprint (our configuration format) from the container image itself is something that is also on our long-term plan.

We were discussing it with @alexlarsson by mail too, and the more I think about it, the more I think it would be better to use a JSON (or similar) format attached to a label. That way, we could define the format somewhere, publish a schema, and we could all use the same format making all these tools interoperable. I'm ready to start working on a first proposal to merge both bib and ocibootstrap requirements. Would you be interested?

Also, reading the blueprint (our configuration format) from the container image itself is something that is also on our long-term plan.

Right. However, there is still a very important thing looming over this, which is whether an install is done by "dd raw disk image" or is "run a container image" (or is "run a live environment like an ISO which can run code to do an install"). In some cases the latter is very desirable...and takes it currently out of what bootc-image-builder is scoped to do today.

The former is what I'm mostly interested in, and what ocibootstrap tries to address.

(Quoting the blog post now)

We found that the partition layout is platform-specific, and that we expect to have a container for each platform or board. This means that we can encode the partition table along with the container.

In your conception of ocibootstrap, are updates thereafter done by e.g. "apt/dnf update", i.e. the container image is a one-time mechanism?

At least, it's something I really want to support. There's platforms out there that don't have the resources (or use-cases) to support anything else, and I believe we should have a solution for them still.

That would make it quite different from bootc today, but indeed we should ideally aim to share code and ideas I think.

See my question above, but I really think the partitioning description should be fairly easy to share and would be useful, no matter if we want to use ostree or a package manager for example.

I'm not too familiar with the details of some of the smaller ARM devices you mention in the blog; one thing we do depend on here is bootupd - see specifically coreos/bootupd#432 which touches on the generic boot loader bits. I'd also like to make that part more hookable/customizable.

The main issue I can see with bootupd (but I don't really know its internals, I've only been using it since last month or so, so I might be wrong) is where the bootloader is coming from. My understanding is that it's from a package, and bootupd then runs during the post-install hook. I think it's working fine for cases like x86 where the bootloader is generic across the whole architecture. However, on ARM systems, the bootloader will be different from one board to the other, even if they share their SoC. So the challenge now comes from packaging and distributing all those images. It would create a pretty big maintenance burden, and I'm not sure it's something any distro will want to do.

Overall if my understanding of ocibootstrap being a "one time use" thing is correct, then that's by far the biggest axis that splits it versus bootc. In bootc (and bootupd) it's all about making sure that "day 1" and "day 2" are as symmetric as possible to support in-place transactional updates and rollbacks.

It's definitely a one time use so far.

mripard avatar Dec 03 '24 17:12 mripard

The main issue I can see with bootupd (but I don't really know its internals, I've only been using it since last month or so, so I might be wrong) is where the bootloader is coming from. My understanding is that it's from a package, and bootupd then runs during the post-install hook. I think it's working fine for cases like x86 where the bootloader is generic across the whole architecture. However, on ARM systems, the bootloader will be different from one board to the other, even if they share their SoC. So the challenge now comes from packaging and distributing all those images. It would create a pretty big maintenance burden, and I'm not sure it's something any distro will want to do.

I want to make sure bootupd is cleanly extendable to these things too, it shouldn't require packages. IOW it should be able to install content added in a container build. See https://github.com/coreos/bootupd/issues/766

cgwalters avatar Dec 03 '24 21:12 cgwalters

We were discussing it with @alexlarsson by mail too, and the more I think about it, the more I think it would be better to use a JSON (or similar) format attached to a label. That way, we could define the format somewhere, publish a schema, and we could all use the same format making all these tools interoperable. I'm ready to start working on a first proposal to merge both bib and ocibootstrap requirements. Would you be interested?

It's clearly relevant and related. But I will continue to use "can configure LUKS for the rootfs" as a very simple differentiator between "toy" and "maybe usable for enterprise". AFAIK, your proposed schemas don't attempt to do LUKS? Neither do blueprints today. But kickstart, Ignition and systemd-repart do. Also of note is that cases like LUKS with tpm2 binding actually need code execution on the target.

In general I'd personally like to see "us" building bridges and integration with these existing tools (all of them are quite relevant) as opposed to inventing a new schema.

All of these have various tradeoffs (you already ran into repart only doing GPT, Ignition is really designed only to run in the initramfs today, kickstart is...battle tested but very tied to Anaconda which is a huge project).

Also of note though is there's already a bootc install config; I think at this point I don't want to try growing it too much and I think we should invest in those bridges, but just noting it exists too.

cgwalters avatar Dec 03 '24 21:12 cgwalters

We were discussing it with @alexlarsson by mail too, and the more I think about it, the more I think it would be better to use a JSON (or similar) format attached to a label. That way, we could define the format somewhere, publish a schema, and we could all use the same format making all these tools interoperable. I'm ready to start working on a first proposal to merge both bib and ocibootstrap requirements. Would you be interested?

I'm not convinced by labels. We were thinking about them too, but ultimately decided to just read files from container images if we need more metadata for disk image builds. People seem to be more used to putting files into containers, rather than using labels. Moreover, files can be shared across multiple Containerfiles, but I don't think that's possible with labels.

I'm happy to collaborate on requirements, but I'm afraid that we are not interested in implementing support for yet another schema in bootc-image-builder, especially right after we just defined a new schema. You can review it here: https://osbuild.org/docs/user-guide/partitioning#using-disk-customizations

We were discussing it with @alexlarsson by mail too, and the more I think about it, the more I think it would be better to use a JSON (or similar) format attached to a label. That way, we could define the format somewhere, publish a schema, and we could all use the same format making all these tools interoperable. I'm ready to start working on a first proposal to merge both bib and ocibootstrap requirements. Would you be interested?

It's clearly relevant and related. But I will continue to use "can configure LUKS for the rootfs" as a very simple differentiator between "toy" and "maybe usable for enterprise". AFAIK, your proposed schemas don't attempt to do LUKS? Neither do blueprints today. But kickstart, Ignition and systemd-repart do. Also of note is that cases like LUKS with tpm2 binding actually need code execution on the target.

FTR, blueprints don't support LUKS because no one asked for it. LVM on the other hand is and always was a highly requested feature.

ondrejbudai avatar Dec 05 '24 17:12 ondrejbudai

FTR, blueprints don't support LUKS because no one asked for it.

Okay, asked for in https://github.com/osbuild/bootc-image-builder/issues/747 then

cgwalters avatar Dec 05 '24 19:12 cgwalters

We were discussing it with @alexlarsson by mail too, and the more I think about it, the more I think it would be better to use a JSON (or similar) format attached to a label. That way, we could define the format somewhere, publish a schema, and we could all use the same format making all these tools interoperable. I'm ready to start working on a first proposal to merge both bib and ocibootstrap requirements. Would you be interested?

It's clearly relevant and related. But I will continue to use "can configure LUKS for the rootfs" as a very simple differentiator between "toy" and "maybe usable for enterprise".

I'm not sure this is the right way to frame it. There are plenty of systems deployed out there that wouldn't qualify as "enterprise-grade" following that logic. Not all systems can use, or need, secure-boot, TPMs, LUKS, or whatever. Yet, making sure they belong somewhere in the ecosystem and are not an afterthought is valuable. At the moment, we're having a hard time building an "enterprise" system based on a platform family that 10 years ago would have qualified as a toy platform, with a toy system.

We might not want to put our time or resources behind it, and that's totally reasonable. Qualifying them as toys isn't great though.

AFAIK, your proposed schemas don't attempt to do LUKS?

I haven't proposed anything yet :)

But yeah, I don't have a lot of knowledge about LUKS in general so I might not be the best candidate to make a proposal that includes LUKS, unless you have some resources on what would qualify as a decent LUKS representation.

I'm all for including LUKS though, it makes a ton of sense to support it.

Neither do blueprints today. But kickstart, Ignition and systemd-repart do. Also of note is that cases like LUKS with tpm2 binding actually need code execution on the target.

In general I'd personally like to see "us" building bridges and integration with these existing tools (all of them are quite relevant) as opposed to inventing a new schema.

All of these have various tradeoffs (you already ran into repart only doing GPT, Ignition is really designed only to run in the initramfs today, kickstart is...battle tested but very tied to Anaconda which is a huge project).

I already had a look at supporting MBR in systemd-repart. I think eventually it's something I'd like to happen (if only to resize partitions at first boot) so I might work on it.

We were discussing it with @alexlarsson by mail too, and the more I think about it, the more I think it would be better to use a JSON (or similar) format attached to a label. That way, we could define the format somewhere, publish a schema, and we could all use the same format making all these tools interoperable. I'm ready to start working on a first proposal to merge both bib and ocibootstrap requirements. Would you be interested?

I'm not convinced by labels. We were thinking about them too, but ultimately decided to just read files from container images if we need more metadata for disk image builds. People seem to be more used to putting files into containers, rather than using labels.

Labels have a strong advantage for that work though: you don't need to extract the container to access them, unlike files. Using a file, you need to extract it, read the file, create the partition table, copy/extract the container to the target. With a label, you can create the partition table and extract the container to the target in one go.
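
As a rough illustration of that point: the OCI image config is a small JSON blob that is fetched separately from the layers, and its labels live under `config.Labels`. A minimal sketch of reading a layout from it could look like this. The label key `org.example.partition-layout` is made up for the example, and fetching the config blob from a registry is left out.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical label key; nothing in this thread has settled on a real name.
LAYOUT_LABEL = "org.example.partition-layout"


def layout_from_image_config(image_config: str) -> Optional[Dict[str, Any]]:
    """Return the JSON layout stored in a label of an OCI image config, if any.

    `image_config` is the raw JSON text of the image configuration blob,
    so no layer has to be downloaded or unpacked to call this.
    """
    config = json.loads(image_config)
    # Both "config" and "Labels" are optional and may be null.
    labels = (config.get("config") or {}).get("Labels") or {}
    raw = labels.get(LAYOUT_LABEL)
    if raw is None:
        return None
    # Label values are plain strings, so the layout is JSON encoded in a string.
    return json.loads(raw)
```

With this shape, a tool can build the partition table first and then stream the layers straight onto the target filesystems.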

Also, I wonder how a partition table stored in the container itself, which the user can be expected to modify, would interact with a signed container.

Moreover, files can be shared across multiple Containerfiles, but I don't think that's possible with labels.

Labels can be provided on the command-line when building the container. It's slightly less convenient when generating the container, but you can still share these between containers or create them from a template when building the container.

mripard avatar Dec 18 '24 09:12 mripard

It's been a while, sorry, but I've finally been able to make progress on this.

I've created a branch for ocibootstrap that does what we discussed here. It's working based on a schema defined here. I also added multiple tests showing the various cases supported so far: mbr, gpt and lvm.

I've also built some containers with the proper partition table and confirmed that it works there. If you want to have a look at the containers themselves, you'll find them as build artifacts here.

Even though we discussed it previously, I didn't add the LUKS support yet. I don't know much about it, so I didn't feel confident with coming up with a set of reasonable parameters, but if you can point me to a similar description and/or a description of what would be needed, I can work on it.

mripard avatar Feb 19 '25 09:02 mripard

Ok, so I'm looking at this again from the perspective of building OCI images with automotive-image-builder, and then generating a physical image from bc-i-b.

Currently, what is available in this area is:

In bootc, there is a concept of install files (in /usr/lib/bootc/install/00-*.toml) inside the OCI image. These are read by bootc install print-configuration, which bib calls and uses when it determines how to set up the partition table. The amount of configuration currently available in these is quite limited at the moment; in fact, bc-image-builder only looks at it for the default rootfs filesystem type.

In bc-i-b there is support for customization of the partitioning scheme and mountpoints. This allows specifying extra partitions (like a separate /var), as well as LVM style setups. It also lets you set custom labels and mountpoints for the partitions. However, the customization data doesn't come from the OCI image itself, but must instead be supplied from the outside.

Then there is @mripard's work on ocibootstrap, which uses a JSON file to describe the partition format needed. This is much more detailed, and basically records a physical partition table. This is an external tool that is not currently used in the bootc ecosystem.

Over to the requirements for automotive:

We use transient /etc, which means we cannot rely on mount files (etc) created by bib, instead we must ship mount files in the OCI image, and they must work in the physical image. This generally means the mount file has to reference the filesystem by one of: partition label, partition uuid, fs uuid, fs label (for boot performance reasons we prefer partition info over the filesystem info). So, we must be able to tell bc-i-b what to use for one of these.

In addition to the regular partitions (e.g. /, /boot, /boot/efi) we want the ability to request custom partitions, typically used for /var or /var/qm. These must allow specifying enough for the mount file to work, as well as some details about the fs, like: fs type, fs option (enable fsverity), and maybe a minimal size.
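
As a sketch of what "shipping mount files in the image" could mean in practice, the snippet below renders a systemd mount unit that references the filesystem by partition label through `/dev/disk/by-partlabel/`. The function names and the example label are hypothetical, and the unit name escaping is a simplified stand-in for `systemd-escape --path` that only accepts plain path components.

```python
import re

# Components outside this set would need real systemd escaping (\xNN sequences).
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_]+")


def mount_unit_name(mountpoint: str) -> str:
    """Derive the unit file name for a mount point (simplified escaping)."""
    components = [c for c in mountpoint.split("/") if c]
    if not components:
        return "-.mount"  # the root file system
    for component in components:
        if not _SAFE_COMPONENT.fullmatch(component):
            raise ValueError(f"component {component!r} needs systemd-escape")
    return "-".join(components) + ".mount"


def render_mount_unit(partlabel: str, mountpoint: str, fstype: str) -> str:
    """Render a mount unit that finds its device by partition label."""
    return "\n".join([
        "[Unit]",
        f"Description=Mount {mountpoint} by partition label",
        "",
        "[Mount]",
        f"What=/dev/disk/by-partlabel/{partlabel}",
        f"Where={mountpoint}",
        f"Type={fstype}",
        "",
    ])
```

For example, `render_mount_unit("qm_data", "/var/qm", "ext4")` would be written to a file named `mount_unit_name("/var/qm")`, and the image builder then only has to create a partition carrying the same label. How the unit gets enabled with a transient /etc is left out of this sketch.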

Then we need the ability to have custom non-filesystem partitions that are needed for booting. The most common boot setups we will see are aboot and ukiboot, which require creating a few partitions with custom labels, uuids, type, flags and sizes. We also need the ability to request that a file from the image is written into these partitions at install time.

It seems to me that the first two of these are doable with the blueprint format, if we allow the image itself to ship a default blueprint. And the last part can be done with minor extensions to the blueprint format. Does that seem like a workable approach?

alexlarsson avatar May 14 '25 09:05 alexlarsson

Right now the bootc project does not have an opinion on the mechanism to define partitioning layout. The technical focus of this project is container images and the state we manage is always files in the filesystem to the greatest extent possible. Really the only direct exception is our usage of bootupd (as an external project) which may end up writing (indirectly) to partitions to e.g. just run grub2-install to update the MBR etc.

Then there is @mripard's work on ocibootstrap, which uses a JSON file to describe the partition format needed. This is much more detailed, and basically records a physical partition table. This is an external tool that is not currently used in the bootc ecosystem.

Yeah a lot going on in that project and I'd really like if we aligned more.

I am not opposed to the JSON thing but most use cases to me will involve an "installer" and we already have multiple of those, it's not hard to write one.

It seems to me that the first two of these are doable with the blueprint format, if we allow the image itself to ship a default blueprint. And the last part can be done with minor extensions to the blueprint format. Does that seem like a workable approach?

Again, from my PoV, I don't think bootc should have a hard opinion on this. The bc-i-b project can define where blueprints should be found in the target OS (container image) and read without involving bootc at all. We would want to document it on https://docs.fedoraproject.org/en-US/bootc/ - but that's distinct from this project which doesn't mention kickstarts or blueprints or cloud-init or any of that.

Just to reiterate some of the above though, probably the most important relative newcomer here (ICYMI) is systemd-repart which has the very compelling advantage of adding some "day 2" reconciliation support as well, in addition to already being designed to be embedded in the OS payload. I would be totally in favor of teaching bc-i-b (and probably Anaconda) to learn to consume repart as well.

cgwalters avatar May 14 '25 11:05 cgwalters

This generally means the mount file has to reference the filesystem by one of: partition label, partition uuid, fs uuid, fs label (for boot performance reasons we prefer partition info over the filesystem info).

Do you plan to use the DPS or will you hardcode them? Will each person creating a derived OS choose their own uuids?

cgwalters avatar May 14 '25 13:05 cgwalters

Do you plan to use the DPS or will you hardcode them? Will each person creating a derived OS choose their own uuids?

Honestly, looking at it now, for automotive it probably makes more sense to mount by label instead. I don't see any real reason to involve the uuids at all. Do you think there are any advantages here to caring about the uuids at all?

alexlarsson avatar May 14 '25 13:05 alexlarsson

Honestly, looking at it now, for automotive it probably makes more sense to mount by label instead.

Note per https://www.freedesktop.org/software/systemd/man/latest/systemd-gpt-auto-generator.html# it's supported for bootloaders to set an EFI variable to find the root partition on the same disk that the bootloader used, which I think is a good way to do it. Only as of relatively recently grub implements this too (ref https://github.com/rhboot/grub2/pull/117 )

I think it'd make sense for ukiboot to do the same right? The choice of whether to follow the DPS (fixed UUIDs) or mount by label is then up to higher level code; it's pretty easy in userspace in the initramfs to parse that variable and find the disk and then read its partitions, but systemd-gpt-auto-generator already does that for the DPS UUIDs. I may try changing the (demo) bootc install to-disk flow to use that now that grub is updated with it.
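
To illustrate how little code the "parse that variable" step needs, here is a minimal sketch. It assumes the variable in question is `LoaderDevicePartUUID` from systemd's Boot Loader Interface, which is stored as a NUL-terminated UTF-16 string, and that it is read through efivarfs, where each file starts with four bytes of attributes. Finding the parent disk and walking its partitions is not shown.

```python
import uuid
from pathlib import Path

# Boot Loader Interface variable: partition UUID of the ESP the loader ran from.
EFIVAR = Path(
    "/sys/firmware/efi/efivars/"
    "LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f"
)


def parse_loader_device_part_uuid(raw: bytes) -> uuid.UUID:
    """Decode the raw efivarfs contents of LoaderDevicePartUUID."""
    if len(raw) <= 4:
        raise ValueError("variable is too short to hold any data")
    # Skip the 4-byte attribute header that efivarfs prepends to the value.
    text = raw[4:].decode("utf-16-le").rstrip("\x00")
    return uuid.UUID(text)


def boot_partition_device() -> Path:
    """Return the by-partuuid symlink of the partition the loader booted from."""
    part_uuid = parse_loader_device_part_uuid(EFIVAR.read_bytes())
    # str(UUID) is lower case, matching the names under /dev/disk/by-partuuid.
    return Path("/dev/disk/by-partuuid") / str(part_uuid)
```

From that partition, higher level code can then pick the root either by DPS type UUID or by label, which is the choice discussed above.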

cgwalters avatar May 14 '25 13:05 cgwalters

@cgwalters Yes, I added a TODO for that (https://gitlab.com/CentOS/automotive/src/ukiboot/-/issues/1)

alexlarsson avatar May 14 '25 14:05 alexlarsson

Then there is @mripard's work on ocibootstrap, which uses a JSON file to describe the partition format needed. This is much more detailed, and basically records a physical partition table. This is an external tool that is not currently used in the bootc ecosystem.

Yeah a lot going on in that project and I'd really like if we aligned more.

I am not opposed to the JSON thing but most use cases to me will involve an "installer" and we already have multiple of those, it's not hard to write one.

What do you call installer in that context? If it's a software component that runs on the target platform to install the system on it, then we still have the bootstrapping issue. We need a custom partitioning, bootloaders locations, and possibly device trees, for the installer to start in the first place.

It seems to me that the first two of these are doable with the blueprint format, if we allow the image itself to ship a default blueprint. And the last part can be done with minor extensions to the blueprint format. Does that seem like a workable approach?

Again, from my PoV, I don't think bootc should have a hard opinion on this. The bc-i-b project can define where blueprints should be found in the target OS (container image) and read without involving bootc at all. We would want to document it on https://docs.fedoraproject.org/en-US/bootc/ - but that's distinct from this project which doesn't mention kickstarts or blueprints or cloud-init or any of that.

Just to reiterate some of the above though, probably the most important relative newcomer here (ICYMI) is systemd-repart which has the very compelling advantage of adding some "day 2" reconciliation support as well, in addition to already being designed to be embedded in the OS payload. I would be totally in favor of teaching bc-i-b (and probably Anaconda) to learn to consume repart as well.

systemd-repart is definitely nice, but the fact it only works with GPT is a pretty big constraint. Some devices we work on still use MBR, unfortunately.

mripard avatar May 14 '25 15:05 mripard

The bc-i-b project can define where blueprints should be found in the target OS (container image) and read without involving bootc at all. We would want to document it on https://docs.fedoraproject.org/en-US/bootc/ - but that's distinct from this project which doesn't mention kickstarts or blueprints or cloud-init or any of that.

I wrote up an issue for this to hopefully get the ball rolling in BIB: https://issues.redhat.com/browse/HMS-8564

achilleas-k avatar May 19 '25 12:05 achilleas-k

@achilleas-k I'm already working on this. You'll hear from me soon :)

alexlarsson avatar May 19 '25 13:05 alexlarsson

https://github.com/osbuild/bootc-image-builder/pull/932 is now merged, so there is some basic ability to embed disk customizations in the image itself. However, there are also discussions about supporting the image-builder YAML partitioning format, which allows describing the partition layout in more low-level detail.

alexlarsson avatar May 22 '25 14:05 alexlarsson