
overlay: support AdditionalImagesStore on vfs

Open giuseppe opened this issue 1 year ago • 11 comments

Extend the overlay driver to use an additional image store on a vfs backend.

This works because overlay can simply use the uppermost vfs layer as the only lower layer for the container.

It is useful on a system already protected by composefs, to distribute images as part of the operating system itself. The read-only store cannot be on overlay, since composefs itself uses overlay and that causes conflicts with whiteout files.
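As a rough illustration of the idea (not the code in this PR), the sketch below builds overlay mount option strings for the two cases. The `lowerdir`/`upperdir`/`workdir` option names are the standard overlayfs ones; the function name and all paths are hypothetical.

```python
def overlay_mount_options(lower_dirs, upper_dir, work_dir):
    """Build the option string for an overlay mount.

    lower_dirs is ordered top-most first, as overlayfs expects.
    Assumption: no path contains ':' or ',', which would need escaping.
    """
    return f"lowerdir={':'.join(lower_dirs)},upperdir={upper_dir},workdir={work_dir}"


# Hypothetical paths, for illustration only.
# Regular overlay store: one lower directory per image layer.
stacked = overlay_mount_options(
    ["/store/overlay/l3/diff", "/store/overlay/l2/diff", "/store/overlay/l1/diff"],
    "/run/ctr/upper",
    "/run/ctr/work",
)

# vfs-backed additional image store: the top layer directory is already a
# complete root filesystem, so it is the only lower directory needed.
flattened = overlay_mount_options(
    ["/usr/lib/my-images/vfs/dir/l3"],
    "/run/ctr/upper",
    "/run/ctr/work",
)
```

Because the vfs directory appears only as a read-only lower layer, the container's writes still land in the overlay upper directory.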

giuseppe avatar Apr 22 '24 12:04 giuseppe

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Apr 22 '24 12:04 openshift-ci[bot]

But what about access to the parent layers, e.g. due to layer deduplication? Would we be shipping OS images with a baked-in VFS store, where a 10-layer image contains 10 copies of the same file? (10 real copies? 10 hard-linked copies? 10 ref-linked copies?)

With composefs it won't matter, because these files will be deduplicated anyway. So having multiple copies of the same file is fine, as the underlying storage will keep only one copy of that file.

This feature is only useful on a composefs system, where vfs doesn't have any additional cost compared to overlay since the deduplication is performed in the layer below.
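A toy model of why the copies are cheap, assuming file contents are stored once per content digest. The function name is hypothetical and `sha256` is only a stand-in for whatever digest the underlying storage uses:

```python
import hashlib


def stored_bytes(layers):
    """Return (naive_bytes, deduplicated_bytes) for a list of layers.

    Each layer is a dict mapping a file path to its content (bytes),
    standing in for one exploded vfs layer directory.
    """
    naive = sum(len(data) for layer in layers for data in layer.values())

    # Content-addressed view: identical content is kept only once.
    unique = {}
    for layer in layers:
        for data in layer.values():
            unique[hashlib.sha256(data).hexdigest()] = len(data)

    return naive, sum(unique.values())


# Ten vfs layers that all contain the same 1000-byte file.
layers = [{"usr/bin/tool": b"x" * 1000} for _ in range(10)]
naive, deduplicated = stored_bytes(layers)
```

In this model the ten copies count as 10000 bytes when summed naively but only 1000 bytes once identical content is shared. The number of inodes is not reduced by this.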

giuseppe avatar Apr 23 '24 09:04 giuseppe

But what about access to the parent layers, e.g. due to layer deduplication? Would we be shipping OS images with a baked-in VFS store, where a 10-layer image contains 10 copies of the same file? (10 real copies? 10 hard-linked copies? 10 ref-linked copies?)

The alternative of not shipping the parent layers at all seems impractical to me, that would break quite a lot of assumptions. Or would we restrict this feature to squashed images where the difference doesn’t matter?

I think there is maybe something to this worry, but it's not quite what you seem to say here.

First, let's start with the goal of this: We want to enable a bootc image that contains other containers, such that when you boot the container as a full-blown OS it can run containers. In such a deployment, the outer image would be using ostree with composefs, and the vfs storage directory would be part of this composefs image.

The composefs image will naturally and automatically de-duplicate any identical files. So, shipping all the layers as vfs stores will not be a problem in the final image.

That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?

alexlarsson avatar Apr 23 '24 09:04 alexlarsson

Or would zstd-chunked help here?

alexlarsson avatar Apr 23 '24 09:04 alexlarsson

That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?

The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one. zstd:chunked is not implemented for vfs (not sure it is worth the effort), but once you have the store and have added it to a composefs image, there are no additional costs of vfs (except the number of inodes).
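To put rough numbers on that pull-time cost, here is a back-of-the-envelope sketch. It assumes every layer only adds files (no deletions or overwrites) and that nothing is shared between layer directories; the function names and sizes are made up for illustration:

```python
from itertools import accumulate


def vfs_layer_sizes(diff_sizes):
    """Size of each vfs layer directory: a full copy of the parent plus its own diff."""
    return list(accumulate(diff_sizes))


def total_sizes(diff_sizes):
    """Return (vfs_total, overlay_total) for the same stack of layer diffs."""
    return sum(vfs_layer_sizes(diff_sizes)), sum(diff_sizes)


# Hypothetical three-layer image: diffs of 100, 20 and 5 units.
per_layer = vfs_layer_sizes([100, 20, 5])
vfs_total, overlay_total = total_sizes([100, 20, 5])
```

Under these assumptions the vfs directories hold 100, 120 and 125 units (345 in total), while an overlay store would hold only the 125 units of diffs.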

giuseppe avatar Apr 23 '24 10:04 giuseppe

But running this image as a container could be huge. As long as users of this feature understand the downsides, it should not be a problem.

Doing a bootc upgrade with one of these images would be able to take advantage of zstd:chunked though, correct?

rhatdan avatar Apr 23 '24 10:04 rhatdan

That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?

The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one. zstd:chunked is not implemented for vfs (not sure it is worth the effort), but once you have the store and have added it to a composefs image, there are no additional costs of vfs (except the number of inodes).

That is not exactly what I mean though. Say you have a Dockerfile that creates a bootc container, and it does RUN podman pull --vfs --ais=/usr/lib/my-images some-image. Now, suppose some-image has several layers, and these are put into the container image. Wouldn't the contents of these layers be duplicated in the resulting bootc OCI image tarball, for example when you pull it to your bootc host?

alexlarsson avatar Apr 23 '24 11:04 alexlarsson

Doing a bootc upgrade with one of these images would be able to take advantage of zstd:chunked though, correct?

Technically it should be possible. For example, if a chunked image contains 10 copies of the same data in different files, then we should be able to only download it once. I don't know if the current implementation works this way though, as it is probably focusing on the slightly different case where a single file in the image is already locally available.
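A toy model of that "download it once" behaviour, assuming the image's table of contents lists a sequence of chunk digests per file. This is not the real zstd:chunked format or the current implementation; the function name and digests are hypothetical:

```python
def chunks_to_fetch(toc, local_digests=frozenset()):
    """Return the chunk digests that must be downloaded, each at most once.

    toc is a list of (path, [chunk_digest, ...]) entries.
    local_digests are chunks already available on the host.
    """
    seen = set(local_digests)
    fetch = []
    for _path, digests in toc:
        for digest in digests:
            if digest not in seen:
                seen.add(digest)
                fetch.append(digest)
    return fetch


# Ten exploded vfs layers that each contain the same two-chunk file.
toc = [(f"layer{i}/usr/bin/tool", ["c1", "c2"]) for i in range(10)]
needed = chunks_to_fetch(toc)
needed_with_cache = chunks_to_fetch(toc, local_digests={"c1"})
```

Here twenty chunk references collapse to two downloads, or to one if `c1` is already present locally.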

alexlarsson avatar Apr 23 '24 11:04 alexlarsson

That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?

The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one. zstd:chunked is not implemented for vfs (not sure it is worth the effort), but once you have the store and have added it to a composefs image, there are no additional costs of vfs (except the number of inodes).

That is not exactly what I mean though. Say you have a Dockerfile that creates a bootc container, and it does RUN podman pull --vfs --ais=/usr/lib/my-images some-image. Now, suppose some-image has several layers, and these are put into the container image. Wouldn't the contents of these layers be duplicated in the resulting bootc OCI image tarball, for example when you pull it to your bootc host?

Yes, in this case all the layers will be stored exploded inside the container image itself.

giuseppe avatar Apr 23 '24 14:04 giuseppe

That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?

The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one.

I’m not quite clear about the overall picture, still: Under $assumptions, once layers 1…M exist in all stores backing a single c/storage.Store, a pull of a child image with layers 1…M…N should see that all the parent layers are already available locally, not pull them, and then tell the top-level graph driver to only create the new layers.

If the primary graph driver is overlay, that should only create overlay layers.

($assumptions are that either no layers are zstd:chunked, or that they are the same exact zstd:chunked TOC in the two images.)
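The expectation above, reduced to a sketch: layers are modelled as plain identifiers and each store as a set of identifiers it already holds. The names are hypothetical and this ignores how c/storage actually matches layers:

```python
def layers_to_create(image_layers, stores):
    """Return the layers of an image that no backing store already has.

    image_layers is ordered base first; stores is a list of sets of layer IDs
    (for example the primary store plus any additional image stores).
    """
    present = set().union(*stores)
    return [layer for layer in image_layers if layer not in present]


primary_overlay_store = set()
additional_vfs_store = {"L1", "L2", "L3"}  # parent layers 1..M

child_image = ["L1", "L2", "L3", "L4", "L5"]  # layers 1..M..N
missing = layers_to_create(child_image, [primary_overlay_store, additional_vfs_store])
```

Only `L4` and `L5` would then need to be created, and with overlay as the primary graph driver they would be overlay layers.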

mtrmac avatar Apr 23 '24 21:04 mtrmac

I've addressed some of your comments, but I'm leaving this marked as Draft, as I want to do another pass.

giuseppe avatar Apr 24 '24 14:04 giuseppe

ping

giuseppe avatar May 08 '24 15:05 giuseppe

@mtrmac PTAL

rhatdan avatar May 13 '24 15:05 rhatdan

@alexlarsson PTAL

rhatdan avatar May 21 '24 10:05 rhatdan

I would love to have this feature, but I currently don't have time to review this.

alexlarsson avatar May 29 '24 12:05 alexlarsson

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-merge-robot avatar Jun 12 '24 09:06 openshift-merge-robot

I won't keep rebasing just to keep this PR alive for now. I'll reopen it once it is clear we want it and can get it reviewed.

giuseppe avatar Jun 12 '24 09:06 giuseppe