buildkit icon indicating copy to clipboard operation
buildkit copied to clipboard

Support for exporting nydus compression type

Open imeoer opened this issue 2 years ago • 24 comments

Related issue: https://github.com/moby/buildkit/issues/2046

Nydus image is a container accelerated image format provided by the Dragonfly image-service project, which offers the ability to pull image data on demand, without waiting for the entire image pull to complete and then start the container. It has been put in production usage and shown vast improvements over the OCI v1 image format in terms of container launching speed, image space, and network bandwidth efficiency, as well as data integrity. Nydus image can be flexibly configured as a FUSE-based user-space filesystem or in-kernel FS with nydus daemon in user-space:

  • Lightweight integration with VM-based containers runtime like KataContainers. KataContainers is supported Nydus as a native image acceleration solution.
  • Nydus closely cooperates with Linux in-kernel disk filesystem containers' rootfs can directly be set up by EROFS over Fscache with lazy pulling capability. The corresponding changes had been merged into Linux kernel since v5.19.

Nydus has provided a conversion tool Nydusify for converting OCIv1 image to Nydus image and integrated into Harbor Acceld as a conversion driver, which assumes that the OCI image is already available in the registry, but a better way would be to build the Nydus images directly from the build system instead of using the conversion tool, which would increase the speed of the image export, so we experimentally integrated the Nydus export in Buildkit.

image

A manifest example of nydus image can be found here. Unlike other compression formats (gzip, zstd, eStargz, etc.) in OCI image, nydus is divided into two types of layer, blob and bootstrap, where blob serves as the data and metadata for all files of each layer of the image, and bootstrap serves as the metadata of the whole image, the bootstrap is equivalent to the view of the whole image filesystem after all layers overlay, nydus use this layer form to reduce the number of HTTP request and FUSE overlay's overhead on preparing the container's rootfs. For example, for an OCI image with 3 layers, the corresponding nydus image is 4 layers (3 layers of blob + 1 layer of bootstrap).

The converter package in nydus-snapshotter do the actual layer compression, this PR imports the package to implement the export of nydus compression type.

imeoer avatar Jan 26 '22 10:01 imeoer

@ktock @AkihiroSuda PTAL

imeoer avatar Feb 07 '22 07:02 imeoer

BuildKit doesn't seem to be able to use the exported nydus-compressed image and cache. e.g. the second build in the following with inline cache panics.

# mkdir -p /tmp/ctx
cat <<EOF > /tmp/ctx/Dockerfile
FROM registry2-buildkit:5000/ubuntu:20.04
RUN echo hello > /hello
EOF
# buildctl build --progress=plain --frontend=dockerfile.v0 --local context=/tmp/ctx --local dockerfile=/tmp/ctx \
                 --output type=image,name=registry2-buildkit:5000/ubuntu:20.04-hello,push=true,compression=nydus,oci-mediatypes=true \
                 --export-cache type=inline
# buildctl build --progress=plain --frontend=dockerfile.v0 --local context=/tmp/ctx --local dockerfile=/tmp/ctx \
                 --output type=image,name=registry2-buildkit:5000/ubuntu:20.04-hello-zstd,push=true,compression=zstd,oci-mediatypes=true \
                 --import-cache type=registry,ref=registry2-buildkit:5000/ubuntu:20.04-hello
...
panic: runtime error: index out of range [4] with length 4

goroutine 748 [running]:
github.com/moby/buildkit/cache/remotecache.(*contentCacheImporter).importInlineCache.func1.1()
	/go/src/github.com/moby/buildkit/cache/remotecache/import.go:188 +0xda5
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/src/github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x67
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/src/github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup/errgroup.go:54 +0x92

ktock avatar Feb 08 '22 04:02 ktock

BuildKit doesn't seem to be able to use the exported nydus-compressed image and cache. e.g. the second build in the following with inline cache panics.

@ktock Thanks for the testing, the panic indeed occurs.

This seems to be a tricky problem. Nydus image consists of multiple blobs and a bootstrap (one more layer than the cache records), and the blob does not correspond to the cache record one by one (probably fewer layers than the cache record).

Is it possible to consider not supporting the inline cache export for nydus image? Will submit a new commit to do this.

imeoer avatar Feb 08 '22 10:02 imeoer

@ktock rebased code to resolve the conflicts with the PR and add workaround for inline export on the commit. PTAL.

imeoer avatar Feb 17 '22 04:02 imeoer

I wonder if we could take inspiration from the idea of frontends and implement something similar for exporting. I.e. export types could be implemented as software in container images, which have access to an API that gives them some access to the Buildkit worker's cache/content-store (with guardrails of course) and allows them to handle turning layer blobs into whatever format they want. Then, when users want to do an export, they would just specify the exporter they want to use as a container image reference.

Thanks @sipsma for the reply. The initial version of supporting nydus compression type is introducing a new exporter, but it is a heavy modification and can't reuse some codes (manifest and cache export).

Maybe nydus cannot be called a compression type, it consists of multiple blob layers (including the addition or modification data of upper layer. no deletion data, so it doesn't care about whiteout file) and one bootstrap layer (the whole metadata of filesystem view for entire image). nydus use this layer form to reduce the number of HTTP request and FUSE overlay's overhead on preparing the container's rootfs.

But current workflow on handling compression type is hard to compatible with this layer form in the cache package, so the PR adds some condition branch (I admit that has a bit dirty) to handle cache. So I agree that would be easier to handle turning layer blobs into whatever format they want if buildkit provides more opening access ability of cache/content-store and even more manifest handling.

Before any change occurs mentioned above, I will continue to do the workaround (WIP) to reuse computeBlobChain, will have some changes on nydus format like this:

  • Every nydus blob layer represents the file metadata and data of current layer (output by computeBlobChain).
  • Appending an extra nydus bootstrap layer to the manifest, represents the whole metadata of filesystem view for the entire image.

Compared with the current implementation of the PR, the workaround will reduce the modification for special handling on nydus format export. cc @ktock @sipsma

imeoer avatar Mar 15 '22 03:03 imeoer

@ktock @sipsma Refactored the PR code, PTAL. I will add integration tests for the nydus compression type later, once the implementation generally looks ok for you, thanks!

imeoer avatar Apr 08 '22 02:04 imeoer

Please also add unit tests to manager_test.go not only integration tests.

@ktock Got it, will add unit and integration tests after all reviews are resolved.

imeoer avatar Apr 12 '22 07:04 imeoer

We have used nydus to accelerate the container start, but building nydus image is slow because it need to build oci image first and then convert to nydus image. So we need build nydus image by dockerfile directly to cut down the build time. We have tested this PR and can meet our need, We hope it can be upstream as soon as possible. Thanks.

luodw avatar Apr 20 '22 07:04 luodw

@ktock @sipsma Added unit and integration tests, please continue looking at this, thanks!

imeoer avatar Apr 22 '22 02:04 imeoer

@ktock @sipsma There are some updates, please help continue to look at this PR, thanks!

imeoer avatar May 05 '22 02:05 imeoer

The CI is broken due to a cross-compilation issue, will fix it later.

imeoer avatar May 11 '22 06:05 imeoer

Updated: fixed the CI cross build issue. cc @sipsma @ktock @tonistiigi

imeoer avatar May 11 '22 08:05 imeoer

A remaining concern is how can we maintain it because there are many exception logic for nydus and seems no tests for checking if the exported blobs are valid and runnable. So I'm worried about we'll accidentally produce broken images without noticing it.

@ktock Thanks for the reply, it looks like the main problem with these comments above is that nydus does not implement conversion from nydus blob to tar stream, we are considering implementing this part and will update this PR later.

imeoer avatar May 20 '22 10:05 imeoer

Hi @ktock, the PR has been updated, and now the nydus blob supported decompression, so most of the exception logic has been removed, also the limitation of remote cache and inline cache has been removed, please help to take a look again, thanks.

imeoer avatar Jul 08 '22 08:07 imeoer

Ping @ktock, PTAL again.

imeoer avatar Jul 28 '22 04:07 imeoer

Another idea for moving this integration forward: https://github.com/moby/buildkit/issues/3037

ktock avatar Aug 17 '22 09:08 ktock

Another idea for moving this integration forward: https://github.com/moby/buildkit/issues/3037

@ktock Good idea, this proposal might make cleaner for the job of appending an extra bootstrap layer to nydus image, is it a blocker for this PR? Does the cache changes in this PR still available? Maybe we can move this change forward first, and then use the exporter plugin to refactor.

imeoer avatar Aug 17 '22 14:08 imeoer

These are the points that I think are different between nydus and other compression types involved in this PR. Hope we can continue to give some opinions on whether these will block moving on. cc @AkihiroSuda @ktock

  1. Nydus compression type can't be mixed with other compression types in the same image, so if cache record is other/nydus layer, but the target is another compression type, need to convert forcibly (force-compression=true).

  2. Need to append an extra bootstrap layer to the manifest of nydus image, this layer represents the whole metadata of filesystem view for the entire image, this requires skipping the layer when importing inline cache.

  3. Nydus is not an archive format, but a filesystem format, if we convert a system tar to nydus and then convert nydus back to tar, although nydus can guarantee filesystem consistency, but it can’t do a complete restore of the system tar, it will change its file entry path, for example, ./foo1 -> foo1 (it lost ./ for file path in the tar header), but nydus makes sure that the decompressed tar is as consistent as possible with the hash of OCI tar format (in go implementation), and does strictly guarantee in filesystem consistency.

imeoer avatar Aug 24 '22 14:08 imeoer

ping @AkihiroSuda @ktock @tonistiigi ~

imeoer avatar Aug 30 '22 01:08 imeoer

Yes, I agree that they are blockers. And another consideration is how can we maintain it (e.g. how can we guarantee the emitted nydus isn't broken and maintain all nydus-related exceptions). https://github.com/moby/buildkit/issues/3037 is another approach for solving these issues. https://github.com/moby/buildkit/pull/2045 looks like integrates nydus as an exporter implementation so I thought that https://github.com/moby/buildkit/issues/3037 helps that approach moving forward.

ktock avatar Aug 30 '22 12:08 ktock

@AkihiroSuda @ktock @imeoer @bergwolf had a meeting discussing this PR and reached the following consensus:

  1. We can live with the current three Nydus exceptions in the PR, with proper unit tests to help validate them.
  2. We will add go -tags to mark out nydus code to allow non-nydus build (which is the default build mode).
  3. Meanwhile, @ktock and @imeoer will work on an exporter plugin framework to support compression types like Nydus.
  4. The Nydus team commit to helping maintain Nydus functionality in buildkit, including incremental improvements and bugfixes.

@tonistiigi what do you think of the above? If it sounds ok to you, we'll proceed with the change.

bergwolf avatar Sep 02 '22 06:09 bergwolf

(Follow-up: the go tag is the short-term solution; the long term solution is the exporter plugin)

AkihiroSuda avatar Sep 02 '22 07:09 AkihiroSuda

what do you think of the above?

sgtm. Make sure that as much code as possible is in the gotag files and that there aren't many exceptions in the shared files.

tonistiigi avatar Sep 05 '22 00:09 tonistiigi

cc @AkihiroSuda @ktock @tonistiigi @bergwolf

Hi, we have moved the nydus codes to the gotag files, please take a look, the following points need to be clarified:

  1. the nydus features and related tests will only be run when -tags=nydus is enabled. the unit test for system tar handling by the nydus compression type will still be skipped in manager_test.go, and this part of the code can be removed when migrating to the future exporter plugin.

  2. because the nydus image has an additional bootstrap layer, in order to avoid exception changes to the remote cache, nydus does not support cache import/export (maybe we can support it in exporter plugin), the relevant limitations have been added to nydus.md.

imeoer avatar Sep 07 '22 11:09 imeoer

@AkihiroSuda @ktock @tonistiigi With #3136 merged, this PR is pretty clean in terms of code invasion in the general code path. Could you give it another spin and see if it is suitable for merging? As discussed earlier, we'll continue improving the nydus functionality in buildkit and we look forward to putting it into large-scale production use.

bergwolf avatar Oct 27 '22 08:10 bergwolf

@ktock @AkihiroSuda Updated, please help to take a glance again, thanks!

imeoer avatar Nov 03 '22 04:11 imeoer

Thanks for all! cc @tonistiigi Please help to take a final look, thanks!

imeoer avatar Nov 07 '22 05:11 imeoer

Thanks all for considering Nydus!

hsiangkao avatar Nov 08 '22 06:11 hsiangkao