
Pushing cache to "registry" cache with multi-node builder only uploads cache from one node

potiuk opened this issue 2 years ago • 13 comments

Originally reported at https://github.com/moby/buildkit/issues/2758

It's been confirmed by @tonistiigi that this is a problem with the buildx multi-node builder.

When you are building a multi-platform image with multiple builders (to avoid emulation) and use --cache-to type=registry, the resulting registry cache only contains the cache for the platform that was built last.

I tried to use buildkit to build Apache Airflow (https://github.com/apache/airflow) multi-platform images. I am using the latest buildkit and the latest docker.

Hosts used for the multi-platform builds

I have two builder hosts:

  1. AMD builder (Linux Mint 20.3) with the buildx plugin installed (github.com/docker/buildx v0.7.1 05846896d149da05f3d6fd1e7770da187b52a247) - the docker builder was created there

  2. ARM builder (Mac Pro M1 late 2021) with Docker Desktop 4.6.0 (with buildx pre-installed) - with the new Virtualization framework enabled.

Builder configuration

I configured my buildx builds to use both builders. I connected the macOS builder to the Linux host via a forwarded docker socket, and I am running all my multi-platform builds from the Linux host.

These are the builders I see with docker buildx ls:

airflow_cache       docker-container                     
  airflow_cache0    unix:///var/run/docker.sock running  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64
  airflow_cache1    tcp://127.0.0.1:2375        running  linux/arm64, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

Build command

I want to build a multi-platform image for both ARM and AMD, and I want to do it in a single buildx command. Additionally, I want to store the cache for both platforms in the same image repository, under the :cache tag.

My image is multi-stage, so I want to push the cache for all stages (hence mode=max).

The (simplified) command to run the build is:

docker buildx build --progress=auto --pull --platform linux/amd64,linux/arm64 \
    --cache-from=ghcr.io/potiuk/airflow/main/ci/python3.7:cache \
    --cache-to=type=registry,ref=ghcr.io/potiuk/airflow/main/ci/python3.7:cache,mode=max \
    --push \
    -t ghcr.io/potiuk/airflow/main/ci/python3.7:latest --target main . -f Dockerfile.ci

While the ghcr.io/potiuk/airflow/main/ci/python3.7:latest image is perfectly fine (a nice, multi-platform image), the ghcr.io/potiuk/airflow/main/ci/python3.7:cache image only contains the cache from the "last" built image - i.e. if the AMD image was faster to build and push its cache, the cache pushed later from the ARM builder overrides the AMD cache stored there. I could not find any way to merge those two caches (especially since I cannot specify two different cache destinations, one per platform). This renders --cache-to type=registry essentially useless for multi-platform builds.

I reverted to "inline" mode and it seems to work, but I would really love to keep the latest cache in a separate tag of the image.

potiuk commented Apr 04 '22 11:04

The arm64 cache overwrites the amd64 cache, so only one of the two platforms is available. Inline mode was a bit expensive for us in terms of managing local storage: we frequently ran out of disk space, and the cache would disappear after the BuildKit container was restarted.

Nick-0314 commented Apr 29 '22 05:04

@tonistiigi Can this problem be circumvented by adding a default platform suffix to the cache-to registry ref? For example: "repo/ubuntu:cache-linux-arm64". Is this easy to develop? It is currently possible to define multiple --cache-from entries, so once --cache-to can be suffixed per platform, I could --cache-from all the suffixed refs.
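
As a sketch of what that suffix scheme could look like today, done manually with one single-platform cache export per platform (the repo/ubuntu refs and cache-linux-* tags are just illustrative placeholders):

# write one cache tag per platform (today this needs one command per platform)
docker buildx build --platform linux/amd64 \
    --cache-to=type=registry,ref=repo/ubuntu:cache-linux-amd64,mode=max .
docker buildx build --platform linux/arm64 \
    --cache-to=type=registry,ref=repo/ubuntu:cache-linux-arm64,mode=max .

# multiple --cache-from entries are already supported, so a multi-platform
# build can then read from both suffixed caches
docker buildx build --platform linux/amd64,linux/arm64 \
    --cache-from=repo/ubuntu:cache-linux-amd64 \
    --cache-from=repo/ubuntu:cache-linux-arm64 \
    --push -t repo/ubuntu:latest .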

Nick-0314 commented May 07 '22 07:05

This is what I am planning to do - but then such a multi-platform image cannot be prepared with a single buildx command, because you can specify only one --cache-to when you run a single multi-platform build, even with remote builders.

Which renders the buildx feature of preparing a multi-platform image with remote builders in a single command pretty useless.

potiuk commented May 07 '22 10:05

What I actually plan to do is do it in two steps (until it is fixed):

  1. Build a single multi-platform image and push it without cache
  2. AGAIN build and push (only cache) in two separate commands, one per platform

This is quite an overhead, though the build cache in the builders will be reused, so the overhead of running 3 commands instead of one should be bearable.
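
Roughly, those three commands would look like this (a sketch; the cache-amd64 / cache-arm64 tags are just one possible naming):

# step 1: one multi-platform build, push the image, no cache export
docker buildx build --platform linux/amd64,linux/arm64 \
    --push -t ghcr.io/potiuk/airflow/main/ci/python3.7:latest .

# steps 2 and 3: re-run per platform, exporting only the cache; the builders
# reuse the local build cache from step 1, so this is mostly upload time
docker buildx build --platform linux/amd64 \
    --cache-to=type=registry,ref=ghcr.io/potiuk/airflow/main/ci/python3.7:cache-amd64,mode=max .
docker buildx build --platform linux/arm64 \
    --cache-to=type=registry,ref=ghcr.io/potiuk/airflow/main/ci/python3.7:cache-arm64,mode=max .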

potiuk commented May 07 '22 10:05

@potiuk There is another workaround that does not require building twice.

Node *: build the Docker image on its own and push it to a standalone repo, together with its cache. Main node: concatenate these images together:

docker manifest create USERNAME/REPOSITORY:TAG --amend USERNAME/REPOSITORY-NODE1:TAG --amend USERNAME/REPOSITORY-NODE2:TAG --amend USERNAME/REPOSITORY-NODE*:TAG
docker manifest push USERNAME/REPOSITORY:TAG

Refer to https://github.com/knatnetwork/github-runner/blob/399a888e5c9de2a38854a07570df661d59749284/.github/workflows/build.yml#L116 if you need an actual use case.

I believe it is possible to use only one repo by just using a standalone image tag and cache tag for each node.

I suspect docker manifest may also be able to operate on registry-cache tags, not just image tags, so there are probably other workarounds. If you give it a try, could you please comment and let me know?
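
A sketch of that single-repo variant (the -amd64 / -arm64 tag suffixes are made up for illustration):

# each node pushes its own single-platform image and cache to the same repo
docker buildx build --platform linux/amd64 \
    --cache-to=type=registry,ref=USERNAME/REPOSITORY:cache-amd64,mode=max \
    --push -t USERNAME/REPOSITORY:TAG-amd64 .
docker buildx build --platform linux/arm64 \
    --cache-to=type=registry,ref=USERNAME/REPOSITORY:cache-arm64,mode=max \
    --push -t USERNAME/REPOSITORY:TAG-arm64 .

# main node stitches the per-node image tags into one multi-platform tag
docker manifest create USERNAME/REPOSITORY:TAG \
    --amend USERNAME/REPOSITORY:TAG-amd64 \
    --amend USERNAME/REPOSITORY:TAG-arm64
docker manifest push USERNAME/REPOSITORY:TAG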

Rongronggg9 commented May 07 '22 22:05

Yeah. That's what I wanted to avoid: manually manipulating manifests. I prefer to rely on buildx behaviour.

This way I do not have to know or query the "nodes" and can nicely use a multi-node builder just by knowing its name (and then pushing the cache can be done from any node).

Also, I think separating the caches out into different tags has some nice properties. We have our own "development environment" called breeze which hides the complexity of where (and when) the cache is pulled from, and it makes it easy to decide which cache to use based on the platform. It also makes it super easy to track and diagnose user issues, as users can copy & paste the verbose command they used, and it's a bit easier to track the history of that particular cache. So I will stick to that.

The overhead is actually very little, because both steps use the same builders (ARM and AMD hardware based): the first step just builds a single multi-platform image with --push, and the two subsequent steps run single-platform cache exports that reuse the local cache already built in the first step.

potiuk commented May 08 '22 08:05

> What I actually plan to do is do it in two steps (until it is fixed):
>
> 1. Build a single multi-platform image and push it without cache
>
> 2. AGAIN build and push (only cache) in two separate commands, one per platform

Trying this approach, I found that the manifest generated in step 1 comes out with one of two digests, at random. The reason, I believe, is that it randomly orders the manifest list. This is an additional issue when trying to design an idempotent pipeline.

Attached is an example with the diff of two manifests it randomly generates for two architectures:

--- /tmp/meta-538b4.json      2022-06-20 22:39:33.302897680 -0600
+++ /tmp/meta-80e8a.json      2022-06-20 22:39:57.467873367 -0600
@@ -3,24 +3,24 @@
   "manifest": {
     "schemaVersion": 2,
     "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
-    "digest": "sha256:538b4667e072b437a5ea1e0cd97c2b35d264fd887ef686879b0a20c777940c02",
+    "digest": "sha256:80e8a68eb9363d64eabdeaceb1226ae8b1794e39dd5f06b700bae9d8b1f356d5",
     "size": 743,
     "manifests": [
       {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
-        "digest": "sha256:cef1b67558700a59f4a0e616d314e05dc8c88074c4c1076fbbfd18cc52e6607b",
+        "digest": "sha256:2bc150cfc0d4b6522738b592205d16130f2f4cde8742cd5434f7c81d8d1b2908",
         "size": 1367,
         "platform": {
-          "architecture": "arm64",
+          "architecture": "amd64",
           "os": "linux"
         }
       },
       {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
-        "digest": "sha256:2bc150cfc0d4b6522738b592205d16130f2f4cde8742cd5434f7c81d8d1b2908",
+        "digest": "sha256:cef1b67558700a59f4a0e616d314e05dc8c88074c4c1076fbbfd18cc52e6607b",
         "size": 1367,
         "platform": {
-          "architecture": "amd64",
+          "architecture": "arm64",
           "os": "linux"
         }
       }
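
For reference, one way to capture a manifest list and its digest for this kind of diff (the ref is a placeholder; the sha256 of the raw bytes is the manifest digest):

# dump the raw manifest list for a tag and compute its digest
docker buildx imagetools inspect --raw REGISTRY/IMAGE:TAG > /tmp/meta.json
sha256sum /tmp/meta.json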

jobcespedes commented Jun 21 '22 12:06

What I actually ended up doing: I simply run two separate steps to push each cache separately. It turned out that I do not "really" need a combined cache image for development. The only difficulty is that in our automation scripts we derive the cache name from the platform we run on (but since we have it all encapsulated in our breeze development environment, it was actually pretty easy):

https://github.com/apache/airflow/blob/88363b543f6f963247c332e9d7830bc782ed6e2d/dev/breeze/src/airflow_breeze/params/common_build_params.py#L104

https://github.com/apache/airflow/blob/88363b543f6f963247c332e9d7830bc782ed6e2d/dev/breeze/src/airflow_breeze/params/common_build_params.py#L139

potiuk commented Jun 21 '22 13:06

Currently, buildx has support for merging manifest outputs from the builder results. I think it should be possible to implement similar support for merging cache manifests; it should be very similar to the existing logic.

However, we don't have push-by-digest support (pushing content without a tag) for the registry exporter, which would need to be a separate fix in buildkit first.

jedevc commented Sep 21 '22 15:09

Same problem here. We build in CI using a dual remote-builder strategy; partial code to exemplify:

  - docker buildx create --name buildx --driver docker-container --use --platform linux/amd64 --bootstrap ssh://$AMD64_HOST
  - docker buildx create --name buildx --append --platform linux/arm64 --bootstrap ssh://$ARM64_HOST
  - docker buildx build
      --push
      --platform linux/amd64,linux/arm64
      --cache-from=registry.example.null/image-name:buildcache
      --cache-to=type=registry,mode=max,ref=registry.example.null/image-name:buildcache
      --tag registry.example.null/image-name:example-tag
    # ...

The :buildcache image only stores the cache for the last completed build. As the cache doesn't cover both platforms, it "rotates" between them each time CI builds.

I will attempt to adapt @Rongronggg9's workaround (thanks for sharing <3) and will report here for reference.

lorenzogrv commented Jan 11 '23 09:01

We noticed the problem no longer persists after bumping our CI jobs to use docker:20.10.23 with the docker:20.10.23-dind service.

Both cache export and import seem correct, and build times dropped to a range similar to local usage with a local cache.

lorenzogrv commented Feb 09 '23 14:02

Hmm, bumped into this today. Seems I have to do the manifest manually.

digglife commented Aug 10 '23 03:08

Hey! Any news about this issue?

andrey-bondar commented Mar 11 '24 14:03