buildkit

Could the default mount cache id include target architecture?

Open couling opened this issue 3 years ago • 5 comments

Problem

There's a "gotcha" when working with cache mounts and building for multiple architectures. The stated use case for these is:

This mount type allows the build container to cache directories for compilers and package managers.

However, both compilers and package managers are often architecture-dependent. The default id for the cache is just the target path, so by default the cache is shared between architectures. This can be damaging.

At best, the cache gets flushed and is useless on every build for a different architecture.

At worst, the code using the cache can't detect the wrong architecture and gets confused by the content.

In the very worst case, the code using the cache can't detect the wrong architecture and builds a corrupt image as a result.

It's fine to expect programs using the cache to detect that the cache is stale. But it's extremely uncommon for such programs to detect that a cache from the wrong architecture has been swapped in.

Example Error

If I have a Dockerfile:

FROM alpine:latest AS base
RUN --mount=type=cache,sharing=locked,target=/var/cache/apk \
    apk add python3 py3-pip py3-wheel

And then I build it twice (with QEMU installed for emulation):

docker build -t my_image:latest_arm64 --platform linux/arm64 .
docker build -t my_image:latest_x86_64 --platform linux/x86_64 .

I'll end up with errors caused by the cache:

WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/main: UNTRUSTED signature
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/community: UNTRUSTED signature
ERROR: unable to select packages:
py3-pip (no such package):
   required by: world[py3-pip]
py3-wheel (no such package):
   required by: world[py3-wheel]
python3 (no such package):
   required by: world[python3]
ERROR: executor failed running [/bin/sh -c apk add python3 py3-pip py3-wheel]: exit code: 3

Workaround

As a workaround, you can include ${TARGETARCH} in the id. For example:

FROM alpine:latest AS base
ARG TARGETARCH
RUN --mount=type=cache,sharing=locked,id=${TARGETARCH}/var/cache/apk,target=/var/cache/apk \
    apk add python3 py3-pip py3-wheel

Since TARGETARCH is set automatically by BuildKit (it only has to be declared with ARG), the workaround needs nothing more than a change to the Dockerfile, and the build commands above then work unchanged.

Desired enhancement

Ideally this "workaround" should be the default behaviour: include the value from TARGETARCH in the default id. If developers want to share a cache between multiple architectures, the current behaviour would still be available by setting an id manually. But it means that by default the cache would "just work".

With that default, the worst case for someone who doesn't know about this behaviour would be slower builds and increased network usage on multi-arch builds that use platform-independent caches (Java, JavaScript...).

couling · Feb 04 '22 15:02

I think the behavior depends on the use case. In a lot of cases the same cache is desired for all platforms, e.g. when downloading package source code that does not contain binaries, it is usually identical. Also, for general build caches, other languages such as Go already understand when cache entries are platform-specific. If your case does not, then I think the ability to separate it via id is a good approach.

Regarding apk, I don't think this is how you would do it, and none of your packages are cached with this method.

I would do:

FROM alpine:latest AS base
RUN --mount=type=cache,sharing=locked,target=/etc/apk/cache \
    ls -l /etc/apk/cache && apk add --no-cache python3 py3-pip py3-wheel && ls -l /etc/apk/cache

That actually caches the packages that have been installed before, and it doesn't seem to need TARGETARCH in the id either. The ls calls are just for debugging, so you can see what is in the cache before and after.

tonistiigi · Feb 04 '22 20:02

As I said, this is really about the safety of the defaults. I realise there are two use cases:

  • platform-independent code: worst case, poorer caching
  • platform-dependent code: worst case, failed builds or corrupted images

My reason for raising this request is that on balance I prefer default safety over default performance.

Alternatively a note about this in the documentation wouldn't go amiss. It took me an unfortunate amount of time to figure out what was going wrong.

Ultimately it's your call so I won't labor the point.


Regarding apk, I don't think this is how you would do it, and none of your packages are cached with this method.

The example I gave is an SSCCE (short, self-contained, correct example) of what can go wrong. It's not a suggested way to cache PIP packages. It caches the package index and recovers some of the performance lost to --no-cache. Its use case is a little lost in the given example.

couling · Feb 04 '22 21:02

platform-dependent code: worst case, failed builds or corrupted images

A failed build isn't necessarily the worst case during development; it's a hint to the user that they forgot to set an id. Not realising that your build is inefficient, even though you think you did everything correctly, might hurt more in the long run. In a lot of cases TARGETARCH is even completely wrong: for example, all our internal Dockerfiles cross-compile, and there separating the cache by target doesn't make any sense.
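To make the cross-compiling case concrete, here is a minimal sketch assuming a hypothetical Go module whose main package sits at the root of the build context; the `golang:alpine` base, the `/out/app` output path and the binary name are illustrative choices, not taken from any BuildKit Dockerfile. The compiler runs natively on the build platform, and both cache mounts keep their default ids, so every target platform shares them: the module cache holds platform-independent source, and Go's build cache already keys its entries by target GOOS/GOARCH.

```dockerfile
# syntax=docker/dockerfile:1
# Run the compiler natively on the build machine, whatever the target is.
FROM --platform=$BUILDPLATFORM golang:alpine AS build
# Bring the automatic target-platform args into this stage.
ARG TARGETOS
ARG TARGETARCH
WORKDIR /src
COPY . .
# Default ids (= target paths): one cache serves every target platform.
# /go/pkg/mod holds downloaded module source (platform-independent);
# /root/.cache/go-build is safe to share because Go keys it by GOOS/GOARCH.
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app .

# The final stage is built for the target platform.
FROM alpine:latest
COPY --from=build /out/app /usr/local/bin/app
```

Adding ${TARGETARCH} to these ids would only split one cache into several copies without making the build any safer.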

Alternatively a note about this in the documentation wouldn't go amiss.

PR welcome.

It caches the package index and saves some performance loss from --no-cache

IIUC it caches only the index, meaning that if you change the command it will still download all the packages again, but they will always be the old versions. And I guess if the index gets old it will just fail to download packages because they don't exist anymore? A more useful pattern is to cache the packages (it's a bit confusing that I use --no-cache, but it still does that), so if the command changes you always get the latest packages, but packages that were already downloaded once are not downloaded again.

tonistiigi · Feb 05 '22 02:02

Just as a side note, I had a similar problem and tried to resolve it with id=apk-${TARGETPLATFORM}, but it didn't look like it was being expanded. Would allowing ARG usage here help with similar issues?

ciaranmcnulty · May 02 '22 20:05

@ciaranmcnulty, variables are already expanded in id=. You need to opt in with an ARG instruction and use the BuildKit builder. Related discussion and example → https://github.com/docker/buildx/issues/549#issuecomment-1788297892
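A minimal sketch of that opt-in, reusing the apk example from the top of this issue: the only change from the original Dockerfile is the ARG line, which brings the automatically set TARGETPLATFORM value into scope for the RUN instruction. The `apk-` prefix is an arbitrary label chosen for illustration.

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:latest AS base
# Without this line, ${TARGETPLATFORM} is not in scope for the RUN below.
ARG TARGETPLATFORM
# TARGETPLATFORM expands to a value such as linux/arm64, so each
# target platform gets its own cache (e.g. id apk-linux/arm64).
RUN --mount=type=cache,sharing=locked,id=apk-${TARGETPLATFORM},target=/var/cache/apk \
    apk add python3 py3-pip py3-wheel
```

Building this with --platform linux/arm64 and then --platform linux/amd64 should give the two builds separate caches instead of one shared /var/cache/apk.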

realdimas · Mar 21 '25 23:03