cache-from and COPY invalidates all layers instead of only the ones after COPY

Open PChambino opened this issue 4 years ago • 19 comments

Not sure if I am describing that problem correctly in the title, but this is very similar to other previous issues like https://github.com/moby/buildkit/issues/1981.

The detailed reproduction steps are here: https://github.com/carwow/buildkit-cache-issue (see .circleci/config.yml file) I think you can see the CircleCI logs here: https://app.circleci.com/pipelines/github/carwow/buildkit-cache-issue/1/workflows/73a7a1d9-5cdd-47fe-88f0-ffd3eed5dce3/jobs/2

In the last "docker build" step the "RUN echo before" should have been cached I believe, but it wasn't.

PChambino avatar May 20 '21 17:05 PChambino

Pasting some of the output of that run:

First rebuild after pushing the cache:

#4 importing cache manifest from ******/buildkit-cache-issue:latest
#4 sha256:0b466318646b577e451877de394363e829962cf3ac26e7cbf026e5836441ff53
#4 DONE 0.3s

...

#5 [2/4] RUN echo before
#5 sha256:46bd089c781e511f41e24d8a9133aa9aa09f13d43684c47a97ef371487d5a949
#5 pulling sha256:d960726af2bec62a87ceb07182f7b94c47be03909077e23d8226658f80b47f87
#5 pulling sha256:e8d62473a22dec9ffef056b2017968a9dc484c8f836fb6d79f46980b309e9138
#5 pulling sha256:8962bc0fad55b13afedfeb6ad5cb06fd065461cf3e1ae4e7aeae5eeb17b179df
#5 pulling sha256:e8d62473a22dec9ffef056b2017968a9dc484c8f836fb6d79f46980b309e9138 0.3s done
#5 pulling sha256:8962bc0fad55b13afedfeb6ad5cb06fd065461cf3e1ae4e7aeae5eeb17b179df 0.4s done
#5 pulling sha256:65d943ee54c1fa196b54ab9a6673174c66eea04cfa1a4ac5b0328b74f066a4d9
#5 pulling sha256:532f6f7237092ebd79f21ccd3cf050138b31abeed1b29bac39cfdb30786a615b
#5 pulling sha256:d960726af2bec62a87ceb07182f7b94c47be03909077e23d8226658f80b47f87 0.9s done
#5 pulling sha256:1334e0fe2851ea7f3d2509a3907312a665d7c6b085e1f0671f6cd2dcf37b82db
#5 pulling sha256:1334e0fe2851ea7f3d2509a3907312a665d7c6b085e1f0671f6cd2dcf37b82db 0.3s done
#5 pulling sha256:ba365db42d143222de2a48c0c47039747bebe1a7858712d0320aa8a267da64ea
#5 pulling sha256:65d943ee54c1fa196b54ab9a6673174c66eea04cfa1a4ac5b0328b74f066a4d9 1.1s done
#5 pulling sha256:ba365db42d143222de2a48c0c47039747bebe1a7858712d0320aa8a267da64ea 0.4s done
#5 pulling sha256:9c5512e22a8630d57cea37ba33500c8a84b79ea18af7fe854a419725301dcb60 0.1s done
#5 pulling sha256:cb3bee3da6f673ff53851cfc66aeb923b5c39ea9a51a09e1b3d9cd23d324d78a
#5 pulling sha256:36fe7d125168b057bb6b0857885255f3dde855b97a36bd253f4fd22f33b950bd
#5 pulling sha256:cb3bee3da6f673ff53851cfc66aeb923b5c39ea9a51a09e1b3d9cd23d324d78a 0.1s done
#5 pulling sha256:36fe7d125168b057bb6b0857885255f3dde855b97a36bd253f4fd22f33b950bd 0.1s done
#5 pulling sha256:532f6f7237092ebd79f21ccd3cf050138b31abeed1b29bac39cfdb30786a615b 3.3s done
#5 CACHED

Second rebuild after pushing the cache (cache manifest didn't change: same digest):

#4 importing cache manifest from ******/buildkit-cache-issue:latest
#4 sha256:0b466318646b577e451877de394363e829962cf3ac26e7cbf026e5836441ff53
#4 DONE 0.3s

...

#5 [2/4] RUN echo before
#5 sha256:46bd089c781e511f41e24d8a9133aa9aa09f13d43684c47a97ef371487d5a949
#5 0.399 before
#5 DONE 1.4s

thaJeztah avatar May 20 '21 17:05 thaJeztah

Likely git clone is not deterministic and therefore . is different. Can't see logs.

@thaJeztah That digest is LLB digest. We should remove it from the output if it adds confusion.

tonistiigi avatar May 20 '21 17:05 tonistiigi

Thank you @thaJeztah

@tonistiigi It only git clones once in the beginning, so that should not impact each docker build step.

I do touch a file to change the directory that is copied between each docker build, but it should only invalidate "RUN echo after", not "RUN echo before", right? It behaves correctly in the first build after pushing the cache.

PChambino avatar May 20 '21 17:05 PChambino

That digest is LLB digest. We should remove it from the output if it adds confusion.

Ah! Yes, at least it confused me

thaJeztah avatar May 20 '21 17:05 thaJeztah

but it should only invalidate "RUN echo after", not "RUN echo before", right?

Try it with a command that actually creates files. That layer might have been optimized out.

tonistiigi avatar May 20 '21 21:05 tonistiigi

We have the same issue in a Dockerfile that installs python dependencies.

Dockerfile
FROM python:3.7-buster

# upgrade pip
RUN pip install -U pip

# install dbt
ENV PATH=/root/.local/bin:$PATH
RUN pip install pipx \
  && pipx install dbt-core \
  && pipx inject dbt-core dbt-snowflake \
  && rm -rf /root/.local/pipx/.cache

# install app dependencies
WORKDIR /app
COPY poetry.lock pyproject.toml ./
RUN pip install poetry \
  && poetry config virtualenvs.create false \
  && poetry install --no-interaction --no-ansi --no-root \
  && rm -rf /root/.cache

# install app
COPY . /app
RUN poetry install --no-interaction --no-ansi

# setup environment
ENV AIRFLOW_HOME=/app/airflow_home
ENV PYTHONPATH=$PYTHONPATH:/app
ENTRYPOINT ["sh", "/app/entrypoint.sh"]

The whole docker image gets rebuilt every other build, when only the layers after COPY . /app should be invalidated.

As a workaround, explicitly pulling the image before building it gives the correct behaviour (circleci build). I would expect it to work the same regardless of whether the image has been pulled. Is that not the expected behaviour?

CircleCI seems to be having an outage right now, so can't test if the behaviour is the same with a RUN command that creates files in that minimal Dockerfile, but I would expect it to be. I'll share tomorrow when CircleCI is working again.
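
The pull-before-build workaround mentioned above could look roughly like the sketch below. It is an illustration, not the exact commands from the CircleCI config: the function name pull_then_build and the image argument are hypothetical, and it assumes the image was built with inline cache metadata (BUILDKIT_INLINE_CACHE=1). Setting DOCKER=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the "pull first, then build with --cache-from" workaround.
# pull_then_build is a hypothetical helper name, not part of docker.
pull_then_build() {
  image="$1"   # e.g. "$REGISTRY/buildkit-cache-issue:latest"

  # The pull may fail on the very first build, when no image exists yet.
  "${DOCKER:-docker}" pull "$image" || true

  # Build with BuildKit, reusing layers from the pulled image and embedding
  # inline cache metadata so that later builds can do the same.
  DOCKER_BUILDKIT=1 "${DOCKER:-docker}" build \
    --cache-from "$image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    --tag "$image" .
}
```

With the image pulled, its layers are already in the local store, so the build does not depend on lazily importing them from the registry.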

PChambino avatar May 21 '21 16:05 PChambino

Here it is: https://app.circleci.com/pipelines/github/carwow/buildkit-cache-issue/3/workflows/3b300b7e-4313-44fd-8b66-f9a6f3f1d31d/jobs/6

Same behaviour with RUN date > before. The first build after pushing the cache uses the cache for RUN date > before but the following build doesn't.

PChambino avatar May 22 '21 20:05 PChambino

I believe I see the same issue with the following Dockerfile:

# syntax=docker/dockerfile:1.2
# ====================
# STAGE 1
# ====================
FROM alpine:3.7 as stage1
RUN apk add --no-cache curl

# ====================
# STAGE 2
# ====================
FROM stage1 as stage2
COPY ./dummy ./dummy

# ====================
# STAGE 3
# ====================
FROM scratch as stage3
COPY --from=stage2 ./dummy ./dummy

Once the content of dummy is changed, stage 1 is rebuilt.

Steps to reproduce:

REGISTRY=<...>
# build the cache
echo 'x' > dummy
docker buildx create --driver docker-container --name test-builder1 --driver-opt image=moby/buildkit:v0.8.3
docker buildx build --builder test-builder1 --cache-to type=inline --tag $REGISTRY/test:1 --push .
# build with cache
docker buildx create --driver docker-container --name test-builder2 --driver-opt image=moby/buildkit:v0.8.3
docker buildx build --builder test-builder2 --cache-from $REGISTRY/test:1 --tag local/test:2 --load .
# change dummy and build with cache
echo 'y' > dummy
docker buildx create --driver docker-container --name test-builder3 --driver-opt image=moby/buildkit:v0.8.3
docker buildx build --builder test-builder3 --cache-from $REGISTRY/test:1 --tag local/test:3 --load .

On the second run the output contains

 => CACHED [stage1 2/2] RUN apk add --no-cache curl                                                                       0.0s
 => CACHED [stage2 1/1] COPY ./dummy ./dummy                                                                              0.0s
 => CACHED [stage3 1/1] COPY --from=stage2 ./dummy ./dummy

but on the third one it is just

 => [stage1 2/2] RUN apk add --no-cache curl                                                                              0.7s
 => [stage2 1/1] COPY ./dummy ./dummy                                                                                     0.0s
 => [stage3 1/1] COPY --from=stage2 ./dummy ./dummy

evgeniikhandygo-apc avatar Jun 16 '21 14:06 evgeniikhandygo-apc

Any updates on this? From what I've experienced, the cache is not invalidated when using a local cache, but weird things happen when pulling the cache from S3, such as invalidating the cache for previous layers, as people describe in this issue.

I know that pulling the image might be an option, but you shouldn't need to pull the image. If the layers are not changing, it doesn't make sense to pull something that is already in the bucket in this case.

guillenotfound avatar Feb 12 '23 23:02 guillenotfound

The example from @evgeniikhandygo-apc is expected. Your final stage depends on stage2 and stage2 is invalidated, therefore it needs to be rebuilt. It depends on stage2, not just one file. As stage2 depends on stage1 that needs to be rebuilt as well.

If you reorganize the file to remove the dependency you don't seem to be using:

# syntax=docker/dockerfile:1.2
# ====================
# STAGE 1
# ====================
FROM alpine:3.7 as stage1
RUN apk add --no-cache curl

FROM scratch AS files
COPY ./dummy ./dummy

# ====================
# STAGE 2
# ====================
FROM stage1 as stage2
COPY --from=files . .

# ====================
# STAGE 3
# ====================
FROM scratch as stage3
COPY --from=files ./dummy ./dummy

Then everything will work as expected.

tonistiigi avatar Feb 13 '23 00:02 tonistiigi

@tonistiigi I'll try your suggestion, but why would the layer that installs dependencies be invalidated if the layer that changes is the next one?

guillenotfound avatar Feb 13 '23 08:02 guillenotfound

Wow, I was tearing my hair out over this and it turns out that as @tonistiigi said it was non-deterministic behavior when adding "." in the CI environment that was the problem. I added .git to .dockerignore and the cache started working as expected.
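
A rough way to check whether two checkouts differ only inside .git is to hash everything else and compare the results. The sketch below is an assumption-laden illustration: context_digest is a hypothetical helper, it needs sha256sum from GNU coreutils, and it hashes only relative paths and file contents, so it is not the checksum BuildKit itself computes for COPY.

```shell
#!/bin/sh
# Hypothetical helper: print one digest for a directory tree, skipping .git.
# Only relative paths and file contents are hashed (not modes, owners, or
# timestamps), and file names containing newlines are not handled.
context_digest() {
  (
    cd "$1" || exit 1
    find . -path ./.git -prune -o -type f -print \
      | LC_ALL=C sort \
      | while IFS= read -r f; do
          printf '%s\n' "$f"   # include the path in the hash
          sha256sum < "$f"     # include the file's content hash
        done \
      | sha256sum \
      | cut -d' ' -f1
  )
}
```

If two CI checkouts of the same commit print the same digest here, yet COPY . is still treated as changed, then files under .git (or other paths that should be listed in .dockerignore) are a likely cause.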

amanfredi avatar May 12 '23 09:05 amanfredi

Wow, I was tearing my hair out over this and it turns out that as @tonistiigi said it was non-deterministic behavior when adding "." in the CI environment that was the problem. I added .git to .dockerignore and the cache started working as expected.

fwiw i think a good best practice is to use an explicit allowlist for what docker can see, with a .dockerignore that looks like:

# ignore all files by default
*

# allow what docker should see
!src/
!README.md
!pyproject.toml
# ... etc

more justification for this approach: https://youknowfordevs.com/2018/12/07/getting-control-of-your-dockerignore-files.html

jli avatar May 12 '23 15:05 jli

Hi @guillenotfound, we are now facing this issue while using S3 as the cache backend. Were you able to solve it?

AM1729 avatar May 31 '23 13:05 AM1729

I think I am running into the same issue: https://github.com/moby/buildkit/discussions/5415

I also used https://github.com/MShekow/directory-checksum/ to ensure that the directory contents are identical (they are).

But the final layer always gets rebuilt and cache busted.

moltar avatar Oct 09 '24 16:10 moltar

#4910 also seems related to this.

I could not explain why rebuilding the same commit was exhibiting the expected behavior but new commits without any relevant file changes (in my case not any COPY at all but a RUN --mount) were not using the cache.

It now works as expected and makes sense with https://github.com/moby/buildkit/issues/2120#issuecomment-1545881480.

gxtaillon avatar Oct 17 '25 22:10 gxtaillon

As stage2 depends on stage1 that needs to be rebuilt as well.

@tonistiigi That doesn't make sense, stage1 should remain cached, and stage2 should use cached stage1 when rebuilding stage2

stage1 => stage2 => stage3

A cache invalidation of stage2 should not affect stage1

brycedrennan avatar Nov 21 '25 17:11 brycedrennan

@brycedrennan It isn't that stage2 invalidates the cache for stage1. It is that stage2 depends on the files from stage1 and you are only exporting inline cache, so only the layers of the final stage are exported. In order for COPY to run, it first needs a destination directory, and that destination directory is defined as "alpine+curl". If you remove that dependency, so that COPY ./dummy ./dummy doesn't need a destination directory containing the curl binary, then this will work.

tonistiigi avatar Nov 22 '25 01:11 tonistiigi

I see, I was assuming mode=max. I am having a similar issue, but I'll need to find or open another ticket.

brycedrennan avatar Nov 22 '25 02:11 brycedrennan
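
For context on the mode=max remark: the inline cache exporter only supports the min mode, so it records cache for the layers of the pushed image only. A registry cache exported with mode=max also covers intermediate stages. The sketch below shows what that might look like with buildx. The function name and both arguments are placeholders rather than values from this thread, and it assumes a builder that supports registry cache export, such as the docker-container builders used in the reproduction steps earlier. Setting DOCKER=echo prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: export cache for all stages to a separate registry ref (mode=max)
# instead of embedding cache only for the final image (type=inline).
# build_with_registry_cache is a hypothetical helper name.
build_with_registry_cache() {
  image="$1"       # e.g. "$REGISTRY/test:1"
  cache_ref="$2"   # e.g. "$REGISTRY/test:buildcache" (placeholder ref)

  "${DOCKER:-docker}" buildx build \
    --cache-from "type=registry,ref=$cache_ref" \
    --cache-to "type=registry,ref=$cache_ref,mode=max" \
    --tag "$image" --push .
}
```

With the cache stored under its own ref, a later build that changes only ./dummy can import the cached stage1 layers even though they are not part of the final image.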