cache-from and COPY invalidate all layers instead of only the ones after COPY
Not sure if I am describing that problem correctly in the title, but this is very similar to other previous issues like https://github.com/moby/buildkit/issues/1981.
The detailed reproduction steps are here: https://github.com/carwow/buildkit-cache-issue (see .circleci/config.yml file)
I think you can see the CircleCI logs here: https://app.circleci.com/pipelines/github/carwow/buildkit-cache-issue/1/workflows/73a7a1d9-5cdd-47fe-88f0-ffd3eed5dce3/jobs/2
In the last "docker build" step, the "RUN echo before" layer should have been cached, I believe, but it wasn't.
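For context, the Dockerfile in the repro repo is roughly this shape (a sketch reconstructed from the build output and the comments below, not the exact file; the base image here is a placeholder):
# placeholder base image; the real one is in the repro repo
FROM debian:buster
# expected to stay cached across builds
RUN echo before
# a file in the build context is touched between builds, so this layer changes
COPY . .
# only this step should be invalidated
RUN echo after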
Pasting some of the output of those runs below.
First rebuild after pushing the cache:
#4 importing cache manifest from ******/buildkit-cache-issue:latest
#4 sha256:0b466318646b577e451877de394363e829962cf3ac26e7cbf026e5836441ff53
#4 DONE 0.3s
...
#5 [2/4] RUN echo before
#5 sha256:46bd089c781e511f41e24d8a9133aa9aa09f13d43684c47a97ef371487d5a949
#5 pulling sha256:d960726af2bec62a87ceb07182f7b94c47be03909077e23d8226658f80b47f87
#5 pulling sha256:e8d62473a22dec9ffef056b2017968a9dc484c8f836fb6d79f46980b309e9138
#5 pulling sha256:8962bc0fad55b13afedfeb6ad5cb06fd065461cf3e1ae4e7aeae5eeb17b179df
#5 pulling sha256:e8d62473a22dec9ffef056b2017968a9dc484c8f836fb6d79f46980b309e9138 0.3s done
#5 pulling sha256:8962bc0fad55b13afedfeb6ad5cb06fd065461cf3e1ae4e7aeae5eeb17b179df 0.4s done
#5 pulling sha256:65d943ee54c1fa196b54ab9a6673174c66eea04cfa1a4ac5b0328b74f066a4d9
#5 pulling sha256:532f6f7237092ebd79f21ccd3cf050138b31abeed1b29bac39cfdb30786a615b
#5 pulling sha256:d960726af2bec62a87ceb07182f7b94c47be03909077e23d8226658f80b47f87 0.9s done
#5 pulling sha256:1334e0fe2851ea7f3d2509a3907312a665d7c6b085e1f0671f6cd2dcf37b82db
#5 pulling sha256:1334e0fe2851ea7f3d2509a3907312a665d7c6b085e1f0671f6cd2dcf37b82db 0.3s done
#5 pulling sha256:ba365db42d143222de2a48c0c47039747bebe1a7858712d0320aa8a267da64ea
#5 pulling sha256:65d943ee54c1fa196b54ab9a6673174c66eea04cfa1a4ac5b0328b74f066a4d9 1.1s done
#5 pulling sha256:ba365db42d143222de2a48c0c47039747bebe1a7858712d0320aa8a267da64ea 0.4s done
#5 pulling sha256:9c5512e22a8630d57cea37ba33500c8a84b79ea18af7fe854a419725301dcb60 0.1s done
#5 pulling sha256:cb3bee3da6f673ff53851cfc66aeb923b5c39ea9a51a09e1b3d9cd23d324d78a
#5 pulling sha256:36fe7d125168b057bb6b0857885255f3dde855b97a36bd253f4fd22f33b950bd
#5 pulling sha256:cb3bee3da6f673ff53851cfc66aeb923b5c39ea9a51a09e1b3d9cd23d324d78a 0.1s done
#5 pulling sha256:36fe7d125168b057bb6b0857885255f3dde855b97a36bd253f4fd22f33b950bd 0.1s done
#5 pulling sha256:532f6f7237092ebd79f21ccd3cf050138b31abeed1b29bac39cfdb30786a615b 3.3s done
#5 CACHED
Second rebuild after pushing the cache (cache manifest didn't change: same digest):
#4 importing cache manifest from ******/buildkit-cache-issue:latest
#4 sha256:0b466318646b577e451877de394363e829962cf3ac26e7cbf026e5836441ff53
#4 DONE 0.3s
...
#5 [2/4] RUN echo before
#5 sha256:46bd089c781e511f41e24d8a9133aa9aa09f13d43684c47a97ef371487d5a949
#5 0.399 before
#5 DONE 1.4s
Likely the git clone is not deterministic and therefore the "." build context is different. Can't see the logs.
@thaJeztah That digest is an LLB digest. We should remove it from the output if it adds confusion.
Thank you @thaJeztah
@tonistiigi It only runs git clone once at the beginning, so that should not impact each docker build step.
I do touch a file between each docker build to change the directory that is copied, but that should only invalidate "RUN echo after", not "RUN echo before", right? It behaves correctly in the first build after pushing the cache.
That digest is an LLB digest. We should remove it from the output if it adds confusion.
Ah! Yes, at least it confused me
but it should only invalidate "RUN echo after", not "RUN echo before", right?
Try it with a command that actually creates files. That layer might have been optimized out.
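Concretely, that suggestion amounts to replacing the no-op step with a command that writes a file, e.g. (illustrative only):
RUN date > before
so the layer produces actual content and cannot be optimized away.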
We have the same issue in a Dockerfile that installs Python dependencies.
Dockerfile
FROM python:3.7-buster
# upgrade pip
RUN pip install -U pip
# install dbt
ENV PATH=/root/.local/bin:$PATH
RUN pip install pipx \
&& pipx install dbt-core \
&& pipx inject dbt-core dbt-snowflake \
&& rm -rf /root/.local/pipx/.cache
# install app dependencies
WORKDIR /app
COPY poetry.lock pyproject.toml ./
RUN pip install poetry \
&& poetry config virtualenvs.create false \
&& poetry install --no-interaction --no-ansi --no-root \
&& rm -rf /root/.cache
# install app
COPY . /app
RUN poetry install --no-interaction --no-ansi
# setup environment
ENV AIRFLOW_HOME=/app/airflow_home
ENV PYTHONPATH=$PYTHONPATH:/app
ENTRYPOINT ["sh", "/app/entrypoint.sh"]
The whole Docker image gets rebuilt every other build, when only the layers after COPY . /app should be invalidated.
As a workaround, explicitly pulling the image before building it has the correct behaviour (circleci build). I would expect it to work the same regardless of whether the image has been pulled. Is that not the expected behaviour?
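For reference, the workaround looks roughly like this (image name and the inline-cache build arg are assumptions about the CI setup, not the exact config):
# pull the previously pushed image so its layers are available locally
docker pull "$REGISTRY/buildkit-cache-issue:latest" || true
# build with BuildKit, reusing the pulled image as a cache source
DOCKER_BUILDKIT=1 docker build \
  --cache-from "$REGISTRY/buildkit-cache-issue:latest" \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  -t "$REGISTRY/buildkit-cache-issue:latest" .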
CircleCI seems to be having an outage right now, so I can't test whether the behaviour is the same with a RUN command that creates files in that minimal Dockerfile, but I would expect it to be. I'll share tomorrow when CircleCI is working again.
Here it is: https://app.circleci.com/pipelines/github/carwow/buildkit-cache-issue/3/workflows/3b300b7e-4313-44fd-8b66-f9a6f3f1d31d/jobs/6
Same behaviour with RUN date > before. The first build after pushing the cache uses the cache for RUN date > before but the following build doesn't.
I believe I see the same issue: with the following Dockerfile
# syntax=docker/dockerfile:1.2
# ====================
# STAGE 1
# ====================
FROM alpine:3.7 as stage1
RUN apk add --no-cache curl
# ====================
# STAGE 2
# ====================
FROM stage1 as stage2
COPY ./dummy ./dummy
# ====================
# STAGE 3
# ====================
FROM scratch as stage3
COPY --from=stage2 ./dummy ./dummy
once the content of dummy is changed, stage 1 is rebuilt.
Steps to reproduce:
REGISTRY=<...>
# build the cache
echo 'x' > dummy
docker buildx create --driver docker-container --name test-builder1 --driver-opt image=moby/buildkit:v0.8.3
docker buildx build --builder test-builder1 --cache-to type=inline --tag $REGISTRY/test:1 --push .
# build with cache
docker buildx create --driver docker-container --name test-builder2 --driver-opt image=moby/buildkit:v0.8.3
docker buildx build --builder test-builder2 --cache-from $REGISTRY/test:1 --tag local/test:2 --load .
# change dummy and build with cache
echo 'y' > dummy
docker buildx create --driver docker-container --name test-builder3 --driver-opt image=moby/buildkit:v0.8.3
docker buildx build --builder test-builder3 --cache-from $REGISTRY/test:1 --tag local/test:3 --load .
On the second run the output contains
=> CACHED [stage1 2/2] RUN apk add --no-cache curl 0.0s
=> CACHED [stage2 1/1] COPY ./dummy ./dummy 0.0s
=> CACHED [stage3 1/1] COPY --from=stage2 ./dummy ./dummy
but on the third one it is just
=> [stage1 2/2] RUN apk add --no-cache curl 0.7s
=> [stage2 1/1] COPY ./dummy ./dummy 0.0s
=> [stage3 1/1] COPY --from=stage2 ./dummy ./dummy
Any updates on this? From what I've experienced, the cache is not invalidated when using a local cache, but it does weird things when pulling cache from S3, like invalidating cache from previous layers, as people describe in this issue.
I know that pulling the image might be an option, but you shouldn't need to pull the image; if the layers are not changing, it doesn't make sense to pull something that's already in the bucket in this case.
The example from @evgeniikhandygo-apc is expected. Your final stage depends on stage2, and stage2 is invalidated, therefore it needs to be rebuilt. It depends on stage2, not just one file. As stage2 depends on stage1, that needs to be rebuilt as well.
If you reorganize the file to remove the dependency you don't seem to be using:
# syntax=docker/dockerfile:1.2
# ====================
# STAGE 1
# ====================
FROM alpine:3.7 as stage1
RUN apk add --no-cache curl
FROM scratch AS files
COPY ./dummy ./dummy
# ====================
# STAGE 2
# ====================
FROM stage1 as stage2
COPY --from=files . .
# ====================
# STAGE 3
# ====================
FROM scratch as stage3
COPY --from=files ./dummy ./dummy
Then everything will work as expected.
@tonistiigi I'll try your suggestion, but why would the dependency-installation layer be invalidated if the layer that is changing is the one after it?
Wow, I was tearing my hair out over this and it turns out that as @tonistiigi said it was non-deterministic behavior when adding "." in the CI environment that was the problem. I added .git to .dockerignore and the cache started working as expected.
fwiw i think a good best practice is to use an explicit allowlist for what docker can see, with a .dockerignore that looks like:
# ignore all files by default
*
# allow what docker should see
!src/
!README.md
!pyproject.toml
# ... etc
more justification for this approach: https://youknowfordevs.com/2018/12/07/getting-control-of-your-dockerignore-files.html
Hi @guillenotfound, we are facing this issue now while using S3 as the cache backend. Were you able to solve this?
I think I am running into the same issue: https://github.com/moby/buildkit/discussions/5415
I also used https://github.com/MShekow/directory-checksum/ to ensure that the directory contents are identical (they are).
But the final layer always gets rebuilt and cache busted.
#4910 also seems related to this.
I could not explain why rebuilding the same commit exhibited the expected behavior while new commits without any relevant file changes (in my case no COPY at all, just a RUN --mount) were not using the cache.
It now works as expected and makes sense with https://github.com/moby/buildkit/issues/2120#issuecomment-1545881480.
As stage2 depends on stage1, that needs to be rebuilt as well.
@tonistiigi That doesn't make sense, stage1 should remain cached, and stage2 should use cached stage1 when rebuilding stage2
stage1 => stage2 => stage3
A cache invalidation of stage2 should not affect stage1
@brycedrennan It isn't that stage2 invalidates cache for stage1. It is that stage2 depends on the files from stage1, and you are only exporting inline cache, i.e. layers for only the final stage. In order for COPY to run it first needs a destination directory, and that destination directory is defined as "alpine+curl". If you remove the dependency, so that COPY ./dummy ./dummy doesn't need a destination directory containing the curl binary, then this will work.
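For anyone hitting this with multi-stage builds: since inline cache only records cache for the final stage's layers, intermediate stages cannot be restored from it. Exporting a separate cache image with mode=max covers intermediate stages as well. A rough sketch (registry and tags are placeholders):
docker buildx build \
  --cache-to type=registry,ref=$REGISTRY/test:buildcache,mode=max \
  --cache-from type=registry,ref=$REGISTRY/test:buildcache \
  --tag $REGISTRY/test:1 --push .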
I see I was assuming mode=max. I am having a similar issue but I'll need to find or open another ticket.