buildkit icon indicating copy to clipboard operation
buildkit copied to clipboard

Buildkit cache ignoring file contents when comparing to cache

Open bglimepoint opened this issue 3 years ago • 11 comments

Hello :-)

We've been using the buildctl to build images and have found that the cache is reporting a hit for two files with the same name, modified date, and size; but different contents.

A script to reproduce via buildctl:

( echo 'FROM alpine'; echo 'COPY /test /'; echo 'RUN cat /test' ) > Dockerfile

touch timestamp

echo XXX > test
touch -r timestamp test

buildctl --addr tcp://... build --progress plain --frontend=dockerfile.v0 --local context=. --local dockerfile=.

echo YYY > test
touch -r timestamp test

buildctl --addr tcp://... build --progress plain --frontend=dockerfile.v0 --local context=. --local dockerfile=.

The first buildctl outputs:

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 76B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:latest
#3 ...

#4 [auth] library/alpine:pull token for registry-1.docker.io
#4 DONE 0.0s

#3 [internal] load metadata for docker.io/library/alpine:latest
#3 DONE 2.4s

#5 [1/3] FROM docker.io/library/alpine@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
#5 resolve docker.io/library/alpine@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300 0.1s done
#5 CACHED

#6 [internal] load build context
#6 transferring context: 35B done
#6 DONE 0.0s

#7 [2/3] COPY /test /
#7 DONE 0.1s

#8 [3/3] RUN cat /test
#8 0.157 XXX
#8 DONE 0.2s

Whilst the second invocation hits the cache, even though file test has been modified:

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 76B done
#1 DONE 0.1s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.1s

#3 [internal] load metadata for docker.io/library/alpine:latest
#3 DONE 0.3s

#4 [1/3] FROM docker.io/library/alpine@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
#4 resolve docker.io/library/alpine@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300 0.0s done
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 25B done
#5 DONE 0.0s

#6 [2/3] COPY /test /
#6 CACHED

#7 [3/3] RUN cat /test
#7 CACHED

The same caching occurs when using buildkit via docker build, a script to demonstrate:

export DOCKER_BUILDKIT=1
( echo 'FROM alpine'; echo 'COPY /test /'; echo 'RUN cat /test' ) > Dockerfile

touch timestamp

echo XXX > test
touch -r timestamp test

docker build --progress plain .

echo YYY > test
touch -r timestamp test

docker build --progress plain .

And the second command output shows:

...
#5 [internal] load build context
#5 transferring context: 25B done
#5 DONE 0.0s

#6 [2/3] COPY /test /
#6 CACHED

#7 [3/3] RUN cat /test
#7 CACHED

However, when using non-buildkit Docker build the output is correct:

Sending build context to Docker daemon  3.584kB
Step 1/3 : FROM alpine
 ---> a24bb4013296
Step 2/3 : COPY /test /
 ---> f48bf3a0ef13
Step 3/3 : RUN cat /test
 ---> Running in 9db3e9cac159
XXX
Removing intermediate container 9db3e9cac159
 ---> 1aff2dd86b42
Successfully built 1aff2dd86b42
Sending build context to Docker daemon  3.584kB
Step 1/3 : FROM alpine
 ---> a24bb4013296
Step 2/3 : COPY /test /
 ---> 9b36a97fcdd8
Step 3/3 : RUN cat /test
 ---> Running in 3a281bc7d637
YYY
Removing intermediate container 3a281bc7d637
 ---> 07b77738e34d
Successfully built 07b77738e34d

It seems that buildkit isn't actually checking the contents of the file, and is just assuming that the same name, size, and modified date means the files are identical - even though they aren't.

Thanks for your time - please let us know if you would like any more details or any such.

bglimepoint avatar Feb 02 '22 04:02 bglimepoint

This isn't (likely) the actual build caching in here but the context transfer. If you have transferred build context for a project once then on second-time buildkit only transfers the changes. This follows the similar semantics as rsync defaults so is indeed metadata-based to understand what files have been updated(except for certain transfers like the dockerfile itself that we send always to avoid this same case). If you reset the timestamp then file is not transferred over and the content check runs using the previous file.

tonistiigi avatar Feb 02 '22 07:02 tonistiigi

Thanks for the response @tonistiigi. Shouldn't the context transfer also look at the file contents? (e.g. via some kind of hashing - akin to rsync's --checksum argument, or an ETag.) Or perhaps an option akin to --ignore-times, with the delta transferring algorithm?

Relying on timestamps, file sizes, and file names alone feels like a bug - especially when it is different to how Docker handles this?

bglimepoint avatar Feb 02 '22 07:02 bglimepoint

It's something we feel is a justified compromise and validated by other similar industry-standard tools. It is possible we could add options in the future if user wants a different behavior with the cost of slower builds. There's no specific proposal for it atm. Based on some experiments, on macos we could possibly do better validation without performance loss as macos tracks all changes to files (generation count) even if previous metadata is restored. I've not seen anything like this in other platforms though.

As I mentioned before, on frontends we do give frontend an option to disable diff completely on some sources if they can predict that it is a one that doesn't benefit from it. Theoretically, this could be configured with a build-arg in Dockerfile frontend but looks a bit hacky solution.

Note that this only affects the case where you are doing repeated incremental builds from the same local source. It shouldn't be common that something resets timestamps in that workflow. On previous versions there could be a case where multiple projects that were created on the same second could have a similar collision. We fixed this in buildx-v0.7 https://github.com/docker/buildx/releases/tag/v0.7.0

Ensure different projects use separate destination directories for incremental context transfer for better performance

but I'm not quite sure what the behavior is for buildctl for that case.

tonistiigi avatar Feb 02 '22 18:02 tonistiigi

Hi @tonistiigi,

Thanks for the info, we totally understand the difficult balance to find here.

Is there something that defines a "project" in this context?

The example above shows all the builds occurring in the same directory, however in reality for us they're actually happening in different locations - but within each directory the file names are the same. And hence for us they are like different "projects" - is there a way we could tell buildctl that they are unrelated builds?

Would it be possible to consider using the absolute file path as part of the metadata, rather than the relative path? Perhaps behind an option flag, or with a feature akin to ccache's base_dir? (ccache uses absolute file path in it's metadata which affect's the cache, but can be nullified if intentionally building from multiple locations but intending for the cache to be shareable - https://ccache.dev/manual/4.0.html#_compiling_in_different_directories.)

As I mentioned before, on frontends we do give frontend an option to disable diff completely on some sources if they can predict that it is a one that doesn't benefit from it. Theoretically, this could be configured with a build-arg in Dockerfile frontend but looks a bit hacky solution.

Would you happen to have a reference to some documentation, or an existing issue, that talks about this more? We'd be happy to disable this build context caching if that were possible - but aren't keen to disable the core build caching - and are just trying to understand whether this could help here. For context, our build contexts are very small, but the build steps that they're caching take quite a long time - so the build context caching isn't the cache that we would like to leverage, only the build caching.

Thanks for your time and info :-)

(We are looking into whether we can do anything to avoid the timestamp collision as well.)

bglimepoint avatar Feb 02 '22 20:02 bglimepoint

Is there something that defines a "project" in this context?

https://github.com/docker/buildx/pull/817/files#diff-7fdd319c657fbde54b83a5dfa233c5c2c417b291739ff75c05dc5d83ee7a9d24R516-R521 It's a combination of basename of the directory on client-side and unique machine identifier. I'm not 100% if the behavior of buildctl is same atm.

or an existing issue, that talks about this more?

https://github.com/moby/buildkit/pull/2081 is where it was added. There isn't a build-arg atm to do this switch for context. A line like https://github.com/moby/buildkit/pull/2081/files#diff-c876ea8bd639239ba60cde8cc40c2c2dd761f361564d4465b139b7b628e815bbR306 would need to exist for the main context. Atm it is only for Dockerfile context.

tonistiigi avatar Feb 02 '22 23:02 tonistiigi

Hello

I recently got hit by this issue too. An option to influence/disable caching of the build context would be highly desirable. Could you please consider adding this?

Kind regards, Marcel

m-schutte-ohra-nl avatar Oct 10 '23 06:10 m-schutte-ohra-nl

We could define a custom build-arg (or environmental variable) that would allow setting "project" (as described in https://github.com/moby/buildkit/issues/2592#issuecomment-1028466938 ). If a different value is set for a build, then all files are transferred to a new location in the daemon, and no incremental transfer against previously sent files is attempted.

Another option would be to add a similar arg but make it disable the incremental transfer for context and always do full transfer. But I think the first option is preferable.

tonistiigi avatar Oct 12 '23 17:10 tonistiigi

I would prefer the second option (arg disabling incremental transfer altogether) because that would cover a broader range of use cases.

m-schutte-ohra-nl avatar Oct 16 '23 07:10 m-schutte-ohra-nl

Hello!

I was recently affected by this as well in a CI pipeline. Basically, we use Nix to build some files, and Nix produces them with a fixed modification date of Jan 1 1970 with the intent of having better reproducibility.

If such files are then used to build images and the build agent is reused for different builds in the pipeline, it will ignore the new files (provided they're also the same size, which happened in our case).

In our case we found it best to disable local caching and always rely on a networked cache. It isn't ideal, but works. It would be best for us too if there was a way to disable this globally, e.g. with an environment variable.

Anyway, just wanted to add another data point to the conversation! Thanks for your work!

MarceloZabini avatar May 17 '24 17:05 MarceloZabini

Hello, I would like to mention another case where we met such behaviour.

We have a maven project that builds several modules, then it copies each built module with it's other resources to a separate folder from which they will be built. In our case some of the static resources had same size but different contents, and when they were copied on the same second it would mix them up as described by OP.

May I suggest the sort of low-cost solution of including milliseconds in the timestamp to reduce the option of such collision?

GoodOwl avatar Oct 17 '24 12:10 GoodOwl

May I suggest the sort of low-cost solution of including milliseconds in the timestamp to reduce the option of such collision?

I think we do, but the precision of file timestamps depends on your filesystem. Are you seeing the files having nanosecond differences on disk and changes not being detected?

edit:

In Docker Desktop mac:

 # cat Dockerfile
from scratch
copy in.txt out.txt

 # echo foo1 > in.txt
 # python3 touch_with_nanos.py in.txt touch "2024-10-17 15:00:00.1234"
File 'in.txt' touched with timestamp: 2024-10-17 15:00:00.1234
 # python3 -c "import os; print(os.stat('in.txt').st_mtime_ns)"
1729202400123400000
 # docker buildx build -o . . 2>/dev/null
 # cat out.txt
foo1


 # echo foo2 > in.txt
 # python3 touch_with_nanos.py in.txt touch "2024-10-17 15:00:00.1234"
File 'in.txt' touched with timestamp: 2024-10-17 15:00:00.1234
 # python3 -c "import os; print(os.stat('in.txt').st_mtime_ns)"
1729202400123400000
 # docker buildx build -o . . 2>/dev/null
 # cat out.txt
foo1



 # echo foo3 > in.txt
 # python3 touch_with_nanos.py in.txt touch "2024-10-17 15:00:00.2345"
File 'in.txt' touched with timestamp: 2024-10-17 15:00:00.2345
 # python3 -c "import os; print(os.stat('in.txt').st_mtime_ns)"
1729202400234500000
 # docker buildx build -o . . 2>/dev/null
 # cat out.txt
foo3

tonistiigi avatar Oct 18 '24 04:10 tonistiigi

Thanks for the quick reply. In my case the builds are running on Linux. When I stat the files I see the timestamp with nano seconds precision.

Interesting that when I run date +%s%N I get the same response as for date +%s, so thank you for the hint - I think that in my case the precision was reduced to seconds (at least for some flows) and it perfectly explains the problem.

GoodOwl avatar Oct 25 '24 12:10 GoodOwl