ImageBuild: failed to compute cache key when using ZFS as storage driver
Environment
Server Version: 19.03.8
Storage Driver: zfs
Kernel Version: 5.4.0-42-generic
Tilt.dev version: v0.17.2 (but also happens with older versions)
It only happens when using ZFS as the storage driver.
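For reference, the storage driver in use can be checked directly from the Docker CLI; on the affected host it reports zfs:
docker info --format '{{.Driver}}'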
Sample
https://github.com/sdelrio/tiltdev-issue-zfs
Dockerfiles used in tilt
- prometheus/Dockerfile
FROM prom/prometheus:v2.15.1
COPY prometheus.yml /etc/prometheus/prometheus.yml
- grafana/Dockerfile
FROM grafana/grafana
COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/
Logs from tilt up
- Streamlog
Successfully loaded Tiltfile (230.373436ms)
prometheus │
prometheus │ Initial Build • prometheus
prometheus │ STEP 1/1 — Building Dockerfile: [tilt.dev/prometheus]
prometheus │ Building Dockerfile:
prometheus │ FROM prom/prometheus:v2.15.1
prometheus │
prometheus │ COPY prometheus.yml /etc/prometheus/prometheus.yml
prometheus │
prometheus │
prometheus │
prometheus │ Tarring context…
prometheus │ Building image
grafana │
grafana │ Initial Build • grafana
grafana │ STEP 1/1 — Building Dockerfile: [tilt.dev/grafana]
grafana │ Building Dockerfile:
grafana │ FROM grafana/grafana
grafana │
grafana │ COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/
grafana │
grafana │ Tarring context…
grafana │ Building image
prometheus │ copy /context /
grafana │ copy /context /
grafana │ copy /context / [done: 229ms]
prometheus │ copy /context / [done: 272ms]
grafana │ [1/2] FROM docker.io/grafana/grafana@sha256:d984df30a059aa8a4782f3cf316882c38a1a509b4ef83132658301d7a53761ee
grafana │ [2/2] COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/
grafana │
grafana │ ERROR IN: [2/2] COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/
grafana │
grafana │ got unexpected error during build/deploy: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount018555909: [{Type:bind Source:/var/lib/docker/zfs/graph/vag0xqoutppmvli7u640kbpjh Options:[rbind ro]}]: no such file or directory
grafana │ ERROR: Build Failed: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount018555909: [{Type:bind Source:/var/lib/docker/zfs/graph/vag0xqoutppmvli7u640kbpjh Options:[rbind ro]}]: no such file or directory
prometheus │ [1/2] FROM docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
prometheus │ [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml
prometheus │
prometheus │ ERROR IN: [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml
prometheus │
prometheus │ got unexpected error during build/deploy: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount052540832: [{Type:bind Source:/var/lib/docker/zfs/graph/qjkwdftokr44yhvmoh0fec30r Options:[rbind ro]}]: no such file or directory
prometheus │ ERROR: Build Failed: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount052540832: [{Type:bind Source:/var/lib/docker/zfs/graph/qjkwdftokr44yhvmoh0fec30r Options:[rbind ro]}]: no such file or directory
When DOCKER_BUILDKIT is disabled
With DOCKER_BUILDKIT=0 tilt up it builds correctly.
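For reference, a sketch of making the workaround stick rather than prefixing every command; the daemon-wide flag is standard Docker configuration, but I haven't verified that it changes what Tilt requests over the API, so treat that part as an assumption:
# Per shell session: export once instead of prefixing each command
export DOCKER_BUILDKIT=0
tilt up
# Daemon-wide (untested with Tilt): set "features": { "buildkit": false }
# in /etc/docker/daemon.json, then restart the daemon
sudo systemctl restart docker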
Thanks for the detailed report! Are you able to reproduce the problem with a plain
DOCKER_BUILDKIT=1 docker build .
?
Strongly suspect we'll have to file a bug upstream with buildkit, though we can help with that
With DOCKER_BUILDKIT=1 docker build . it builds correctly:
$ DOCKER_BUILDKIT=1 docker build .
[+] Building 6.8s (7/7) FINISHED
=> [internal] load .dockerignore 0.2s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.2s
=> => transferring dockerfile: 125B 0.0s
=> [internal] load metadata for docker.io/prom/prometheus:v2.15.1 1.0s
=> [internal] load build context 0.1s
=> => transferring context: 1.06kB 0.0s
=> [1/2] FROM docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6 4.4s
=> => resolve docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6 0.0s
=> => sha256:61bf337f29560d2c3bc5c73168014eba58eb14fdefa2e05e78a877eae29548cd 6.67kB / 6.67kB 0.0s
=> => sha256:0f8c40e1270f10d085dda8ce12b7c5b17cd808f055df5a7222f54837ca0feae0 761.06kB / 761.06kB 0.4s
=> => sha256:20b6d726196715dd18c52155016617830152925c538dde639d346c86ac9b8277 32.98MB / 32.98MB 0.9s
=> => sha256:7ff47b8cce6ecb7b456c6863d42a7f10bcbd040b21d7f1a912eb1b0936e9ad46 2.82kB / 2.82kB 0.0s
=> => sha256:626a2a3fee8c6a9b5b866adc6cb15d54b5d901b6a084a2519bf7f905325b0711 478.86kB / 478.86kB 0.2s
=> => sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6 1.05kB / 1.05kB 0.0s
=> => sha256:194a9bf2eed4bbd5b5b6bf918d96bcd0211cdceacada8b64f8e5a04ceea2cb1b 19.66MB / 19.66MB 0.9s
=> => extracting sha256:0f8c40e1270f10d085dda8ce12b7c5b17cd808f055df5a7222f54837ca0feae0 0.2s
=> => sha256:d6505204a89e7d2c968eedc6d8f7a16375db6efd66d60464b6e3a71d6a9c9e6d 601B / 601B 0.6s
=> => sha256:261b8515dc3c57505a45ecd95de1157b98f838984aeee48d6dafd2ad4365701f 2.65kB / 2.65kB 0.7s
=> => extracting sha256:626a2a3fee8c6a9b5b866adc6cb15d54b5d901b6a084a2519bf7f905325b0711 0.4s
=> => sha256:63034c22f1b7cb7800a1882448273bed679e39ae7a91ed86b3a9448a9b6fd9f3 3.06kB / 3.06kB 0.9s
=> => sha256:e942f9f95dcf7e650128d41869c6cba0d5a8ecab7ee3dacb09082593613b7817 1.27kB / 1.27kB 1.0s
=> => sha256:2ed6bffd5408e754e306ea9768d9b3c4899d24cd3967b418986e0e418fddc21b 4.02kB / 4.02kB 1.0s
=> => sha256:406fca4251a1912b88d60dba05723d243ba43b7e27142899c83c0f39df72bc4c 116.95kB / 116.95kB 1.1s
=> => sha256:da0093e941b01c9dd263f01d8134c5dd3ec16e25f5517cc498817df7ce7e13b1 184B / 184B 1.2s
=> => sha256:b8e73c7932f481840640f5f6d266a36ff821024df4a5506dac48e9c1b5de9c80 686B / 686B 1.2s
=> => extracting sha256:20b6d726196715dd18c52155016617830152925c538dde639d346c86ac9b8277 0.5s
=> => extracting sha256:194a9bf2eed4bbd5b5b6bf918d96bcd0211cdceacada8b64f8e5a04ceea2cb1b 0.4s
=> => extracting sha256:d6505204a89e7d2c968eedc6d8f7a16375db6efd66d60464b6e3a71d6a9c9e6d 0.0s
=> => extracting sha256:261b8515dc3c57505a45ecd95de1157b98f838984aeee48d6dafd2ad4365701f 0.0s
=> => extracting sha256:63034c22f1b7cb7800a1882448273bed679e39ae7a91ed86b3a9448a9b6fd9f3 0.0s
=> => extracting sha256:2ed6bffd5408e754e306ea9768d9b3c4899d24cd3967b418986e0e418fddc21b 0.0s
=> => extracting sha256:e942f9f95dcf7e650128d41869c6cba0d5a8ecab7ee3dacb09082593613b7817 0.0s
=> => extracting sha256:406fca4251a1912b88d60dba05723d243ba43b7e27142899c83c0f39df72bc4c 0.0s
=> => extracting sha256:da0093e941b01c9dd263f01d8134c5dd3ec16e25f5517cc498817df7ce7e13b1 0.0s
=> => extracting sha256:b8e73c7932f481840640f5f6d266a36ff821024df4a5506dac48e9c1b5de9c80 0.0s
=> [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml 0.2s
=> exporting to image 0.8s
=> => exporting layers 0.7s
=> => writing image sha256:545d888a0f95ca702515aa01e257a16a24066f10e1a1e21915e3cd93f412c98c 0.0s
$ echo $?
0
2nd run of the command DOCKER_BUILDKIT=1 docker build . uses the cache successfully.
$ DOCKER_BUILDKIT=1 docker build .
[+] Building 1.1s (7/7) FINISHED
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 125B 0.0s
=> [internal] load metadata for docker.io/prom/prometheus:v2.15.1 0.7s
=> [internal] load build context 0.1s
=> => transferring context: 1.06kB 0.0s
=> [1/2] FROM docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6 0.0s
=> CACHED [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.0s
=> => writing image sha256:545d888a0f95ca702515aa01e257a16a24066f10e1a1e21915e3cd93f412c98c 0.0s
haha, ok, you nerd sniped me into installing zfs and playing around with it.
I'm pretty sure this is a race condition somewhere in buildkit. I can reproduce it sometimes with a plain DOCKER_BUILDKIT=1 docker build ., but I can reproduce it reliably in your example repo with:
cd grafana
docker system prune --all
docker pull grafana/grafana
tar -c . | DOCKER_BUILDKIT=1 docker build .
with an error like
failed to solve with frontend dockerfile.v0: failed to build LLB: failed to get deleted record sha256:555daf21f178d8a494fd107c333f182aa7fa6e79c70d9a49ea0af1c8d6937c39: not found
(the streaming tarball is a bit closer to how Tilt invokes Docker)
I suspect that what's happening here is that when Tilt invokes Buildkit, we're doing a better job parallelizing things and so just hitting this race condition more reliably.
I can file a clearer bug upstream on Monday; for now the best workaround is probably to disable buildkit (as you noted above).
wow, thanks, I didn't want to make you install ZFS but thanks a lot for trying it.
Your test is very interesting; running it several times in a loop, I get the same error as you:
=> ERROR [1/2] FROM docker.io/grafana/grafana 0.0s
=> => resolve docker.io/grafana/grafana:latest 0.0s
=> CANCELED [internal] load build context 0.1s
=> => transferring context: 0.0s
------
> [1/2] FROM docker.io/grafana/grafana:
------
failed to solve with frontend dockerfile.v0: failed to build LLB: failed to get deleted record sha256:555daf21f178d8a494fd107c333f182aa7fa6e79c70d9a49ea0af1c8d6937c39: not found
Using:
for i in $(seq 1 10); do { docker system prune --all --force >/dev/null; docker pull grafana/grafana >/dev/null ; tar -c . | DOCKER_BUILDKIT=1 docker build --no-cache .; }; done
A single execution doesn't show the error, and if I introduce a sleep between the commands the error disappears, so the race condition you mentioned looks like the most probable root cause:
for i in $(seq 1 10); do { docker system prune --all --force >/dev/null; sleep 1; docker pull grafana/grafana >/dev/null ; sleep 1; tar -c . | DOCKER_BUILDKIT=1 docker build --no-cache .; }; done
I don't really know anything about Go programming, but could you tell me the Go file where the build is done, so I could try adding some sleeps/pauses to test? Or is it just using the buildkit Go libraries to build?
I think the race condition would be between the Docker daemon and the buildkit server...so there's not really a way to put a sleep inside of Tilt's code...
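A rough, purely observational way to watch the state the error complains about from outside (no changes to Tilt or the daemon; just poll the zfs graph directory named in the error message while the repro loop runs in another terminal):
watch -n 0.5 'ls /var/lib/docker/zfs/graph/ | wc -l'
Entries appearing and disappearing there during the prune/pull/build cycle is the churn that the failed bind mount in the original error races against.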
filed upstream as https://github.com/moby/buildkit/issues/1654
With the latest Tilt v0.17.6, which ships an updated buildkit version, the issue is not fixed but it fails much less often. In my example repo https://github.com/sdelrio/tiltdev-issue-zfs, usually only 1 of the 3 containers fails, and clicking rebuild once makes it work.
I'm still getting the "failed to compute cache key" error message on version v0.35.1.