tilt icon indicating copy to clipboard operation
tilt copied to clipboard

ImageBuild: failed to compute cache key. when using ZFS as Storage Driver

Open sdelrio opened this issue 5 years ago • 9 comments

Environment

 Server Version: 19.03.8
 Storage Driver: zfs
 Kernel Version: 5.4.0-42-generic
 Tilt.dev version: v0.17.2 (but also happens with older versions)

It only happens when using ZFS as storage driver.

Sample

https://github.com/sdelrio/tiltdev-issue-zfs

Dockerfiles used in tilt

  • prometheus/Dockerfile
FROM prom/prometheus:v2.15.1

COPY prometheus.yml /etc/prometheus/prometheus.yml
  • grafan/Dockerfile
FROM grafana/grafana

COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/%

Logs when tilt up

  • Streamlog
Successfully loaded Tiltfile (230.373436ms)
   prometheus │ 
   prometheus │ Initial Build • prometheus
   prometheus │ STEP 1/1 — Building Dockerfile: [tilt.dev/prometheus]
   prometheus │ Building Dockerfile:
   prometheus │   FROM prom/prometheus:v2.15.1
   prometheus │   
   prometheus │   COPY prometheus.yml /etc/prometheus/prometheus.yml
   prometheus │   
   prometheus │ 
   prometheus │ 
   prometheus │      Tarring context…
   prometheus │      Building image
      grafana │ 
      grafana │ Initial Build • grafana
      grafana │ STEP 1/1 — Building Dockerfile: [tilt.dev/grafana]
      grafana │ Building Dockerfile:
      grafana │   FROM grafana/grafana
      grafana │   
      grafana │   COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/
      grafana │ 
      grafana │      Tarring context…
      grafana │      Building image
   prometheus │      copy /context /
      grafana │      copy /context /
      grafana │      copy /context / [done: 229ms]
   prometheus │      copy /context / [done: 272ms]
      grafana │      [1/2] FROM docker.io/grafana/grafana@sha256:d984df30a059aa8a4782f3cf316882c38a1a509b4ef83132658301d7a53761ee
      grafana │      [2/2] COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/
      grafana │      
      grafana │      ERROR IN: [2/2] COPY --chown=grafana:grafana ./provisioning/ /etc/grafana/provisioning/
      grafana │ 
      grafana │ got unexpected error during build/deploy: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount018555909: [{Type:bind Source:/var/lib/docker/zfs/graph/vag0xqoutppmvli7u640kbpjh Options:[rbind ro]}]: no such file or directory
      grafana │ ERROR: Build Failed: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount018555909: [{Type:bind Source:/var/lib/docker/zfs/graph/vag0xqoutppmvli7u640kbpjh Options:[rbind ro]}]: no such file or directory
   prometheus │      [1/2] FROM docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
   prometheus │      [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml
   prometheus │      
   prometheus │      ERROR IN: [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml
   prometheus │ 
   prometheus │ got unexpected error during build/deploy: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount052540832: [{Type:bind Source:/var/lib/docker/zfs/graph/qjkwdftokr44yhvmoh0fec30r Options:[rbind ro]}]: no such file or directory
   prometheus │ ERROR: Build Failed: ImageBuild: failed to compute cache key: failed to mount /var/lib/docker/tmp/buildkit-mount052540832: [{Type:bind Source:/var/lib/docker/zfs/graph/qjkwdftokr44yhvmoh0fec30r Options:[rbind ro]}]: no such file or directory

When DOCKER_BUILKIT is disabled

When DOCKER_BUILDKIT=0 tilt up It builds correctly

sdelrio avatar Aug 21 '20 12:08 sdelrio

Thanks for the detailed report! Are you able to reproduce the problem with a plain

DOCKER_BUILDKIT=1 docker build .

?

Strongly suspect we'll have to file a bug upstream with buildkit, though we can help with that

nicks avatar Aug 21 '20 17:08 nicks

With DOCKER_BUILDKIT=1 docker build . it builds correctly:

$ DOCKER_BUILDKIT=1 docker build .
[+] Building 6.8s (7/7) FINISHED                                                                                                       
 => [internal] load .dockerignore                                                                                                 0.2s
 => => transferring context: 2B                                                                                                   0.0s
 => [internal] load build definition from Dockerfile                                                                              0.2s
 => => transferring dockerfile: 125B                                                                                              0.0s
 => [internal] load metadata for docker.io/prom/prometheus:v2.15.1                                                                1.0s
 => [internal] load build context                                                                                                 0.1s
 => => transferring context: 1.06kB                                                                                               0.0s
 => [1/2] FROM docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6          4.4s
 => => resolve docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6          0.0s
 => => sha256:61bf337f29560d2c3bc5c73168014eba58eb14fdefa2e05e78a877eae29548cd 6.67kB / 6.67kB                                    0.0s
 => => sha256:0f8c40e1270f10d085dda8ce12b7c5b17cd808f055df5a7222f54837ca0feae0 761.06kB / 761.06kB                                0.4s
 => => sha256:20b6d726196715dd18c52155016617830152925c538dde639d346c86ac9b8277 32.98MB / 32.98MB                                  0.9s
 => => sha256:7ff47b8cce6ecb7b456c6863d42a7f10bcbd040b21d7f1a912eb1b0936e9ad46 2.82kB / 2.82kB                                    0.0s
 => => sha256:626a2a3fee8c6a9b5b866adc6cb15d54b5d901b6a084a2519bf7f905325b0711 478.86kB / 478.86kB                                0.2s
 => => sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6 1.05kB / 1.05kB                                    0.0s
 => => sha256:194a9bf2eed4bbd5b5b6bf918d96bcd0211cdceacada8b64f8e5a04ceea2cb1b 19.66MB / 19.66MB                                  0.9s
 => => extracting sha256:0f8c40e1270f10d085dda8ce12b7c5b17cd808f055df5a7222f54837ca0feae0                                         0.2s
 => => sha256:d6505204a89e7d2c968eedc6d8f7a16375db6efd66d60464b6e3a71d6a9c9e6d 601B / 601B                                        0.6s
 => => sha256:261b8515dc3c57505a45ecd95de1157b98f838984aeee48d6dafd2ad4365701f 2.65kB / 2.65kB                                    0.7s
 => => extracting sha256:626a2a3fee8c6a9b5b866adc6cb15d54b5d901b6a084a2519bf7f905325b0711                                         0.4s
 => => sha256:63034c22f1b7cb7800a1882448273bed679e39ae7a91ed86b3a9448a9b6fd9f3 3.06kB / 3.06kB                                    0.9s
 => => sha256:e942f9f95dcf7e650128d41869c6cba0d5a8ecab7ee3dacb09082593613b7817 1.27kB / 1.27kB                                    1.0s
 => => sha256:2ed6bffd5408e754e306ea9768d9b3c4899d24cd3967b418986e0e418fddc21b 4.02kB / 4.02kB                                    1.0s
 => => sha256:406fca4251a1912b88d60dba05723d243ba43b7e27142899c83c0f39df72bc4c 116.95kB / 116.95kB                                1.1s
 => => sha256:da0093e941b01c9dd263f01d8134c5dd3ec16e25f5517cc498817df7ce7e13b1 184B / 184B                                        1.2s
 => => sha256:b8e73c7932f481840640f5f6d266a36ff821024df4a5506dac48e9c1b5de9c80 686B / 686B                                        1.2s
 => => extracting sha256:20b6d726196715dd18c52155016617830152925c538dde639d346c86ac9b8277                                         0.5s
 => => extracting sha256:194a9bf2eed4bbd5b5b6bf918d96bcd0211cdceacada8b64f8e5a04ceea2cb1b                                         0.4s
 => => extracting sha256:d6505204a89e7d2c968eedc6d8f7a16375db6efd66d60464b6e3a71d6a9c9e6d                                         0.0s
 => => extracting sha256:261b8515dc3c57505a45ecd95de1157b98f838984aeee48d6dafd2ad4365701f                                         0.0s
 => => extracting sha256:63034c22f1b7cb7800a1882448273bed679e39ae7a91ed86b3a9448a9b6fd9f3                                         0.0s
 => => extracting sha256:2ed6bffd5408e754e306ea9768d9b3c4899d24cd3967b418986e0e418fddc21b                                         0.0s
 => => extracting sha256:e942f9f95dcf7e650128d41869c6cba0d5a8ecab7ee3dacb09082593613b7817                                         0.0s
 => => extracting sha256:406fca4251a1912b88d60dba05723d243ba43b7e27142899c83c0f39df72bc4c                                         0.0s
 => => extracting sha256:da0093e941b01c9dd263f01d8134c5dd3ec16e25f5517cc498817df7ce7e13b1                                         0.0s
 => => extracting sha256:b8e73c7932f481840640f5f6d266a36ff821024df4a5506dac48e9c1b5de9c80                                         0.0s
 => [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml                                                                      0.2s
 => exporting to image                                                                                                            0.8s
 => => exporting layers                                                                                                           0.7s
 => => writing image sha256:545d888a0f95ca702515aa01e257a16a24066f10e1a1e21915e3cd93f412c98c                                      0.0s

$ echo $?
0

sdelrio avatar Aug 21 '20 17:08 sdelrio

2nd run of the command DOCKER_BUILDKIT=1 docker build . uses the cache successfully.

$ DOCKER_BUILDKIT=1 docker build .
[+] Building 1.1s (7/7) FINISHED                                                                                                       
 => [internal] load .dockerignore                                                                                                 0.1s
 => => transferring context: 2B                                                                                                   0.0s
 => [internal] load build definition from Dockerfile                                                                              0.1s
 => => transferring dockerfile: 125B                                                                                              0.0s
 => [internal] load metadata for docker.io/prom/prometheus:v2.15.1                                                                0.7s
 => [internal] load build context                                                                                                 0.1s
 => => transferring context: 1.06kB                                                                                               0.0s
 => [1/2] FROM docker.io/prom/prometheus:v2.15.1@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6          0.0s
 => CACHED [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml                                                               0.0s
 => exporting to image                                                                                                            0.1s
 => => exporting layers                                                                                                           0.0s
 => => writing image sha256:545d888a0f95ca702515aa01e257a16a24066f10e1a1e21915e3cd93f412c98c                                      0.0s

sdelrio avatar Aug 21 '20 17:08 sdelrio

haha, ok, you nerd sniped me into installing zfs and playing around with it.

I'm pretty sure this is a race condition somewhere in buildkit. I can reproduce it sometimes with DOCKER_BUILDKIT=1 docker build .. But I can reproduce it reliably in your example repo with:

cd grafana
docker system prune --all
docker pull grafana/grafana
tar -c . | DOCKER_BUILDKIT=1 docker build .

with an error like

failed to solve with frontend dockerfile.v0: failed to build LLB: failed to get deleted record sha256:555daf21f178d8a494fd107c333f182aa7fa6e79c70d9a49ea0af1c8d6937c39: not found

(the streaming tarball is a bit closer to how Tilt invokes Docker)

I suspect that what's happening here is that when Tilt invokes Buildkit, we're doing a better job parallelizing things and so just hitting this race condition more reliably.

i can file a clearer bug upstream Monday, for now the best workaround is probably to disable buildkit (as you noted above)

nicks avatar Aug 23 '20 13:08 nicks

wow, thanks, I didn't want to make you install ZFS but thanks a lot for trying it.

Very interesting your test; run it several times in a loop and I get the same error as you:

 => ERROR [1/2] FROM docker.io/grafana/grafana                                                                                                                                                                                      0.0s
 => => resolve docker.io/grafana/grafana:latest                                                                                                                                                                                     0.0s
 => CANCELED [internal] load build context                                                                                                                                                                                          0.1s
 => => transferring context:                                                                                                                                                                                                        0.0s
------
 > [1/2] FROM docker.io/grafana/grafana:
------
failed to solve with frontend dockerfile.v0: failed to build LLB: failed to get deleted record sha256:555daf21f178d8a494fd107c333f182aa7fa6e79c70d9a49ea0af1c8d6937c39: not found

Using:

for i in $(seq 1 10); do { docker system prune --all --force >/dev/null; docker pull grafana/grafana >/dev/null ; tar -c . | DOCKER_BUILDKIT=1 docker build --no-cache .; }; done

Just 1 execution doesn't show the error, also if I introduce a sleep between commands the error disapear so that race condition you told looks like the most probably root cause:

 for i in $(seq 1 10); do { docker system prune --all --force >/dev/null; sleep 1; docker pull grafana/grafana >/dev/null ; sleep 1; tar -c . | DOCKER_BUILDKIT=1 docker build --no-cache .; }; done

I don't really know anything about Go programing, but could you tell me the go file where the build is done? so I could try to add some sleep/pauses to test? or is just using the buildkit go libraries to build?

sdelrio avatar Aug 23 '20 13:08 sdelrio

I think the race condition would be between the Docker daemon and the buildkit server...so there's not really a way to put a sleep inside of Tilt's code...

nicks avatar Aug 24 '20 20:08 nicks

filed upstream as https://github.com/moby/buildkit/issues/1654

nicks avatar Aug 24 '20 21:08 nicks

With last tilt v0.17.6 with updated builkit version the issue is not fixed but now fails much less. In my example rep https://github.com/sdelrio/tiltdev-issue-zfs from the 3 containers only 1 usually fails and clicking once on rebuild works.

sdelrio avatar Oct 02 '20 21:10 sdelrio

I have error message "failed to compute cache key" on version v0.35.1

mvarchdev avatar Oct 04 '25 21:10 mvarchdev