
buildkit fails with "mount already exists with name"

Open kit-ty-kate opened this issue 4 years ago • 10 comments

With a similar Dockerfile to https://github.com/moby/buildkit/issues/1459#issuecomment-618690727:

# syntax = docker/dockerfile:1.1.5-experimental
FROM ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
COPY --chown=opam . /src/
RUN git -C /src checkout -b opam-ci__cibranch a1f2c8222b2a755968b2212471fd1732bdaf6f3d && \
  git -C /src merge 49560cde56610f9fd03c0af5ef852e9b873ecf6c && \
  opam repository set-url --strict default file:///src
RUN opam pin add -k version -yn conf-nauty.1.1 1.1
RUN --mount=type=cache,target=/home/opam/.opam/download-cache,uid=1000 opam depext -uivy conf-nauty.1.1

our docker build sometimes fails with:

#1 [internal] load .dockerignore
#1 sha256:95b7a97096dc2afd2d43c2337088551dcb79f6b7cd7780992776af1afaca6f81
#1 transferring context:
#1 transferring context: 2B done
#1 DONE 0.6s

#2 [internal] load build definition from Dockerfile
#2 sha256:86eb8a668e7aabc92f8c971f945dc6e33bb4dafb625349e0e4b3e99e34f50a19
#2 transferring dockerfile: 572B done
#2 DONE 0.9s

#3 resolve image config for docker.io/docker/dockerfile:1.1.5-experimental
#3 sha256:4d904ca4dadcfe285b7e07c14c496e2212a754f0c4fd12946baa040350cae4f0
#3 DONE 0.9s

#4 docker-image://docker.io/docker/dockerfile:1.1.5-experimental@sha256:787107d7f7953cb2d95ee81cc7332d79aca9328129318e08fc7ffbd252a20656
#4 sha256:d2736ec635330516879555dba7aa00a3afb686946df7f09d957dc33a05f95b62
#4 CACHED

#5 [internal] load metadata for docker.io/ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
#5 sha256:c3d18c31ea0787222f2fb8d1ea33b95119a8b415e570a2d026f6aea36d6ff8b9
#5 DONE 0.0s

#11 [internal] settings cache mount permissions
#11 sha256:98e142695adfe60b7bd7795f3db2122cdc761dcae59a13eab856140b08987ce3
#11 DONE 0.0s

#6 [stage-0 1/5] FROM docker.io/ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
#6 sha256:16da8708e67b40ea0ed8ecd0041c80825c6c6966930c7a99e397d1a2a128b466
#6 DONE 0.0s

#7 [internal] load build context
#7 sha256:cf9ebfd4c665038fcc01c0f0427a7e3c4451e7b6dec9591b287b850ddb7a0761
#7 transferring context: 104.60MB 2.6s
#7 transferring context: 111.47MB 3.5s done
#7 DONE 4.6s

#6 [stage-0 1/5] FROM docker.io/ocurrent/opam@sha256:5bb6268b1fb7a8dfceffd92383cb4c72d0c3f995231ff916fa6aaf0b29bff4df
#6 sha256:16da8708e67b40ea0ed8ecd0041c80825c6c6966930c7a99e397d1a2a128b466
#6 CACHED

#11 [internal] settings cache mount permissions
#11 sha256:98e142695adfe60b7bd7795f3db2122cdc761dcae59a13eab856140b08987ce3
#11 CACHED

#8 [stage-0 2/5] COPY --chown=opam . /src/
#8 sha256:84fcec75053eb6d4d45b6fb8bb78931ad7518971d40c96bfcc6fe7c2b7a09cd0
#8 ERROR: mount already exists with name
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name

I've spent several hours trying to produce a minimal test case, without success; however, it still happens pretty often on our test server.

Here is our daemon log:

Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963833995Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963920949Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963962093Z" level=info msg="ccResolverWrapper: sending update to cc: {[{ 0  <nil>}] <nil>}" module=grpc
Apr 24 00:26:22 m1-d dockerd[37339]: time="2020-04-24T00:26:22.963972759Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 24 00:26:30 m1-d dockerd[37339]: time="2020-04-24T00:26:30Z" level=error msg="fatal error: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name"
Apr 24 00:26:30 m1-d dockerd[37339]: panic: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name
Apr 24 00:26:30 m1-d dockerd[37339]: goroutine 1 [running]:
Apr 24 00:26:30 m1-d dockerd[37339]: main.main()
Apr 24 00:26:30 m1-d dockerd[37339]:         /src/frontend/dockerfile/cmd/dockerfile-frontend/main.go:26 +0x34f
Apr 24 00:26:31 m1-d dockerd[37339]: time="2020-04-24T00:26:31.282392042Z" level=warning msg="grpc: addrConn.createTransport failed to connect to { 0  <nil>}. Err :connection error: desc = \"transport: Error while dialing only one connection allowed\". Reconnecting..." module=grpc

kit-ty-kate avatar Apr 24 '20 00:04 kit-ty-kate

What version/software is this?

tonistiigi avatar May 08 '20 18:05 tonistiigi

Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:25:56 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

kit-ty-kate avatar May 08 '20 18:05 kit-ty-kate

We see this sporadically in our CI pipeline -- we use make to kick off several parallel builds, and every once in a while a job will fail with the "mount already exists with name" error:

#8 [build 2/4] WORKDIR /src/api
#8 ERROR: mount already exists with name

#9 [internal] load build context
#9 transferring context: 7.09MB 0.1s done
#9 CANCELED
------
 > [build 2/4] WORKDIR /src/api:
------
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: mount already exists with name
Makefile:309: recipe for target 'docker-build-swagger' failed
make: *** [docker-build-swagger] Error 1
make: *** Waiting for unfinished jobs....

Could it be because the stages are aliased with the same name? The Dockerfiles that are built at that time both have statements like:

FROM <our-common-build-image>:latest AS build

dotcarls avatar Jun 01 '20 19:06 dotcarls

FYI we hit this on docker:20.10.2. We kick off two builds using the same docker-dind service using K8s on GitLab CI. Those builds have lots of similarities, including overlapping build stage names.

max-sixty avatar Feb 14 '21 05:02 max-sixty

We see this in CI as well when using make to parallelize multiple multistage builds that all have the same FROM imagename as builder. I'm wondering if @dotcarls might be on to something.

lawnmowerlatte avatar Feb 23 '21 17:02 lawnmowerlatte

Yeah we just hit this too:


#9 [stage-1 2/4] COPY --chown=atcloud:atcloud .npmrc package.json /usr/local/autotrader/app/
#9 sha256:be4cfa891678f62a7c246fdab682b5253f37aeb2b9b4b0c0640aa38d3fff47c2
#9 ERROR: mount already exists with name
------
 > [build 2/6] COPY --chown=atcloud:atcloud package.json .npmrc /usr/local/autotrader/app/:
------
------
 > [stage-1 2/4] COPY --chown=atcloud:atcloud .npmrc package.json /usr/local/autotrader/app/:
------
mount already exists with name
Service 'app' failed to build : Build failed
your build failed! check the logs above to understand the reason, you can recreate locally by running 'docker-compose build'

These are two layers from separate stages copying files from the host, and, similar to the above, both stages are FROM the same image.

bash-5.1# docker version
Client:
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:50:40 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:55:09 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b638
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Stono avatar Oct 08 '21 10:10 Stono

We see this in CI as well when using make to parallelize multiple multistage builds that all have the same FROM imagename as builder. I'm wondering if @dotcarls might be on to something.

This seems to be a dead end. We renamed our stage names to be unique for each parallelized build (i.e. FROM imagename AS builder_for_app1, FROM imagename AS builder_for_app2, etc.) and still saw this issue.

lawnmowerlatte avatar Nov 15 '21 20:11 lawnmowerlatte

Seeing this with Docker version 20.10.11 on a gitlab-runner host for a CI pipeline.

How can we investigate to understand what's going on?

janhn avatar Nov 22 '21 14:11 janhn

We also see this problem with Docker 23.0.1. It appears when we run two simultaneous docker build commands with the same FROM image and the same layers up to the first COPY command, as in the example from user Stono at https://github.com/moby/buildkit/issues/1460#issuecomment-938536403

We were able to work around this by building with "--no-cache", accepting the downsides of that approach.

This leads to the following hypothesis on the root cause:

  • Assume a simple Dockerfile like:
FROM myimage:mytag
COPY a/ /a/
COPY b/ /b/
  • We run two build processes in parallel, both using the same Dockerfile, but different build contexts.
  • In our contexts both folders "a" have the same content, while folders "b" differ in content.
  • If caching is switched on, the docker engine remembers each layer for later re-use. For a COPY command, the cache key should be determined by the destination path and the content of the source file(s) to copy (in some way, e.g. file names, timestamps, content hashes?).
  • As we start two build processes in parallel, both from the same FROM image, and that image has not been used before, the first build process to hit the first COPY command (on source "a") cannot reuse a cached layer. It therefore executes the COPY and creates a new cached layer from it.
  • If the second build process ran after the first one, it would see at the first COPY command on source "a" that the source content and the destination are the same, and it would re-use the cached layer.
  • However, since we run the build processes simultaneously, the suspicion is that there is a race condition between them: both see that no cached layer exists, build the layer, and try to add it to the cache.
  • If both simultaneously try to add the same layer to the cache, one wins and the other fails with the "mount already exists with name" error (a sketch of this race follows this list).
  • Under this hypothesis, building with "--no-cache" works around the problem because the build processes do not try to record layers in the cache and thus cannot hit the race condition.
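
To make the suspected race concrete, here is a minimal Go sketch of the mechanism described above. It is only a toy model under this hypothesis, not BuildKit's actual implementation; the layerStore type, its register method, and the cache key string are hypothetical stand-ins.

// Toy model of the suspected race (not BuildKit's actual code): two parallel
// builds compute the same cache key for a layer and both try to register it.
package main

import (
	"errors"
	"fmt"
	"sync"
)

// layerStore is a hypothetical stand-in for the engine's layer cache.
type layerStore struct {
	mu     sync.Mutex
	layers map[string]bool // keyed by a content-derived cache key
}

// register adds a layer under the given key, failing if it already exists,
// analogous to the "mount already exists with name" error in the report.
func (s *layerStore) register(key string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.layers[key] {
		return errors.New("mount already exists with name")
	}
	s.layers[key] = true
	return nil
}

func main() {
	store := &layerStore{layers: map[string]bool{}}
	cacheKey := "COPY a/ /a/ + contents-of-a" // identical in both builds

	var wg sync.WaitGroup
	for i := 1; i <= 2; i++ {
		wg.Add(1)
		go func(build int) {
			defer wg.Done()
			// Both builds see no cached layer, "build" it, then try to
			// register it under the same key; one of them loses.
			if err := store.register(cacheKey); err != nil {
				fmt.Printf("build %d failed: %v\n", build, err)
				return
			}
			fmt.Printf("build %d cached the layer\n", build)
		}(i)
	}
	wg.Wait()
}

Running this prints that one "build" caches the layer while the other fails with the same error message, which is the behaviour the hypothesis predicts for two simultaneous builds sharing a cache key.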

carsten-langer avatar Mar 21 '23 09:03 carsten-langer

Looking through the code, the only place where this error can occur is here:

https://github.com/moby/moby/blob/f00569b89154a5ce8c290d5420019bb20a8ab632/layer/layer_store.go#L480-L501

This has 2 callers (for build): https://github.com/moby/moby/blob/f00569b89154a5ce8c290d5420019bb20a8ab632/builder/builder-next/adapters/snapshot/snapshot.go#L280-L285

https://github.com/moby/moby/blob/f00569b89154a5ce8c290d5420019bb20a8ab632/daemon/images/image_builder.go#L62-L63

The "name" here is always a random ID generated by crypto/rand.
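
For context, such a random name is typically derived from crypto/rand roughly as in the sketch below (illustrative only, not the exact moby code). If that is how the name is produced, an accidental collision between two independently generated names is effectively impossible, so the error is unlikely to come from two builds picking the same name by chance.

// Illustrative sketch (not the exact moby code) of generating a random
// layer/mount name from crypto/rand.
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

func randomID() string {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		panic(err) // failures reading crypto/rand are not recoverable here
	}
	return hex.EncodeToString(b)
}

func main() {
	fmt.Println(randomID())
}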

cpuguy83 avatar Jun 24 '23 00:06 cpuguy83