Docker BuildKit fails to build .NET project, in contrast to successful builds with the deprecated builder
Description
I found that when you have more than one dotnet publish step in a Dockerfile, the docker build with BuildKit fails with a strange, nondescriptive error: ERROR: failed to solve: failed to prepare 6boxvrjdjur378egamsa297vp as lnddt61dq57lwjio5fkmhme9e: invalid argument. When there is only one dotnet publish line, the build works with BuildKit.
This failure does not occur when I turn off BuildKit via DOCKER_BUILDKIT=0 docker build; the image builds successfully despite having more than one dotnet publish command.
I have attached a minimal repro repository and repro steps. Hopefully we can clarify what is going on here.
Reproduce
- Download the minimal repro repository: https://github.com/incloudss/testbuildkit
- Go to the testbuildkit directory and run docker build . The build fails.
- Run the build again with DOCKER_BUILDKIT=0 docker build . The build succeeds.
Expected behavior
No response
docker version
Client: Docker Engine - Community
Version: 24.0.5
API version: 1.43
Go version: go1.20.6
Git commit: ced0996
Built: Fri Jul 21 20:35:35 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.5
API version: 1.43 (minimum version 1.12)
Go version: go1.20.6
Git commit: a61e2b4
Built: Fri Jul 21 20:35:35 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.22
GitCommit: 8165feabfdfe38c65b599c4993d227328c231fca
runc:
Version: 1.1.8
GitCommit: v1.1.8-0-g82f18fe
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client: Docker Engine - Community
Version: 24.0.5
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.20.2
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 24.0.5
Storage Driver: vfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
runc version: v1.1.8-0-g82f18fe
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 5.4.225-200.el7.x86_64
Operating System: Debian GNU/Linux 12 (bookworm) (containerized)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 31.36GiB
Name: connect-build-agent-karas-5d84b54566-hkz8d
ID: d76c3467-6972-423b-8208-7a5f12201c2b
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional Info
No response
Thanks for reporting; as this is related to build / buildkit, and the client-side implementation moved to buildx, I'll move this ticket to the buildx repository.
https://github.com/incloudss/testbuildkit
FROM docker.repo.ihsmarkit.com/dotnet/sdk:6.0 AS build-env
WORKDIR /src
COPY [".nuget/NuGet.Config", "./"]
COPY . .
RUN ls /src
FROM build-env as publish
RUN ls /src
RUN dotnet publish "TestDockerBuildkit/TestDockerBuildkit.csproj" -c Release -o /publish/TestDockerBuildkit
RUN dotnet publish "TestDockerBuildkit2/TestDockerBuildkit2.csproj" -c Release -o /publish/TestDockerBuildkit2
We don't have access to the docker.repo.ihsmarkit.com/dotnet/sdk:6.0 image to repro on our side. Can you change it to a public one?
@crazy-max dockerfile fixed.
@crazy-max do you need more information? If not, please remove the tag.
I just attempted the build locally and didn't receive a build error.
Are you still experiencing the issue? If you are, can you try the build with the latest stable buildkit? You can do this by running the following:
$ docker buildx create --name=sandbox --driver=docker-container --bootstrap
$ BUILDX_BUILDER=sandbox docker buildx build .
I have the same error, but only in CI where I'm using DinD. It never throws when I build locally.
I don't have multiple publish commands, but the error occurs if I have multiple RUN statements that call dotnet.
I'm using the latest version of docker (24.0.6).
Docker throws an error with this file
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build-env
WORKDIR /source
# Copy everything
COPY src/Project .
# Restore as distinct layers
RUN dotnet restore
# Build and publish a release
RUN dotnet publish ./Project.csproj -c Release --no-restore -o /app
# Build runtime image
FROM mcr.microsoft.com/dotnet/aspnet:7.0
WORKDIR /app
COPY --from=build-env /app .
EXPOSE 80
ENTRYPOINT ["./Project"]
But this file works without any issues
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build-env
WORKDIR /source
# Copy everything
COPY src/Project .
# Build and publish a release
RUN dotnet publish ./Project.csproj -c Release -o /app
# Build runtime image
FROM mcr.microsoft.com/dotnet/aspnet:7.0
WORKDIR /app
COPY --from=build-env /app .
EXPOSE 80
ENTRYPOINT ["./Project"]
Do we know what could be the cause of this problem? We are also facing a similar issue with DinD images with BuildKit enabled.
I tried again on my side with the new Dockerfile and can't repro like @jsternberg:
$ docker build .
...
#6 [build-env 2/5] WORKDIR /src
#6 DONE 0.8s
#7 [build-env 3/5] COPY [.nuget/NuGet.Config, ./]
#7 DONE 0.1s
#8 [build-env 4/5] COPY . .
#8 DONE 0.1s
#9 [build-env 5/5] RUN ls /src
#9 0.380 Dockerfile
#9 0.380 NuGet.Config
#9 0.380 TestDockerBuildkit
#9 0.380 TestDockerBuildkit.sln
#9 0.380 TestDockerBuildkit2
#9 DONE 0.4s
#10 [publish 1/3] RUN ls /src
#10 0.518 Dockerfile
#10 0.518 NuGet.Config
#10 0.518 TestDockerBuildkit
#10 0.518 TestDockerBuildkit.sln
#10 0.518 TestDockerBuildkit2
#10 DONE 0.5s
#11 [publish 2/3] RUN dotnet publish "TestDockerBuildkit/TestDockerBuildkit.csproj" -c Release -o /publish/TestDockerBuildkit
#11 0.781 MSBuild version 17.3.2+561848881 for .NET
#11 1.066 Determining projects to restore...
#11 1.401 /src/TestDockerBuildkit/TestDockerBuildkit.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#11 1.465 Retrying 'FindPackagesByIdAsyncCore' for source 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/FindPackagesById()?id='Microsoft.Extensions.DependencyInjection'&semVerLevel=2.0.0'.
#11 1.465 Name or service not known (gda-packages.ihs.internal.corp:80)
#11 1.465 Name or service not known
#11 1.885 Retrying 'FindPackagesByIdAsyncCore' for source 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/FindPackagesById()?id='Microsoft.Extensions.DependencyInjection.Abstractions'&semVerLevel=2.0.0'.
#11 1.885 Name or service not known (gda-packages.ihs.internal.corp:80)
#11 1.885 Name or service not known
#11 2.168 Restored /src/TestDockerBuildkit/TestDockerBuildkit.csproj (in 780 ms).
#11 2.265 /src/TestDockerBuildkit/TestDockerBuildkit.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#11 3.332 TestDockerBuildkit -> /src/TestDockerBuildkit/bin/Release/net6.0/TestDockerBuildkit.dll
#11 3.349 TestDockerBuildkit -> /publish/TestDockerBuildkit/
#11 DONE 3.4s
#12 [publish 3/3] RUN dotnet publish "TestDockerBuildkit2/TestDockerBuildkit2.csproj" -c Release -o /publish/TestDockerBuildkit2
#12 0.635 MSBuild version 17.3.2+561848881 for .NET
#12 0.972 Determining projects to restore...
#12 1.219 /src/TestDockerBuildkit2/TestDockerBuildkit2.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#12 1.376 Restored /src/TestDockerBuildkit2/TestDockerBuildkit2.csproj (in 168 ms).
#12 1.500 /src/TestDockerBuildkit2/TestDockerBuildkit2.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#12 2.570 TestDockerBuildkit2 -> /src/TestDockerBuildkit2/bin/Release/net6.0/TestDockerBuildkit2.dll
#12 2.588 TestDockerBuildkit2 -> /publish/TestDockerBuildkit2/
#12 DONE 2.6s
#13 exporting to image
#13 exporting layers
#13 exporting layers 0.2s done
#13 writing image sha256:4836dca5168c79dfece4c98fee5d7912f7a44d8a892e7824cc6f7731f6cc9734 done
#13 DONE 0.2s
$ docker info
Client: Docker Engine - Community
Version: 24.0.6
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2-desktop.5
Path: /usr/local/lib/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.22.0-desktop.2
Path: /usr/local/lib/docker/cli-plugins/docker-compose
dev: Docker Dev Environments (Docker Inc.)
Version: v0.1.0
Path: /usr/local/lib/docker/cli-plugins/docker-dev
extension: Manages Docker extensions (Docker Inc.)
Version: v0.2.20
Path: /usr/local/lib/docker/cli-plugins/docker-extension
init: Creates Docker-related starter files for your project (Docker Inc.)
Version: v0.1.0-beta.8
Path: /usr/local/lib/docker/cli-plugins/docker-init
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
Version: 0.6.0
Path: /usr/local/lib/docker/cli-plugins/docker-sbom
scan: Docker Scan (Docker Inc.)
Version: v0.26.0
Path: /usr/local/lib/docker/cli-plugins/docker-scan
scout: Docker Scout (Docker Inc.)
Version: v1.0.7
Path: /usr/local/lib/docker/cli-plugins/docker-scout
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 21
Server Version: 24.0.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
runc version: v1.1.8-0-g82f18fe
init version: de40ad0
Security Options:
seccomp
Profile: unconfined
Kernel Version: 5.15.90.1-microsoft-standard-WSL2
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 31.31GiB
Name: docker-desktop
ID: e9fa4e2f-dfe9-470c-b0dd-310710788fd2
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128
No Proxy: hubproxy.docker.internal
Experimental: false
Insecure Registries:
hubproxy.docker.internal:5555
127.0.0.0/8
Live Restore Enabled: false
@grimurd @krishnadas-menon If you can give a similar repro like https://github.com/incloudss/testbuildkit and output of docker info as well, that would help. Thanks.
I wonder if Storage Driver: vfs is related to your issue. Does that ring a bell, @thaJeztah?
@crazy-max I was able to fix it by creating a new buildx context, which updated the driver that was in use. Thanks for replying.
I wonder if Storage Driver: vfs is related to your issue.
I would definitely NOT recommend using the vfs storage driver for any purpose other than debugging. The vfs storage driver creates a full copy of the image for every layer that's created, and for every container that's run (which is both very slow and may use a lot of disk space).
There have been some cases where vfs may not be able to write extended attributes (see https://github.com/moby/moby/issues/45535, https://github.com/moby/moby/issues/45417), and older versions of docker silently discarded those errors; docker 25.0 will produce an error in these conditions (added in https://github.com/moby/moby/pull/45464), but other versions may still discard that error (or perhaps BuildKit may not).
I'm unable to reproduce the error where it originally happened for me.
Here is docker info result from the last time the error happened
Client:
Version: 24.0.6
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /usr/local/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.22.0
Path: /usr/local/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 24.0.6
Storage Driver: vfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7880925980b188f4c97b462f709d0db8e8962aff
runc version: v1.1.9-0-gccaecfc
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 4.18.0-477.15.1.el8_8.x86_64
Operating System: Alpine Linux v3.18 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 23.2GiB
Name: runner-jvxycqsx-project-160-concurrent-0-j29qsi0x
ID: 179ff559-e2f4-4454-aa48-58d5e8bb7de2
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
This is indeed related to DinD. Disabling BuildKit with DOCKER_BUILDKIT=0 can fix the issue, but then it takes ages to build the image.
I have a multi-stage Dockerfile that does a dotnet build and publish. It fails during dotnet publish.
FROM mcr.microsoft.com/dotnet/aspnet:7.0 AS base
ARG commitHash
WORKDIR /app
EXPOSE 80
EXPOSE 443
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
ARG commitHash
WORKDIR /src
COPY ["Directory.Build.props", "src/XXX/"]
COPY ["src/XXX/XXX.csproj", "src/XXX/"]
RUN dotnet restore "src/XXX/XXX.csproj"
COPY . .
WORKDIR "/src/src/XXX"
RUN dotnet build "XXX.csproj" -c Release -o /app/build /property:VersionSuffix=$commitHash
FROM build AS publish
ARG commitHash
RUN dotnet publish "XXX.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=$commitHash
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "XXX.dll"]
@ksharma-qc can you confirm if you are also using the vfs storage driver? That will help us to narrow down the underlying cause.
Actually mine is overlay. Here is the output from docker info from within the container.
Client: Docker Engine - Community
Version: 24.0.7
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.21.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 24.0.7
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
runc version: v1.1.9-0-gccaecfc
init version: de40ad0
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.15.0-88-generic
Operating System: Ubuntu 22.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.666GiB
Name: 163da1724e7e
ID: 6964b083-65b0-4aec-a6a5-a687f32d45c2
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
@ksharma-qc can you try running the docker daemon with debug logging enabled? I believe the flag is -D or --debug. I'm curious whether we'll get any log messages that could help narrow it down. Can I also confirm that you're getting an error message in the same format?
ERROR: failed to solve: failed to prepare <id> as <id>: invalid argument
This error message comes from https://github.com/moby/buildkit/blob/30a1b0e8b690b800eec0b2335b75a21b70c67eb7/cache/manager.go#L616-L627. The invalid argument makes me think a system call is failing, although I'm not sure which one or why quite yet.
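As an aside, if restarting dockerd with flags is inconvenient, debug logging can also be enabled persistently through the daemon configuration file. A minimal sketch, assuming the standard /etc/docker/daemon.json location on Linux (merge with any existing keys, then restart the daemon):

```json
{
  "debug": true
}
```

This is equivalent to passing -D / --debug on the dockerd command line, but survives restarts.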
I'm facing the same problem with the same error message, on the same docker version (24.0.7). I will also try enabling debug mode and post any other error messages. I've noticed that after a second failed docker build, the /var/lib/docker/vfs/dir folder grows to as much as 38 GB, and docker system prune -a doesn't help.
I'll try running with debug logging tomorrow and post an update.
I ran the docker daemon in debug mode and have attached the logs. I'm not sure the problem was captured in them. The logs have a statement for RUN dotnet build on line 38, but nothing for the subsequent instruction RUN dotnet publish, which is where the error actually happens.
Is there a way to get more details?
Log file: docker-daemon.log
Console output from docker client:
#16 [build 8/8] RUN dotnet build "MyApp.csproj" -c Release -o /app/build /property:VersionSuffix=a1912aeb93aac7e2760da65fa44226ada0140599
#16 CACHED
#17 [publish 1/1] RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=a1912aeb93aac7e2760da65fa44226ada0140599
#17 ERROR: failed to prepare zq1cnf4911ofmftez68uqepnz as h2wz441zubqnpt9awaz6hl30j: invalid argument
------
> [publish 1/1] RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=a1912aeb93aac7e2760da65fa44226ada0140599:
------
Dockerfile:25
--------------------
23 | ARG commitHash
24 |
25 | >>> RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=$commitHash
26 |
27 | FROM base AS final
--------------------
ERROR: failed to solve: failed to prepare zq1cnf4911ofmftez68uqepnz as h2wz441zubqnpt9awaz6hl30j: invalid argument
The problem might not have been captured in the logs. It was worth a try in case it was, but I don't see anything relevant. I also just wanted to double-check whether that was the full log file or if it was truncated. I'm also looking for some initialization debug messages such as these:
time="2023-11-20T19:51:10.837704761Z" level=debug msg="No quota support for local volumes in /home/rootless/.local/share/docker/volumes: Filesystem does not support, or has not enabled quotas"
time="2023-11-20T19:51:10.840596594Z" level=debug msg="[graphdriver] priority list: [overlay2 fuse-overlayfs btrfs zfs devicemapper vfs]"
time="2023-11-20T19:51:10.846727469Z" level=debug msg="successfully detected metacopy status" storage-driver=overlay2 usingMetacopy=false
time="2023-11-20T19:51:10.847878219Z" level=debug msg="backingFs=extfs, projectQuotaSupported=false, usingMetacopy=false, indexOff=\"index=off,\", userxattr=\"userxattr,\"" storage-driver=overlay2
time="2023-11-20T19:51:10.847890886Z" level=debug msg="Initialized graph driver overlay2"
It'll help me narrow down where it might be happening. We might need to add some additional debug messages if we don't see anything that helps, and I'd prefer to make them more targeted if possible. In particular, the graphdriver message is pretty important, as the potential area I'm looking at for the error uses the graphdriver, and the specific implementation might be relevant.
I also just wanted to confirm that this was happening using the default docker builder. You can run docker buildx ls to show the list of builders and it'll also show the default one. If possible, I'd like you to try running the build with a custom builder and see if the same problem occurs.
$ docker buildx create --name stable --driver docker-container
$ docker buildx --builder stable build ...
I have the same issue with the following Dockerfile:
FROM mcr.microsoft.com/devcontainers/base:bullseye
RUN sh -c 'curl --silent --location https://git.io/JYfAY | bash'
RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh
I need to set DOCKER_BUILDKIT=0 to ensure the docker image build succeeds. Otherwise, I get the following error and the image build fails:
....
=> [2/3] RUN sh -c 'curl --silent --location https://git.io/JYfAY | bash' 10.5s
=> ERROR [3/3] RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh 0.4s
------
> [3/3] RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh:
------
Dockerfile:4
--------------------
2 |
3 | RUN sh -c 'curl --silent --location https://git.io/JYfAY | bash'
4 | >>> RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh
5 |
--------------------
ERROR: failed to solve: process "/bin/sh -c curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh" did not complete successfully: unable to find user root: invalid argument
Same error for docker:24.0.5
Same here! docker: 24.0.9
@kopwei
I also just wanted to confirm that this was happening using the default docker builder. You can run docker buildx ls to show the list of builders and it'll also show the default one. If possible, I'd like you to try running the build with a custom builder and see if the same problem occurs.
$ docker buildx create --name stable --driver docker-container
$ docker buildx --builder stable build ...
I tried the above; in my case, the build seems to hang indefinitely on a prior step (dpkg-configuring python-crypto). I waited for 3 minutes.
- Kernel version: 6.8.7-arch1-1
- Docker version: 26.1.0
- Buildx version: 0.14.0
I noticed something possibly relevant:
# docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS BUILDKIT PLATFORMS
default* docker
\_ default \_ default running v0.13.1 linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386
# docker buildx version
github.com/docker/buildx 0.14.0 171fcbeb69d67c90ba7f44f41a9e418f6a6ec1da
So, the version of the buildx plugin is 0.14.0 and the running buildx instance is 0.13.1? I tried updating the version of the instance, but no matter what, it won't update, and I can't remove it.
@jsternberg I'm also on Arch Linux, like @ThorensTD124, and I'm having the exact same issue. Docker is installed using pacman -S docker docker-buildx, and the version reported by docker buildx version and the version of the default running instance shown by docker buildx ls differ (0.14.0 vs 0.13.1). Could this be the cause? Can I force-upgrade the default running instance? I've tried, but to no avail.
Update: I tested downgrading the docker-buildx package to 0.13.1, making the versions identical; no dice, it still fails with the exact same error message as mentioned in this bug report.
Update 2: It turns out I'm running the vfs storage driver, possibly due to casefold being enabled on my ext4 filesystem and the overlay kernel module not supporting casefold. I'll try to re-lay out my filesystems, re-enable overlay2, and re-test.
I'm testing with github.com/zimsneexh/Synergy-HL2-docker
# docker info | grep Storage
Storage Driver: vfs
# docker build -t synergy:latest .
[+] Building 94.2s (12/15) docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 575B 0.0s
=> [internal] load metadata for docker.io/library/debian:10 1.4s
=> [auth] library/debian:pull token for registry-1.docker.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [ 1/10] FROM docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 4.5s
=> => resolve docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 0.0s
=> => sha256:c94cce4c115de0a1328304e8981a68cf4b4b657ef6dbda52e188ac651368d603 1.46kB / 1.46kB 0.0s
=> => sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50 50.66MB / 50.66MB 1.9s
=> => sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 984B / 984B 0.0s
=> => sha256:255eec9d157d35e00a81a45f1e958fd19437d504139e8eb4ea6cc380ea741ed4 529B / 529B 0.0s
=> => extracting sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50 2.5s
=> [internal] load build context 0.0s
=> => transferring context: 2.24kB 0.0s
=> [ 2/10] RUN apt update && apt-get install -y wine 33.7s
=> [ 3/10] RUN dpkg --add-architecture i386 1.1s
=> [ 4/10] RUN apt update 4.5s
=> [ 5/10] RUN apt-get -y install wine32 libwine libwine:i386 fonts-wine 28.4s
=> [ 6/10] RUN apt-get install -y lib32gcc1 curl xvfb screen procps winbind x11vnc 19.1s
=> ERROR [ 7/10] RUN rm -rf /steamcmd 1.4s
------
> [ 7/10] RUN rm -rf /steamcmd:
------
Dockerfile:7
--------------------
5 | RUN apt-get -y install wine32 libwine libwine:i386 fonts-wine
6 | RUN apt-get install -y lib32gcc1 curl xvfb screen procps winbind x11vnc
7 | >>> RUN rm -rf /steamcmd
8 | RUN mkdir steamcmd && cd steamcmd && curl -sqL "https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.tar.gz" | tar zxvf -
9 | COPY start.sh /start.sh
--------------------
ERROR: failed to solve: failed to prepare mphp0b3tpyltya6v94nlmctkj as jwxdztyfhel5ozdf6xal7ap5z: invalid argument
@crazy-max @jsternberg I just finished re-laying out my storage so that /var/lib/docker lives on a regular ext4 filesystem, notably without the casefold option:
# docker info | grep Storage
Storage Driver: overlay2
# docker build -t synergy:latest .
[+] Building 87.1s (16/16) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 575B 0.0s
=> [internal] load metadata for docker.io/library/debian:10 1.4s
=> [auth] library/debian:pull token for registry-1.docker.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 2.24kB 0.0s
=> [ 1/10] FROM docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 2.3s
=> => resolve docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 0.0s
=> => sha256:255eec9d157d35e00a81a45f1e958fd19437d504139e8eb4ea6cc380ea741ed4 529B / 529B 0.0s
=> => sha256:c94cce4c115de0a1328304e8981a68cf4b4b657ef6dbda52e188ac651368d603 1.46kB / 1.46kB 0.0s
=> => sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50 50.66MB / 50.66MB 1.3s
=> => sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 984B / 984B 0.0s
=> => extracting sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50 0.9s
=> [ 2/10] RUN apt update && apt-get install -y wine 29.6s
=> [ 3/10] RUN dpkg --add-architecture i386 0.3s
=> [ 4/10] RUN apt update 6.8s
=> [ 5/10] RUN apt-get -y install wine32 libwine libwine:i386 fonts-wine 23.1s
=> [ 6/10] RUN apt-get install -y lib32gcc1 curl xvfb screen procps winbind x11vnc 15.8s
=> [ 7/10] RUN rm -rf /steamcmd 0.3s
=> [ 8/10] RUN mkdir steamcmd && cd steamcmd && curl -sqL "https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.t 5.7s
=> [ 9/10] COPY start.sh /start.sh 0.0s
=> [10/10] RUN chmod +x /start.sh 0.3s
=> exporting to image 1.2s
=> => exporting layers 1.2s
=> => writing image sha256:9c26ec62a608c9db878b8133d9c8514adfedb32e019279183de16b95458a60aa 0.0s
=> => naming to docker.io/library/synergy:latest
So (at least in my case) it looks like vfs was the culprit, which got automatically enabled instead of overlay2 due to the underlying ext4 filesystem having the casefold option, which the overlay kernel module is incompatible with. vfs in turn seems to expose a bug in buildx.
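For anyone wanting to check whether the same silent fallback applies on their machine, a sketch of the two checks. The docker command needs a running daemon; the real casefold check is `sudo tune2fs -l <device> | grep -w casefold` against the device backing /var/lib/docker, demonstrated below against a sample features line since the real command needs root and a device path:

```shell
# 1) Which storage driver is the daemon actually using?
#    (needs a running daemon; prints e.g. "vfs" or "overlay2")
#      docker info --format '{{ .Driver }}'

# 2) Is casefold enabled on the backing ext4 filesystem?
#    Real check: sudo tune2fs -l /dev/<device> | grep -w casefold
#    Demonstrated here against a sample "Filesystem features:" line:
features='Filesystem features: has_journal ext_attr casefold extent 64bit'
if echo "$features" | grep -qw casefold; then
  echo "casefold enabled: overlay2 is unusable, so the daemon falls back to vfs"
fi
```

If check 1 unexpectedly prints vfs, check 2 tells you whether casefold is the reason.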
I just attempted the build locally and didn't receive a build error.
Are you still experiencing the issue? If you are, can you try the build with the latest stable buildkit? You can do this by running the following:
$ docker buildx create --name=sandbox --driver=docker-container --bootstrap
$ BUILDX_BUILDER=sandbox docker buildx build .
This workaround worked for me too, for my DinD Azure Pipelines agent:
$ docker buildx create --name=sandbox --driver=docker-container --bootstrap
$ BUILDX_BUILDER=sandbox docker buildx build --no-cache --load .
(somewhat catching up on this thread); thanks for correlating this with the casefold option!
So (at least in my case) it looks like vfs was the culprit, which got automatically enabled instead of overlay2 due to the underlying ext4 filesystem having the casefold option, which the overlay kernel module is incompatible with. vfs in turn seems to expose a bug in buildx.
The vfs driver is basically a "last resort" (if all else fails); it's implemented by creating a copy of all files in the image in a directory, then creating a copy of all files for every layer produced. So with vfs there's no additional filesystem type (like overlayFS) involved, and file operations happen directly on the host's filesystem.
Overall, it would not surprise me if things randomly failed on a case-insensitive filesystem; at least, I'm aware of some tools/software using case-sensitive paths for caches, resulting in cache collisions (cache files overwriting existing cache files). So even if things work in docker / buildkit itself, software running inside the container that assumes a case-sensitive filesystem could still cause failures.
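To illustrate the kind of collision described above, a minimal sketch; it assumes an ordinary Linux filesystem, and on ext4 with casefold (or any other case-insensitive filesystem) the second write would clobber the first:

```shell
tmp=$(mktemp -d)
# Two paths differing only in case: distinct files on a case-sensitive
# filesystem, a single overwritten file on a case-insensitive one.
echo "first"  > "$tmp/Cache"
echo "second" > "$tmp/cache"
count=$(ls "$tmp" | wc -l | tr -d ' ')
echo "distinct entries: $count"   # 2 if case-sensitive, 1 if case-insensitive
rm -rf "$tmp"
```

Any software that relies on Cache and cache being different files breaks silently in the case-insensitive scenario.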
Not sure what the difference would be between the classic builder and BuildKit in this situation, though 🤔
I wonder if (perhaps) BuildKit produces case-sensitive (temp) paths for intermediate steps and those result in a similar collision; it likely hits a different code path (BuildKit itself acts as a runtime for containers executed during build, and directly controls the OCI runtime, so it may / will hit different code paths than the classic builder, which runs containers through the docker daemon).
There's also a possibility that the classic builder was hiding some underlying failures; the vfs driver (as mentioned) is used as a last resort, and there have historically been some changes to make it slightly more permissive on failures; older versions of docker did so unconditionally for extended attributes:
https://github.com/moby/moby/blob/v23.0.0/pkg/archive/archive.go#L791-L803
Current versions make this configurable (as silently discarding extended attributes could drop attributes relevant to the container, producing different results in some cases): https://github.com/moby/moby/blob/v27.4.1/pkg/archive/archive.go#L788-L803
I wonder if (e.g.) one of those or perhaps some other code-path would be ignoring errors when using vfs, potentially hiding errors related to failures due to the case-insensitive filesystem used.
Actually, wondering now: if BuildKit is running as a containerised builder, it won't use the graph drivers from the docker codebase. The docker daemon (when using graph drivers) detects whether a specific driver can be supported; if not, it falls back to other drivers (ultimately ending up using vfs): https://github.com/moby/moby/blob/b5d5fef7aa68f160b817edef186c9b18ab996f4f/daemon/graphdriver/overlayutils/overlayutils.go#L40-L80
However, with BuildKit running standalone inside a container, it won't be using those storage drivers, and I wonder if the containerd snapshotter code has a similar check. Possibly BuildKit itself (inside the container) continues to use the overlayFS snapshotter, which (as mentioned in https://github.com/docker/buildx/issues/2021#issuecomment-2081124556) does not support running on a case-insensitive filesystem, and could now fail when BuildKit creates containers for those build steps?
:point_up: @tonistiigi @jsternberg could that be the case here?