
Docker BuildKit fails to build .NET project, in contrast to successful builds on the deprecated builder

Open incloudss opened this issue 2 years ago • 26 comments

Description

I found that when a Dockerfile contains more than one dotnet publish, docker build with BuildKit fails with a strange, nondescriptive error: ERROR: failed to solve: failed to prepare 6boxvrjdjur378egamsa297vp as lnddt61dq57lwjio5fkmhme9e: invalid argument. When there is only one dotnet publish line, the build works with BuildKit.

The failure does not occur when I turn off BuildKit via DOCKER_BUILDKIT=0 docker build. The image then builds successfully, despite having more than one dotnet publish command.

I have attached a minimal repro repository and repro steps. I hope we can clarify what is going on here.

Reproduce

  1. Download the minimal repro repository: https://github.com/incloudss/testbuildkit
  2. Go to the testbuildkit directory and run docker build .
  3. The build fails.
  4. Run the build again, this time with DOCKER_BUILDKIT=0 docker build .
  5. Build succeeds.
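
The two runs above can be wrapped in a small shell helper that reports both results side by side. This is only a sketch: the function name compare_builders is hypothetical, and it assumes a Docker version where DOCKER_BUILDKIT=1 selects BuildKit and DOCKER_BUILDKIT=0 selects the legacy builder, as in the steps above.

```shell
# compare_builders (hypothetical helper): build the same context twice,
# once with BuildKit and once with the legacy builder, and print whether
# each build succeeded. Build output is discarded; only exit codes are used.
compare_builders() {
  ctx=${1:-.}
  if DOCKER_BUILDKIT=1 docker build "$ctx" >/dev/null 2>&1; then
    bk=ok
  else
    bk=fail
  fi
  if DOCKER_BUILDKIT=0 docker build "$ctx" >/dev/null 2>&1; then
    legacy=ok
  else
    legacy=fail
  fi
  printf 'buildkit=%s legacy=%s\n' "$bk" "$legacy"
}

# Usage, from inside the testbuildkit directory:
#   compare_builders .
```

If the behavior described in this report is present, the helper would be expected to report a failed BuildKit build next to a successful legacy build.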

Expected behavior

No response

docker version

Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:35:35 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:35 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.20.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.5
 Storage Driver: vfs
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
 runc version: v1.1.8-0-g82f18fe
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.4.225-200.el7.x86_64
 Operating System: Debian GNU/Linux 12 (bookworm) (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 31.36GiB
 Name: connect-build-agent-karas-5d84b54566-hkz8d
 ID: d76c3467-6972-423b-8208-7a5f12201c2b
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

No response

incloudss avatar Aug 25 '23 09:08 incloudss

Thanks for reporting; as this is related to build / buildkit, and the client-side implementation moved to buildx, I'll move this ticket to the buildx repository.

thaJeztah avatar Aug 25 '23 13:08 thaJeztah

https://github.com/incloudss/testbuildkit

FROM docker.repo.ihsmarkit.com/dotnet/sdk:6.0 AS build-env
WORKDIR /src
COPY [".nuget/NuGet.Config", "./"]
COPY . .
RUN ls /src
FROM build-env as publish

RUN ls /src
RUN dotnet publish "TestDockerBuildkit/TestDockerBuildkit.csproj" -c Release -o /publish/TestDockerBuildkit
RUN dotnet publish "TestDockerBuildkit2/TestDockerBuildkit2.csproj" -c Release -o /publish/TestDockerBuildkit2

Don't have access to docker.repo.ihsmarkit.com/dotnet/sdk:6.0 image to repro on our side. Can you change it to a public one?

crazy-max avatar Aug 25 '23 13:08 crazy-max

@crazy-max dockerfile fixed.

incloudss avatar Aug 28 '23 07:08 incloudss

@crazy-max do you need more information? If not, please remove the tag.

incloudss avatar Sep 01 '23 06:09 incloudss

I just attempted the build locally and didn't receive a build error.

build.log

Are you still experiencing the issue? If you are, can you try the build with the latest stable buildkit? You can do this by running the following:

$ docker buildx create --name=sandbox --driver=docker-container --bootstrap
$ BUILDX_BUILDER=sandbox docker buildx build .

jsternberg avatar Sep 27 '23 19:09 jsternberg

I have the same error but only in CI where I'm using DinD. It never throws when I build locally.

I don't have multiple publish commands, but the error occurs if I have multiple RUN statements that call dotnet.

I'm using the latest version of docker (24.0.6)

Docker throws an error with this file

FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build-env
WORKDIR /source

# Copy everything
COPY src/Project .

# Restore as distinct layers
RUN dotnet restore

# Build and publish a release
RUN dotnet publish ./Project.csproj -c Release --no-restore -o /app

# Build runtime image
FROM mcr.microsoft.com/dotnet/aspnet:7.0
WORKDIR /app
COPY --from=build-env /app .
EXPOSE 80
ENTRYPOINT ["./Project"]

But this file works without any issues

FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build-env
WORKDIR /source

# Copy everything
COPY src/Project .

# Build and publish a release
RUN dotnet publish ./Project.csproj -c Release -o /app

# Build runtime image
FROM mcr.microsoft.com/dotnet/aspnet:7.0
WORKDIR /app
COPY --from=build-env /app .
EXPOSE 80
ENTRYPOINT ["./Project"]

grimurd avatar Oct 05 '23 10:10 grimurd

Do we know what could be the cause of this problem? We are also facing a similar issue with dind images with buildkit enabled.

krishnadas-menon avatar Oct 11 '23 13:10 krishnadas-menon

@crazy-max dockerfile fixed.

I tried again on my side with the new Dockerfile and can't repro like @jsternberg:

$ docker build .
...
#6 [build-env 2/5] WORKDIR /src
#6 DONE 0.8s

#7 [build-env 3/5] COPY [.nuget/NuGet.Config, ./]
#7 DONE 0.1s

#8 [build-env 4/5] COPY . .
#8 DONE 0.1s

#9 [build-env 5/5] RUN ls /src
#9 0.380 Dockerfile
#9 0.380 NuGet.Config
#9 0.380 TestDockerBuildkit
#9 0.380 TestDockerBuildkit.sln
#9 0.380 TestDockerBuildkit2
#9 DONE 0.4s

#10 [publish 1/3] RUN ls /src
#10 0.518 Dockerfile
#10 0.518 NuGet.Config
#10 0.518 TestDockerBuildkit
#10 0.518 TestDockerBuildkit.sln
#10 0.518 TestDockerBuildkit2
#10 DONE 0.5s

#11 [publish 2/3] RUN dotnet publish "TestDockerBuildkit/TestDockerBuildkit.csproj" -c Release -o /publish/TestDockerBuildkit
#11 0.781 MSBuild version 17.3.2+561848881 for .NET
#11 1.066   Determining projects to restore...
#11 1.401 /src/TestDockerBuildkit/TestDockerBuildkit.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#11 1.465   Retrying 'FindPackagesByIdAsyncCore' for source 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/FindPackagesById()?id='Microsoft.Extensions.DependencyInjection'&semVerLevel=2.0.0'.
#11 1.465   Name or service not known (gda-packages.ihs.internal.corp:80)
#11 1.465     Name or service not known
#11 1.885   Retrying 'FindPackagesByIdAsyncCore' for source 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/FindPackagesById()?id='Microsoft.Extensions.DependencyInjection.Abstractions'&semVerLevel=2.0.0'.
#11 1.885   Name or service not known (gda-packages.ihs.internal.corp:80)
#11 1.885     Name or service not known
#11 2.168   Restored /src/TestDockerBuildkit/TestDockerBuildkit.csproj (in 780 ms).
#11 2.265 /src/TestDockerBuildkit/TestDockerBuildkit.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#11 3.332   TestDockerBuildkit -> /src/TestDockerBuildkit/bin/Release/net6.0/TestDockerBuildkit.dll
#11 3.349   TestDockerBuildkit -> /publish/TestDockerBuildkit/
#11 DONE 3.4s

#12 [publish 3/3] RUN dotnet publish "TestDockerBuildkit2/TestDockerBuildkit2.csproj" -c Release -o /publish/TestDockerBuildkit2
#12 0.635 MSBuild version 17.3.2+561848881 for .NET
#12 0.972   Determining projects to restore...
#12 1.219 /src/TestDockerBuildkit2/TestDockerBuildkit2.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#12 1.376   Restored /src/TestDockerBuildkit2/TestDockerBuildkit2.csproj (in 168 ms).
#12 1.500 /src/TestDockerBuildkit2/TestDockerBuildkit2.csproj : warning NU1803: You are running the 'restore' operation with an 'HTTP' source, 'http://gda-packages.ihs.internal.corp/nuget/nuget.org/'. Non-HTTPS access will be removed in a future version. Consider migrating to an 'HTTPS' source.
#12 2.570   TestDockerBuildkit2 -> /src/TestDockerBuildkit2/bin/Release/net6.0/TestDockerBuildkit2.dll
#12 2.588   TestDockerBuildkit2 -> /publish/TestDockerBuildkit2/
#12 DONE 2.6s

#13 exporting to image
#13 exporting layers
#13 exporting layers 0.2s done
#13 writing image sha256:4836dca5168c79dfece4c98fee5d7912f7a44d8a892e7824cc6f7731f6cc9734 done
#13 DONE 0.2s
$ docker info
Client: Docker Engine - Community
 Version:    24.0.6
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2-desktop.5
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0-desktop.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.20
    Path:     /usr/local/lib/docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.8
    Path:     /usr/local/lib/docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-scan
  scout: Docker Scout (Docker Inc.)
    Version:  v1.0.7
    Path:     /usr/local/lib/docker/cli-plugins/docker-scout

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 21
 Server Version: 24.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
 runc version: v1.1.8-0-g82f18fe
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
 Kernel Version: 5.15.90.1-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 31.31GiB
 Name: docker-desktop
 ID: e9fa4e2f-dfe9-470c-b0dd-310710788fd2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

@grimurd @krishnadas-menon If you can give a similar repro like https://github.com/incloudss/testbuildkit and output of docker info as well, that would help. Thanks.

crazy-max avatar Oct 11 '23 13:10 crazy-max

I wonder if Storage Driver: vfs is related to your issue. Ring a bell @thaJeztah?

crazy-max avatar Oct 11 '23 13:10 crazy-max

@crazy-max I was able to fix it by creating a new buildx context which updated the driver which was in use. Thanks for replying.

krishnadas-menon avatar Oct 11 '23 14:10 krishnadas-menon

I wonder if Storage Driver: vfs is related to your issue.

I would definitely NOT recommend using the vfs storage driver for any other purpose than debugging. The vfs storage driver creates a full copy of the image for every layer that's created, and for every container that's run (which is both very slow, and may cause a lot of disk space to be used).

There have been some cases where vfs may not be able to write extended attributes (see https://github.com/moby/moby/issues/45535, https://github.com/moby/moby/issues/45417), and older versions of Docker silently discarded those errors. Docker 25.0 will produce an error in these conditions (added in https://github.com/moby/moby/pull/45464), but other versions may discard that error (or perhaps BuildKit may not).
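
To check which storage driver a daemon is using, the "Storage Driver" line of docker info can be extracted. The following is a minimal sketch; the function names storage_driver and uses_vfs are hypothetical, and the parsing assumes the docker info text format shown in this thread.

```shell
# storage_driver (hypothetical helper): read `docker info` output on stdin
# and print the value of the first "Storage Driver:" line.
storage_driver() {
  sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p' | head -n 1
}

# uses_vfs (hypothetical helper): succeed if the reported driver is vfs.
uses_vfs() {
  [ "$(storage_driver)" = "vfs" ]
}

# Usage (requires a reachable daemon):
#   docker info 2>/dev/null | storage_driver
#   docker info 2>/dev/null | uses_vfs && echo "warning: vfs storage driver in use"
```

On recent Docker versions, docker info --format '{{.Driver}}' should print the same value without any text parsing.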

thaJeztah avatar Oct 11 '23 14:10 thaJeztah

I'm unable to reproduce the error where it originally happened for me.

Here is docker info result from the last time the error happened

Client:
 Version:    24.0.6
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /usr/local/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0
    Path:     /usr/local/libexec/docker/cli-plugins/docker-compose
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.6
 Storage Driver: vfs
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7880925980b188f4c97b462f709d0db8e8962aff
 runc version: v1.1.9-0-gccaecfc
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.18.0-477.15.1.el8_8.x86_64
 Operating System: Alpine Linux v3.18 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 23.2GiB
 Name: runner-jvxycqsx-project-160-concurrent-0-j29qsi0x
 ID: 179ff559-e2f4-4454-aa48-58d5e8bb7de2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

grimurd avatar Oct 11 '23 17:10 grimurd

This is indeed related to DinD. Disabling with DOCKER_BUILDKIT=0 can fix the issue, but then it takes ages to build the image.

I have a multistage Dockerfile that does a dotnet build and publish. It fails during dotnet publish.

FROM mcr.microsoft.com/dotnet/aspnet:7.0 AS base
ARG commitHash

WORKDIR /app
EXPOSE 80
EXPOSE 443

FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
ARG commitHash

WORKDIR /src
COPY ["Directory.Build.props", "src/XXX/"]
COPY ["src/XXX/XXX.csproj", "src/XXX/"]
RUN dotnet restore "src/XXX/XXX.csproj"
COPY . .
WORKDIR "/src/src/XXX"
RUN dotnet build "XXX.csproj" -c Release -o /app/build /property:VersionSuffix=$commitHash

FROM build AS publish
ARG commitHash

RUN dotnet publish "XXX.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=$commitHash

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "XXX.dll"]

ksharma-qc avatar Nov 15 '23 11:11 ksharma-qc

@ksharma-qc can you confirm if you are also using the vfs storage driver? That will help us to narrow down the underlying cause.

jsternberg avatar Nov 15 '23 19:11 jsternberg

Actually mine is overlay. Here is the output from docker info from within the container.

Client: Docker Engine - Community
 Version:    24.0.7
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.21.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
 runc version: v1.1.9-0-gccaecfc
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-88-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 5.666GiB
 Name: 163da1724e7e
 ID: 6964b083-65b0-4aec-a6a5-a687f32d45c2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

ksharma-qc avatar Nov 16 '23 15:11 ksharma-qc

@ksharma-qc can you try running the docker daemon with the debug logs enabled? I believe it's -D or --debug. I'm curious if we'll get any log messages that could help narrow it down. Can I also confirm that you're getting an error message in the same format?

ERROR: failed to solve: failed to prepare <id> as <id>: invalid argument

This error message comes from https://github.com/moby/buildkit/blob/30a1b0e8b690b800eec0b2335b75a21b70c67eb7/cache/manager.go#L616-L627. The invalid argument makes me think that this is a system call that's failing although I'm not sure which one and why quite yet.
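
To check a saved build log for this specific failure, a pattern match along the following lines may help. It is a sketch only: has_prepare_error is a hypothetical helper, and the assumption that both IDs consist of lowercase letters and digits is based solely on the examples posted in this thread.

```shell
# has_prepare_error (hypothetical helper): read build output on stdin and
# succeed if any line contains the "failed to prepare <id> as <id>: invalid
# argument" message. The ID character class is an assumption based on the
# examples in this thread.
has_prepare_error() {
  grep -Eq 'failed to prepare [a-z0-9]+ as [a-z0-9]+: invalid argument'
}

# Usage, with build.log as a placeholder file name:
#   docker build . 2>&1 | tee build.log
#   has_prepare_error < build.log && echo "matches the error in this issue"
```

Other failures that also end in "invalid argument" but have a different prefix are deliberately not matched.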

jsternberg avatar Nov 16 '23 17:11 jsternberg

I'm facing the same problem with the same error message, and I have the same version of Docker, 24.0.7. I will also try to enable debug mode and post any other error messages. After the second failed docker build, the /var/lib/docker/vfs/dir folder grows to as much as 38 GB, and docker system prune -a doesn't help.

podzimekdavid avatar Nov 16 '23 18:11 podzimekdavid

I'll try running with debug logging tomorrow and post an update.

ksharma-qc avatar Nov 16 '23 19:11 ksharma-qc

I ran the docker daemon in debug mode and have attached the logs. I'm not sure if the problem was captured in the logs. The logs have a statement for RUN dotnet build on line 38, but nothing for the succeeding instruction, RUN dotnet publish, which is where the error actually happens.

Is there a way to get more details?

Log file: docker-daemon.log (https://github.com/docker/buildx/files/13400190/docker-daemon.log)

Console output from docker client:

#16 [build 8/8] RUN dotnet build "MyApp.csproj" -c Release -o /app/build /property:VersionSuffix=a1912aeb93aac7e2760da65fa44226ada0140599
#16 CACHED

#17 [publish 1/1] RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=a1912aeb93aac7e2760da65fa44226ada0140599
#17 ERROR: failed to prepare zq1cnf4911ofmftez68uqepnz as h2wz441zubqnpt9awaz6hl30j: invalid argument
------
 > [publish 1/1] RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=a1912aeb93aac7e2760da65fa44226ada0140599:
------
Dockerfile:25
--------------------
  23 |     ARG commitHash
  24 |
  25 | >>> RUN dotnet publish "MyApp.csproj" -c Release -o /app/publish /p:UseAppHost=false /property:VersionSuffix=$commitHash
  26 |
  27 |     FROM base AS final
--------------------
ERROR: failed to solve: failed to prepare zq1cnf4911ofmftez68uqepnz as h2wz441zubqnpt9awaz6hl30j: invalid argument

ksharma-qc avatar Nov 18 '23 11:11 ksharma-qc

The problem might not have been captured in the logs. It was an attempt just in case it was, but I don't see anything that's relevant. I also just wanted to double check if that was the full log file or if it was truncated. I'm also looking for some initialization debug messages such as these ones:

time="2023-11-20T19:51:10.837704761Z" level=debug msg="No quota support for local volumes in /home/rootless/.local/share/docker/volumes: Filesystem does not support, or has not enabled quotas"
time="2023-11-20T19:51:10.840596594Z" level=debug msg="[graphdriver] priority list: [overlay2 fuse-overlayfs btrfs zfs devicemapper vfs]"
time="2023-11-20T19:51:10.846727469Z" level=debug msg="successfully detected metacopy status" storage-driver=overlay2 usingMetacopy=false
time="2023-11-20T19:51:10.847878219Z" level=debug msg="backingFs=extfs, projectQuotaSupported=false, usingMetacopy=false, indexOff=\"index=off,\", userxattr=\"userxattr,\"" storage-driver=overlay2
time="2023-11-20T19:51:10.847890886Z" level=debug msg="Initialized graph driver overlay2"

It'll help me narrow down where it might be happening. We might need to add some additional debug messages in there if we don't see anything that helps and I'd prefer to make it more targeted if possible. In particular, the graphdriver message is pretty important as the potential area I'm looking at for the error uses the graphdriver and the specific implementation might be relevant.

I also just wanted to confirm that this was happening using the default docker builder. You can run docker buildx ls to show the list of builders and it'll also show the default one. If possible, I'd like you to try running the build with a custom builder and see if the same problem occurs.

$ docker buildx create --name stable --driver docker-container
$ docker buildx --builder stable build ...

jsternberg avatar Nov 20 '23 19:11 jsternberg

I have the same issue with the following Dockerfile:

FROM mcr.microsoft.com/devcontainers/base:bullseye

RUN sh -c 'curl --silent --location https://git.io/JYfAY | bash'
RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh

I need to set DOCKER_BUILDKIT=0 for the image build to succeed. Otherwise, I get the following error and the image build fails:

....
 => [2/3] RUN sh -c 'curl --silent --location https://git.io/JYfAY | bash'                                                                                                 10.5s
 => ERROR [3/3] RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh                                                                                    0.4s
------
 > [3/3] RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh:
------
Dockerfile:4
--------------------
   2 |     
   3 |     RUN sh -c 'curl --silent --location https://git.io/JYfAY | bash'
   4 | >>> RUN curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh
   5 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/bin sh" did not complete successfully: unable to find user root: invalid argument

kopwei avatar Dec 23 '23 16:12 kopwei

Same error for docker:24.0.5

dmdymov avatar Feb 12 '24 13:02 dmdymov

Same here! Docker: 24.0.9

lipaysamart avatar Apr 23 '24 08:04 lipaysamart

@kopwei

I also just wanted to confirm that this was happening using the default docker builder. You can run docker buildx ls to show the list of builders and it'll also show the default one. If possible, I'd like you to try running the build with a custom builder and see if the same problem occurs.

$ docker buildx create --name stable --driver docker-container
$ docker buildx --builder stable build ...

I tried the above. In my case, the build seems to hang indefinitely on a prior step (dpkg-configuring python-crypto). I waited for 3 minutes.

  • Kernel version: 6.8.7-arch1-1
  • Docker version: 26.1.0
  • Buildx version: 0.14.0

I noticed something possibly relevant:

# docker buildx ls
NAME/NODE     DRIVER/ENDPOINT   STATUS    BUILDKIT   PLATFORMS
default*      docker
 \_ default    \_ default       running   v0.13.1    linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386
# docker buildx version
github.com/docker/buildx 0.14.0 171fcbeb69d67c90ba7f44f41a9e418f6a6ec1da

So, the version of the buildx plugin is 0.14.0 and the running buildx instance is 0.13.1? I tried updating the version of the instance, but no matter what I do, it won't update, and I can't remove it.

ThorensTD124 avatar Apr 27 '24 12:04 ThorensTD124

@jsternberg I'm also on Arch Linux, like @ThorensTD124. I'm having the exact same issue. Docker is installed using pacman -S docker docker-buildx and the version of buildx mentioned by docker buildx version and the version of the default running instance shown with docker buildx ls differ (0.14.0, 0.13.1). Can this be the cause of this? Can I force/upgrade the default running instance? I've tried, but to no avail.

Update: I tested downgrading the docker-buildx package to 0.13.1, making the versions identical, no dice, still fails with the exact same error message as mentioned in this bug report.

Update 2: It turns out I'm running the vfs storage driver, possibly due to casefold being enabled on my ext4 filesystem, and the overlay kernel module not supporting casefold. I'll try to re-layout my filesystems, re-enable overlay2 and re-test.
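
Whether casefold is enabled on an ext4 filesystem can be read from the "Filesystem features" line that tune2fs -l prints. A minimal sketch, assuming that output format; has_casefold is a hypothetical helper and /dev/sdXN is a placeholder for the device that backs /var/lib/docker:

```shell
# has_casefold (hypothetical helper): read `tune2fs -l` output on stdin and
# succeed if "casefold" appears as a whole word on the
# "Filesystem features:" line.
has_casefold() {
  grep '^Filesystem features:' | tr -s ' ' '\n' | grep -qx 'casefold'
}

# Usage (needs root; /dev/sdXN is a placeholder device):
#   tune2fs -l /dev/sdXN | has_casefold && echo "casefold is enabled"
```

This only reports the filesystem feature flag; it does not by itself confirm why the daemon selected vfs.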

I'm testing with github.com/zimsneexh/Synergy-HL2-docker

# docker info | grep Storage
 Storage Driver: vfs
# docker build -t synergy:latest .
[+] Building 94.2s (12/15)                                                                                           docker:default
 => [internal] load build definition from Dockerfile                                                                           0.0s
 => => transferring dockerfile: 575B                                                                                           0.0s
 => [internal] load metadata for docker.io/library/debian:10                                                                   1.4s
 => [auth] library/debian:pull token for registry-1.docker.io                                                                  0.0s
 => [internal] load .dockerignore                                                                                              0.0s
 => => transferring context: 2B                                                                                                0.0s
 => [ 1/10] FROM docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920           4.5s
 => => resolve docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920             0.0s
 => => sha256:c94cce4c115de0a1328304e8981a68cf4b4b657ef6dbda52e188ac651368d603 1.46kB / 1.46kB                                 0.0s
 => => sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50 50.66MB / 50.66MB                               1.9s
 => => sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 984B / 984B                                     0.0s
 => => sha256:255eec9d157d35e00a81a45f1e958fd19437d504139e8eb4ea6cc380ea741ed4 529B / 529B                                     0.0s
 => => extracting sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50                                      2.5s
 => [internal] load build context                                                                                              0.0s
 => => transferring context: 2.24kB                                                                                            0.0s
 => [ 2/10] RUN apt update && apt-get install -y wine                                                                         33.7s
 => [ 3/10] RUN dpkg --add-architecture i386                                                                                   1.1s
 => [ 4/10] RUN apt update                                                                                                     4.5s
 => [ 5/10] RUN apt-get -y install wine32 libwine libwine:i386 fonts-wine                                                     28.4s
 => [ 6/10] RUN apt-get install -y lib32gcc1 curl xvfb screen procps winbind x11vnc                                           19.1s
 => ERROR [ 7/10] RUN rm -rf /steamcmd                                                                                         1.4s
------
 > [ 7/10] RUN rm -rf /steamcmd:
------
Dockerfile:7
--------------------
   5 |     RUN apt-get -y install wine32 libwine libwine:i386 fonts-wine
   6 |     RUN apt-get install -y lib32gcc1 curl xvfb screen procps winbind x11vnc
   7 | >>> RUN rm -rf /steamcmd
   8 |     RUN mkdir steamcmd && cd steamcmd && curl -sqL "https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.tar.gz" | tar zxvf -
   9 |     COPY start.sh /start.sh
--------------------
ERROR: failed to solve: failed to prepare mphp0b3tpyltya6v94nlmctkj as jwxdztyfhel5ozdf6xal7ap5z: invalid argument

rubin55 avatar Apr 27 '24 18:04 rubin55

@crazy-max @jsternberg I just finished re-layout'ing my storage in such a way that /var/lib/docker exists on a regular ext4 filesystem, notably without the casefold option:

# docker info | grep Storage
 Storage Driver: overlay2
# docker build -t synergy:latest .
[+] Building 87.1s (16/16) FINISHED                                                                                  docker:default
 => [internal] load build definition from Dockerfile                                                                           0.0s
 => => transferring dockerfile: 575B                                                                                           0.0s
 => [internal] load metadata for docker.io/library/debian:10                                                                   1.4s
 => [auth] library/debian:pull token for registry-1.docker.io                                                                  0.0s
 => [internal] load .dockerignore                                                                                              0.0s
 => => transferring context: 2B                                                                                                0.0s
 => [internal] load build context                                                                                              0.0s
 => => transferring context: 2.24kB                                                                                            0.0s
 => [ 1/10] FROM docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920           2.3s
 => => resolve docker.io/library/debian:10@sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920             0.0s
 => => sha256:255eec9d157d35e00a81a45f1e958fd19437d504139e8eb4ea6cc380ea741ed4 529B / 529B                                     0.0s
 => => sha256:c94cce4c115de0a1328304e8981a68cf4b4b657ef6dbda52e188ac651368d603 1.46kB / 1.46kB                                 0.0s
 => => sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50 50.66MB / 50.66MB                               1.3s
 => => sha256:bce46a1c39574f98c845df4a5acc6c70c211df5a6182e428c1155c33317d4920 984B / 984B                                     0.0s
 => => extracting sha256:dbd6422b1b97494149e51bbd6c24d444b4a8794d2702d105efce98c44de9ad50                                      0.9s
 => [ 2/10] RUN apt update && apt-get install -y wine                                                                         29.6s
 => [ 3/10] RUN dpkg --add-architecture i386                                                                                   0.3s
 => [ 4/10] RUN apt update                                                                                                     6.8s
 => [ 5/10] RUN apt-get -y install wine32 libwine libwine:i386 fonts-wine                                                     23.1s
 => [ 6/10] RUN apt-get install -y lib32gcc1 curl xvfb screen procps winbind x11vnc                                           15.8s
 => [ 7/10] RUN rm -rf /steamcmd                                                                                               0.3s
 => [ 8/10] RUN mkdir steamcmd && cd steamcmd && curl -sqL "https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.t  5.7s
 => [ 9/10] COPY start.sh /start.sh                                                                                            0.0s
 => [10/10] RUN chmod +x /start.sh                                                                                             0.3s
 => exporting to image                                                                                                         1.2s
 => => exporting layers                                                                                                        1.2s
 => => writing image sha256:9c26ec62a608c9db878b8133d9c8514adfedb32e019279183de16b95458a60aa                                   0.0s
 => => naming to docker.io/library/synergy:latest

So (at least in my case) it looks like vfs was the culprit, which got automatically enabled instead of overlay2 due to the underlying ext4 filesystem having the casefold option, which the overlay kernel module is incompatible with. vfs in turn seems to expose a bug in buildx.
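For anyone wanting to check whether they are in the same situation, here is a minimal sketch rather than an official diagnostic. It assumes `docker`, `findmnt` and `tune2fs` are installed, that the Docker data root lives on ext4, and that you can run `tune2fs` (usually as root). The function names `has_casefold` and `check_docker_storage` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a tune2fs "Filesystem features:" line
# lists the casefold feature as a separate word.
has_casefold() {
    case " $1 " in
        *" casefold "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints the active storage driver and whether the
# filesystem backing the Docker data root has casefold enabled.
# Assumption: the data root is on an ext4 block device that tune2fs can read.
check_docker_storage() {
    root=$(docker info --format '{{.DockerRootDir}}')
    driver=$(docker info --format '{{.Driver}}')
    dev=$(findmnt -no SOURCE --target "$root")
    features=$(tune2fs -l "$dev" 2>/dev/null | grep -i '^Filesystem features:')
    echo "storage driver: $driver (data root: $root on $dev)"
    if has_casefold "$features"; then
        echo "casefold feature: enabled"
    else
        echo "casefold feature: not detected"
    fi
}
```

Calling `check_docker_storage` as root should show whether the daemon fell back to vfs and whether casefold is a plausible reason for it.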

rubin55 avatar Apr 30 '24 00:04 rubin55

I just attempted the build locally and didn't receive a build error.

build.log

Are you still experiencing the issue? If you are, can you try the build with the latest stable buildkit? You can do this by running the following:

$ docker buildx create --name=sandbox --driver=docker-container --bootstrap
$ BUILDX_BUILDER=sandbox docker buildx build .

This workaround worked for me too, for my DinD Azure Pipeline Agent

$ docker buildx create --name=sandbox --driver=docker-container --bootstrap
$ BUILDX_BUILDER=sandbox docker buildx build --no-cache --load .

iurietopor avatar Dec 10 '24 14:12 iurietopor

(somewhat catching up on this thread); thanks for correlating this with the casefold option!

So (at least in my case) it looks like vfs was the culprit, which got automatically enabled instead of overlay2 due to the underlying ext4 filesystem having the casefold option, which the overlay kernel module is incompatible with. vfs in turn seems to expose a bug in buildx.

The vfs driver is basically a "last resort" (if all else fails); it's implemented by creating a copy of all files in the image in a directory, then creating a copy of all files for every layer produced. So with the vfs driver there's no additional filesystem type (like overlayFS) involved, and file operations happen directly on the host's filesystem.
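As a rough illustration of that copy-per-layer behaviour (this is not the actual driver code from moby), assume plain directories stand in for layers; `vfs_create_layer` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical sketch of vfs-style layering: a new layer starts out as a
# full copy of its parent directory.
# Usage: vfs_create_layer PARENT_DIR NEW_DIR
vfs_create_layer() {
    mkdir -p "$2" || return 1
    # Copy everything (including dotfiles), preserving mode and timestamps
    # where permitted. There is no copy-on-write and no overlay mount.
    cp -a "$1"/. "$2"/
}
```

Because every layer is a complete copy, changes made in the new directory never touch the parent, at the cost of disk space and time that grow with each layer.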

Overall, it would not surprise me if things would randomly fail on a case-insensitive filesystem; at least I'm aware of some tools/software using case-sensitive paths for caches, resulting in cache collisions (cache files overwriting existing cache files). So, even if things work in docker / buildkit itself, this could still result in failures due to software running inside the container assuming a case-sensitive filesystem.
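To make the collision scenario concrete, here is a small sketch that lists paths which would collapse onto the same name on a case-insensitive filesystem. `find_case_collisions` is a hypothetical helper. It assumes the input paths are unique, and it uses simple locale-based lower-casing, which only approximates the Unicode case folding ext4 applies.

```shell
#!/bin/sh
# Hypothetical helper: reads one path per line on stdin and prints each
# lower-cased path that two or more input paths map to.
find_case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

For example, `find /path/to/cache -print | find_case_collisions` would print nothing for a tree that is safe to copy onto a case-insensitive filesystem.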

Not sure what difference would be between the classic builder and buildkit for this situation though 🤔

I wonder if (perhaps) BuildKit produces case-sensitive (temp) paths for intermediate steps, and those result in a similar collision; it likely hits a different code-path (BuildKit itself acts as a runtime for containers executed during build, and directly controls the OCI runtime, so it may / will hit different code-paths than the classic builder, which runs containers through the docker daemon).

There's also a possibility that the classic builder was hiding some underlying failures; the vfs driver (as mentioned) is used as a last resort, and there have historically been some changes to make it slightly more permissive on failures; older versions of docker did so unconditionally for extended attributes; https://github.com/moby/moby/blob/v23.0.0/pkg/archive/archive.go#L791-L803

And current versions have this configurable (as silently discarding extended attributes could discard attributes relevant to the container, resulting in different results for some cases) https://github.com/moby/moby/blob/v27.4.1/pkg/archive/archive.go#L788-L803

I wonder if (e.g.) one of those or perhaps some other code-path would be ignoring errors when using vfs, potentially hiding errors related to failures due to the case-insensitive filesystem used.

thaJeztah avatar Dec 23 '24 10:12 thaJeztah

Actually, wondering now: if BuildKit is running as a containerised builder, it won't use the graph-drivers from the docker codebase. The docker daemon (when using graph-drivers) detects whether a specific driver can be supported; if not supported, it falls back to other drivers (ultimately ending up using vfs); https://github.com/moby/moby/blob/b5d5fef7aa68f160b817edef186c9b18ab996f4f/daemon/graphdriver/overlayutils/overlayutils.go#L40-L80

However, with BuildKit running standalone inside a container, it won't be using those storage-drivers, and I wonder if the containerd snapshotter code has a similar check. Possibly BuildKit itself (inside the container) continues to use the overlayFS snapshotter, which (as mentioned in https://github.com/docker/buildx/issues/2021#issuecomment-2081124556), does not support running on a case-insensitive filesystem, and now could fail when BuildKit creates containers for those build steps?

☝️ @tonistiigi @jsternberg could that be the case here?

thaJeztah avatar Dec 23 '24 10:12 thaJeztah