DOCKER ERROR: failed to solve: frontend grpc server closed unexpectedly
I am using Docker on an EC2 linux/arm64 instance. Since my last Docker upgrade, I have been getting the following error:
DOCKER_BUILDKIT=1 docker build --ssh default -f ../containers/api/Dockerfile . -t api-test:latest --target setup
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 493B done
#1 DONE 0.0s
#2 resolve image config for docker.io/docker/dockerfile:1.0.0-experimental
#2 DONE 0.6s
#3 docker-image://docker.io/docker/dockerfile:1.0.0-experimental@sha256:d2d402b6fa1dae752f8c688d72066a912d7042cc1727213f7990cdb57f60df0c
#3 CACHED
Dockerfile:1
   1 | >>> # syntax=docker/dockerfile:1.0.0-experimental
   2 |
   3 |     # Build stage
ERROR: failed to solve: frontend grpc server closed unexpectedly
make: *** [run-tests] Error 1
Build step 'Execute shell' marked build as failure
Output of docker version:
Client:
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.20.12
 Git commit:        4debf41
 Built:             Wed Feb 28 00:29:45 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.20.12
  Git commit:       f417435
  Built:            Wed Feb 28 00:30:22 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.2
  GitCommit:        ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
On my M2 Mac, it works fine with a newer version of Docker:
Docker version 26.1.4, build 5650f9b
This is possibly a crash or SIGKILL of the frontend container. In that case, the daemon logs should contain more details about the cause.
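The daemon logs can be narrowed down with a small filter. This is a sketch only: it assumes a systemd host where the daemon runs as the docker.service unit, filter_frontend_errors is a hypothetical helper name, and the grep patterns are illustrative examples based on messages seen in this thread, not a complete list of what BuildKit or runc can log.

```shell
# Hypothetical helper: read daemon log text on stdin and keep only lines that
# are likely related to a crashed or killed BuildKit frontend container.
# The patterns are illustrative, not exhaustive.
filter_frontend_errors() {
    grep -E 'level=(error|warning)|flag provided but not defined|bogus greeting|runc did not terminate'
}

# Usage on a systemd host (assumes the daemon unit is docker.service):
#   journalctl -u docker.service --since "1 hour ago" | filter_frontend_errors
```

Filtering after the fact keeps the full journal intact; widen or narrow the patterns to match whatever the daemon actually logs on your host.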
@tonistiigi Thanks for the response.
I reran the job and got this from the logs:
journalctl -u docker.service
http2: server: error reading preface from client localhost: bogus greeting "Incorrect Usage: flag pr"
flag provided but not defined: -keep
level=error msg="failed to kill process in container id t89h1foczebcv4v0qpjvk1tey: runc did not terminate successfully: exit status 1: container \"t89h1fo
When I run journalctl -u docker.service | grep grpc, the following is logged repeatedly:
level=info msg="parsed scheme: \"\"" module=grpc
level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
level=info msg="ccResolverWrapper: sending update to cc: {[{ 0 <nil>}] <nil>}" module=grpc
level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
level=warning msg="grpc: addrConn.createTransport failed to connect to { 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing only one connection allowed\". Reconnecting..." module=grpc
I think the issue is an outdated version of runc that is missing support for the --keep flag: https://github.com/opencontainers/runc/pull/2825
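One way to confirm this on the host is to check the runc version and whether its help text mentions --keep at all. This is a sketch under assumptions: parse_runc_version and help_has_flag are hypothetical helpers, and looking for the flag under runc run is an assumption; the linked runc PR is the authority on which command and release actually added it.

```shell
# Extract the version from the first line of `runc --version` output, e.g.
# "runc version 1.0.0-rc10" -> "1.0.0-rc10".
parse_runc_version() {
    printf '%s\n' "$1" | awk 'NR == 1 { print $3 }'
}

# help_has_flag HELP_TEXT FLAG: succeed if FLAG appears anywhere in HELP_TEXT.
# -e makes grep treat FLAG as a pattern even though it starts with "-".
help_has_flag() {
    printf '%s\n' "$1" | grep -q -e "$2"
}

# Usage (assumes the flag would be listed under `runc run --help`):
#   echo "runc $(parse_runc_version "$(runc --version)")"
#   if help_has_flag "$(runc run --help 2>&1)" --keep; then
#       echo "runc advertises --keep"
#   else
#       echo "runc does not advertise --keep; consider upgrading runc"
#   fi
```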
Yes, this looks like a really odd setup, or at least one running very old versions of containerd and runc:
- containerd:
- Version: 1.3.2
That's a 5-year-old version (https://github.com/containerd/containerd/releases/tag/v1.3.2), and containerd 1.3 reached EOL in 2021 (https://github.com/containerd/containerd/blob/main/RELEASES.md#support-horizon).
- runc: Version: 1.0.0-rc10
runc is also a very old pre-release from 2020: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc10
@tonistiigi May I ask why this has been closed? Is it fixed?
I don't think there's anything to fix. As discussed above, the problem is a recent version of Docker and BuildKit running with unsupported versions of containerd and runc that don't provide the features BuildKit needs to function.
Installing docker-ce from Docker's official packages should also produce an error and refuse to install if the required containerd version is not found: https://github.com/docker/docker-ce-packaging/blob/57eea5d683f09ea07da828198497276c513e8aea/deb/common/control#L26-L28
Package: docker-ce
Architecture: linux-any
Depends: containerd.io (>= 1.6.24),
But I'm not sure how Docker was installed here; if it was installed from the static binaries, that package metadata won't apply.
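When the package manager isn't enforcing that Depends: line (for example with static binaries), the same floor can be checked by hand. A minimal sketch, assuming the containerd.io (>= 1.6.24) constraint quoted above is used as the minimum, that containerd --version prints the version as its third field (as in the output quoted in this thread), and that sort -V is available (GNU coreutils); version_ge, parse_containerd_version and check_containerd are hypothetical helper names.

```shell
# version_ge A B: succeed if version A >= version B, using `sort -V`.
# Caveat: sort -V does not order pre-release suffixes such as "-rc10" the way
# semver does, so this is only reliable for plain X.Y.Z versions.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version field from `containerd --version` output, e.g.
# "containerd github.com/containerd/containerd 1.3.2 ff48f57..." -> "1.3.2".
# A leading "v", if present, is stripped.
parse_containerd_version() {
    printf '%s\n' "$1" | awk '{ print $3 }' | sed 's/^v//'
}

# Compare the installed containerd against the docker-ce packaging floor.
check_containerd() {
    min="1.6.24"
    have=$(parse_containerd_version "$(containerd --version)")
    if version_ge "$have" "$min"; then
        echo "containerd $have is >= $min"
    else
        echo "containerd $have is older than $min; upgrade containerd.io" >&2
        return 1
    fi
}

# Usage:
#   check_containerd
#   dpkg-query -W docker-ce containerd.io   # Debian/Ubuntu: installed as packages?
```

The dpkg-query line only helps on Debian-based hosts; it reports no matching packages when Docker came from static binaries, which is exactly the case where this manual check matters.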
Hi Team,
Thanks for the response. We are trying to upgrade an ill-maintained Jenkins server.
The current versions of runc and containerd are as follows:
$ runc --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
$ containerd --version
containerd github.com/containerd/containerd 1.3.2 ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
I am assuming that I need to update containerd to 1.6.24 or above. What's the recommended runc version?
Thanks for the patience. Much appreciated.
The current version of the containerd.io package on download.docker.com is v1.7.22 (e.g. for Ubuntu Noble: https://download.docker.com/linux/ubuntu/dists/noble/pool/stable/amd64/).
The containerd.io package on download.docker.com contains both containerd and runc (v1.1.14).
Hi Team,
After upgrading runc and containerd, I no longer get the above error. The issue is resolved. Thanks for the help.
We're on Depot, and all of these tools are on fairly recent versions there, so what actually worked for us was removing the syntax directive completely. That way the Dockerfile is parsed in-process in Go by BuildKit's built-in frontend, and no frontend container is started at all. For context, our CI boxes were running 20+ concurrent builds at once, and I think at some point the frontend containers became unresponsive, or just slow to start, and hit the timeouts.
It's always nice to be explicit about the syntax override rather than relying on whichever version BuildKit has built in, but I want to make it clear for anyone who runs into this that you can remove the override completely. It's not a requirement if the built-in version suits your needs.
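If editing the Dockerfile in place isn't an option, a build can use a copy with the directive removed. This is a sketch under assumptions: strip_syntax_directive is a hypothetical helper, it only drops a # syntax= line found among the parser directives at the very top of the file (BuildKit stops looking for directives once a comment, empty line or instruction has been processed), and it assumes the built-in frontend supports every instruction the Dockerfile uses.

```shell
# Hypothetical helper: read a Dockerfile on stdin, drop any "# syntax=..."
# parser directive from the leading directive block, and print the rest.
# Other directives (e.g. "# escape=") and everything after the header are kept.
strip_syntax_directive() {
    awk '
        BEGIN { in_header = 1 }
        in_header && /^#[[:space:]]*[A-Za-z_]+[[:space:]]*=/ {
            # Still in the directive block: skip only the syntax directive.
            if (tolower($0) ~ /^#[[:space:]]*syntax[[:space:]]*=/) next
            print
            next
        }
        # First non-directive line ends the header; print everything after it.
        { in_header = 0; print }
    '
}

# Usage (paths are illustrative; -f may point outside the build context):
#   strip_syntax_directive < ../containers/api/Dockerfile > /tmp/Dockerfile.nosyntax
#   DOCKER_BUILDKIT=1 docker build --ssh default -f /tmp/Dockerfile.nosyntax . \
#       -t api-test:latest --target setup
```

Building from a stripped copy keeps the pinned syntax in the repository for hosts where the frontend image works, while letting a constrained CI host fall back to the built-in frontend.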