`docker inspect --bootstrap` ignores an error and exits successfully when one of two nodes fails
Contributing guidelines
- [x] I've read the contributing guidelines and wholeheartedly agree
I've found a bug and checked that ...
- [x] ... the documentation does not mention anything about my problem
- [x] ... there are no open or closed issues that are related to my problem
Description
In our CI we setup buildx builder with 2 nodes - arm64 and amd64 to build multi-arch docker images. CI runs docker inspect --bootstrap to check that both of them are up and ready. When one node fails (because of a timeout) the inspect command ignores it, prints the summary and completes with exit code 0.
Expected behaviour
docker inspect --bootstrap should exit with non-zero exit-code when one of the nodes fails to start. When command fails it's possible to catch a non-zero status code and retry bootstraping the nodes.
Actual behaviour
When single node fails the driver.Boot returns an error which is stored in the channel errCh. printer.Wait() never returns an error. So err is nil and len(errCh) == 1. So the last return statement is executed with true, nil. For more details see my gist with minimal reproducible example.
https://github.com/docker/buildx/blob/65c475647313b3edc502dcdd48b8ce4128b60dee/builder/builder.go#L178-L206
printer.Wait():
https://github.com/docker/buildx/blob/65c475647313b3edc502dcdd48b8ce4128b60dee/util/progress/printer.go#L39-L45
pw.err is assigned here
https://github.com/docker/buildx/blob/65c475647313b3edc502dcdd48b8ce4128b60dee/util/progress/printer.go#L133
but UpdateFrom never sets the error, except the ctx.Done case.
https://github.com/docker/buildx/blob/171fcbeb69d67c90ba7f44f41a9e418f6a6ec1da/vendor/github.com/moby/buildkit/util/progress/progressui/display.go#L80-L80
Buildx version
v0.14.0
Docker info
No response
Builders list
Configuration
Here is the link to this gist with code to reproduce bug in the (b *Builder) Boot() function:
https://gist.github.com/moleus/6bf1ad37f47dd86ed12f4c4345601e22
Build logs
+ docker buildx inspect --bootstrap
#1 [arm64-demo-jenkins-job-name internal] booting buildkit
#1 waiting for 1 pods to be ready
#1 ...
#2 [amd64-demo-jenkins-job-name internal] booting buildkit
#2 ...
#1 [arm64-demo-jenkins-job-name internal] booting buildkit
#1 waiting for 1 pods to be ready 93.3s done
#1 DONE 93.4s
#2 [amd64-demo-jenkins-job-name internal] booting buildkit
#2 waiting for 1 pods to be ready 109.5s done
#2 ERROR: expected 1 replicas to be ready, got 0
------
> [amd64-demo-jenkins-job-name internal] booting buildkit:
------
Name: back-demo-buildx <- *comment* this should not be printed. Should exit with 1
Driver: kubernetes
Last Activity: 2024-10-10 09:56:43 +0000 UTC
Nodes:
Name: amd64-demo-jenkins-job-name
Endpoint: kubernetes:///back-demo-buildx?deployment=amd64-demo-jenkins-job-name&kubeconfig=
Driver Options: loadbalance="random" namespace="demo-namespace" ...
Status: inactive
BuildKit daemon flags: --oci-worker-gc-keepstorage 8000 --allow-insecure-entitlement=network.host
Platforms: linux/amd64*
Name: arm64-demo-jenkins-job-name
Endpoint: kubernetes:///back-demo-buildx?deployment=arm64-demo-jenkins-job-name&kubeconfig=
Driver Options: loadbalance="random" namespace="demo-namespace" ...
Status: inactive
BuildKit daemon flags: --oci-worker-gc-keepstorage 8000 --allow-insecure-entitlement=network.host
Platforms: linux/arm64*
Additional info
I suggest replacing
if err == nil && len(errCh) == len(toBoot) { // errCh contains only one error -> condition fails
return false, <-errCh
}
return true, err // returns true, nil
with
if err == nil && len(errCh) > 0 {
return false, <-errCh
}
return true, err // returns true, nil
so inspect fails when one of the nodes fails to setup