buildkit-cli-for-kubectl

kubectl build gets stuck indefinitely

Open nicks opened this issue 4 years ago • 5 comments

What steps did you take and what happened

Sometimes kubectl build gets stuck and never recovers.

As far as I can tell, what's happening is that it's waiting indefinitely for the buildkit server to deploy.

What did you expect to happen

At the bare minimum, the build should time out and fail if it takes too long for the buildkit server to start.
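Until the CLI enforces a deadline of its own, a caller can bound the build from the outside. The following is a minimal sketch, assuming GNU coreutils `timeout` is available on the CI image; `run_with_deadline` is a hypothetical helper name, not part of kubectl-buildkit, and the 5m limit in the example call is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl-buildkit): run any command under a
# wall-clock limit using GNU coreutils `timeout`.
run_with_deadline() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  # GNU timeout exits with status 124 when the limit was reached.
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}: $*" >&2
  fi
  return "$status"
}

# Example call, commented out so the sketch has no side effects:
# run_with_deadline 5m kubectl build --context kind-kind -f ./Dockerfile -t "$EXPECTED_REF" .
```

This only makes the failure fast and explicit in CI; it does not address why the builder pod never becomes ready.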

Environment Details:

$ kubectl buildkit version
Client:  v0.1.4
Builder: buildkitd github.com/moby/buildkit v0.9.2 a14b4e097ae1dc7514c5febd6d75f742a166ea75
  • Kind v0.9.1 (running Kubernetes v1.21.1 and containerd)

Builder Logs

https://app.circleci.com/pipelines/github/tilt-dev/tilt-example-builders/13/workflows/a73afc07-73e5-4f38-9419-f944f2844b54/jobs/14

Here are the relevant bits:

example-pyth… │ Running custom build cmd "kubectl build --context kind-kind -f ./Dockerfile --registry-secret docker-registry --build-arg flask_env=development -t $EXPECTED_REF ."
example-pyth… │ #1 [internal] booting buildkit
example-pyth… │ #1 waiting for 1 pods to be ready for buildkit
example-pyth… │ #1 0.320 Normal 	buildkit-7c58458f5f 	SuccessfulCreate 	Created pod: buildkit-7c58458f5f-zlf6b
example-pyth… │ #1 0.411 Normal 	buildkit-7c58458f5f-zlf6b 	Scheduled 	Successfully assigned default/buildkit-7c58458f5f-zlf6b to kind-control-plane
example-pyth… │ #1 0.411 Warning 	buildkit-7c58458f5f-zlf6b 	FailedMount 	MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
example-pyth… │ #1 0.411 Warning 	initial attempt to deploy configured for the docker runtime failed, retrying with containerd
example-pyth… │ #1 123.5 Warning 	buildkit-7c58458f5f-zlf6b 	FailedMount 	Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[buildkitd-config docker-sock kube-api-access-wd96c]: timed out waiting for the condition
example-pyth… │ #1 397.9 Warning 	buildkit-7c58458f5f-zlf6b 	FailedMount 	Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[kube-api-access-wd96c buildkitd-config docker-sock]: timed out waiting for the condition
example-pyth… │ #1 534.7 Warning 	buildkit-7c58458f5f-zlf6b 	FailedMount 	Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[docker-sock kube-api-access-wd96c buildkitd-config]: timed out waiting for the condition
Error: context canceled

Too long with no output (exceeded 10m0s): context deadline exceeded

Note that the build never fails on its own; it's eventually killed by CircleCI's 10-minute no-output limit.

Dockerfile

https://github.com/tilt-dev/tilt-example-builders/blob/main/kubectl_build/Dockerfile

Vote on this request

This is an invitation to the community to vote on issues. Use the "smiley face" up to the right of this comment to vote.

  • :+1: "I would like to see this bug fixed as soon as possible"
  • :-1: "There are more important bugs to focus on right now"

nicks, Nov 09 '21 17:11

anecdotally: if i change my setup script to do

kubectl buildkit create --runtime=containerd

rather than relying on the auto-setup and runtime=auto detection, it seems to work consistently. (though the fundamental bug seems to be a race condition, so i might just be getting lucky.)
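As a rough sketch of what such a setup step might look like: the function below pins the runtime and then waits for the builder with a bounded timeout. It assumes the builder Deployment is named `buildkit` in the current namespace (which the ReplicaSet name `buildkit-7c58458f5f` in the log suggests, but is not confirmed here); `setup_builder` and the 120s default are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch of a CI setup step based on the workaround described above.
# Assumptions: the builder Deployment is named "buildkit" in the current
# namespace, and 120s is an arbitrary default wait limit.
setup_builder() {
  local wait_limit="${1:-120s}"
  # Pin the runtime explicitly instead of relying on runtime=auto detection.
  kubectl buildkit create --runtime=containerd || return 1
  # Return non-zero instead of hanging if the rollout does not finish in time.
  kubectl rollout status deployment/buildkit --timeout="$wait_limit"
}

# Example call, commented out so the sketch has no side effects:
# setup_builder 120s
```

`kubectl rollout status` returns a non-zero exit code when the timeout elapses, so a stuck pod shows up as a failed setup step rather than a hung job.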

nicks, Nov 09 '21 19:11

PR #106 revamps some of the relevant code paths, which hopefully will improve or fix this. It may take us a little while to get that merged, but once we do, we'll most likely cut a new minor release (v0.2.0) which we can then re-test in your scenario to see if it fully solves the problem, improves it, or makes no difference.

dhiltgen, Nov 22 '21 23:11

Thanks @dhiltgen! We'll test out v0.2.0 once it's out to see if it helps.

nicks, Nov 22 '21 23:11

@nicks curious if you had a chance to evaluate that release?

tmc, Jan 25 '22 23:01

@tmc i don't think v0.2 is out yet? but the kubectl buildkit create --runtime=containerd workaround has been working well for us; we haven't seen a failure since we added it.

nicks, Jan 26 '22 01:01