[v0.13] OTEL socket mount requires new volumes when containerd is run in separate container
Steps to reproduce
- Create and target a kind cluster:
kind create cluster
- Clone the gist
gh gist clone https://gist.github.com/georgethebeatle/487a6e99ab5f9dfc6be493729e0f426c
- From the root dir of the gist run:
kbld -f kbld.yaml -f manifest.yaml
What is expected to happen
The kbld command succeeds
What actually happens
The command fails with the following output:
repro | starting build (using kubectl buildkit): . -> kbld:rand-1710510549453549629-17512815236153-repro
repro | #1 [internal] booting buildkit
repro | #1 waiting for 1 pods to be ready for buildkit
repro | #1 0.537 Normal buildkit-6dd8f4bc7d SuccessfulCreate Created pod: buildkit-6dd8f4bc7d-z8swt
repro | #1 0.542 Normal buildkit-6dd8f4bc7d-z8swt Scheduled Successfully assigned default/buildkit-6dd8f4bc7d-z8swt to kind-control-plane
repro | #1 0.542 Warning buildkit-6dd8f4bc7d-z8swt FailedMount MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
repro | #1 0.542 Warning initial attempt to deploy configured for the docker runtime failed, retrying with containerd
repro | #1 1.730 Normal buildkit-6f5667d48 SuccessfulCreate Created pod: buildkit-6f5667d48-nwhvn
repro | #1 1.735 Normal buildkit-6f5667d48-nwhvn Scheduled Successfully assigned default/buildkit-6f5667d48-nwhvn to kind-control-plane
repro | #1 1.735 Normal buildkit-6f5667d48-nwhvn Pulled Container image "docker.io/moby/buildkit:buildx-stable-1" already present on machine
repro | #1 1.735 Normal buildkit-6f5667d48-nwhvn Created Created container buildkitd
repro | #1 1.735 Normal buildkit-6f5667d48-nwhvn Started Started container buildkitd
repro | #1 waiting for 1 pods to be ready for buildkit 11.0s done
repro | #1 11.04 All 1 replicas for buildkit online
repro | #1 DONE 11.0s
repro |
repro | #2 [internal] load build definition from Dockerfile
repro | #2 transferring dockerfile: 470B done
repro | #2 DONE 0.0s
repro |
repro | #3 resolve image config for docker-image://docker.io/docker/dockerfile:experimental
repro | #3 DONE 0.7s
repro |
repro | #4 docker-image://docker.io/docker/dockerfile:experimental@sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e78496ff5
repro | #4 resolve docker.io/docker/dockerfile:experimental@sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e78496ff5 0.0s done
repro | #4 CACHED
repro | Error: failed to solve: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/run/buildkit/otel-grpc.sock" to rootfs at "/dev/otel-grpc.sock": stat /run/buildkit/otel-grpc.sock: no such file or directory: unknown
repro | error: exit status 1
repro | finished build (using kubectl buildkit)
kbld: Error:
- Resolving image 'repro': exit status 1
Debugging
We were able to bisect the failure down to this PR
It should be fixed with https://github.com/moby/buildkit/pull/4619
Can you try with moby/buildkit:latest? (v0.13.0)
Hey @crazy-max
moby/buildkit:latest (or moby/buildkit:0.13.0) does not seem to be a public docker image. Therefore we checked out v0.13.0, built the images ourselves, and side-loaded them into kind. Unfortunately the build failed in the same manner.
PS: We get the same failure with master as well.
moby/buildkit:latest (or moby/buildkit:0.13.0) does not seem to be a public docker image.
Hmm, these tags exist:
- https://explore.ggcr.dev/?image=moby%2Fbuildkit%3Alatest
- https://explore.ggcr.dev/?image=moby%2Fbuildkit%3Av0.13.0
@AkihiroSuda
We are using buildkit in the context of a Tiltfile extension, kubectl_build, which hides a lot of possible settings. It uses the buildx-stable-1 image and defaults to passing --oci-worker=false and --containerd-worker=true on the CLI.
I've verified in this setup that the current buildx-stable-1, which matches SHA with v0.13.1, doesn't fix this issue if the oci-worker is not enabled, but doesn't repro if I forcibly enable the worker in the container spec.
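For reference, a minimal sketch of that workaround, written with the same corev1 Go types as the patch later in this thread; the container name, image, and --root flag match the container spec posted below, and everything else about the deployment is assumed unchanged:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Hypothetical buildkitd container with the OCI worker forced on: builds
	// then run inside the buildkit container itself, where
	// /run/buildkit/otel-grpc.sock actually exists.
	buildkitd := corev1.Container{
		Name:  "buildkitd",
		Image: "docker.io/moby/buildkit:buildx-stable-1",
		Args: []string{
			"--oci-worker=true",
			"--containerd-worker=false",
			"--root", "/var/lib/buildkit/buildkit",
		},
	}
	fmt.Printf("buildkitd args: %v\n", buildkitd.Args)
}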
@brettmorien Are you using some kind of configuration where containerd is running in a different container/rootfs than buildkit?
We are using this extension for Tilt for local development: https://github.com/tilt-dev/tilt-extensions/tree/master/kubectl_build
How that's translated through all the layers of stuff is a container spec that looks like:
template:
  metadata:
    labels:
      app: buildkit
      rootless: "false"
      runtime: containerd
      worker: containerd
  spec:
    containers:
      - args:
          - --oci-worker=false
          - --containerd-worker=true
          - --root
          - /var/lib/buildkit/buildkit
        image: docker.io/moby/buildkit:buildx-stable-1
@brettmorien Looks like that setup is indeed running containerd and buildkit in separate containers and setting up some volume mounts in between them. https://github.com/vmware-archive/buildkit-cli-for-kubectl/blob/1db649b1f50268d857d0cfd36335800c72d2cf50/pkg/driver/kubernetes/manifest/manifest.go#L178-L209 With the OTEL socket being in /run/buildkit, that directory would need to be exposed to the other container as well if the buildkit container is not the one launching containers.
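For illustration, here is roughly the bind mount from the runc error above, expressed as an OCI runtime-spec mount; the field values are inferred from the error message, not copied from buildkit's source:

package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// The OTEL socket mount buildkit v0.13 adds to build containers,
	// reconstructed from the error message above. runc resolves Source in the
	// filesystem where containerd runs, not inside the buildkit container, so
	// with separate containers the socket has to be visible to both.
	otel := specs.Mount{
		Destination: "/dev/otel-grpc.sock",
		Type:        "bind",
		Source:      "/run/buildkit/otel-grpc.sock",
		Options:     []string{"rbind"},
	}
	fmt.Printf("%+v\n", otel)
}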
@georgethebeatle is your setup somewhat similar to @brettmorien's?
@tonistiigi we are using the buildkit cli for kubectl configuration of kbld, which seems to boil down to https://github.com/vmware-archive/buildkit-cli-for-kubectl - the same thing that @brettmorien ends up using, so I guess the setup is effectively the same. Are you suggesting that this is a bug in buildkit-cli-for-kubectl?
FWIW I tried the same build with a patched version of buildkit cli for kubectl, to no avail. I added the following volume and mounts to the code highlighted above:
corev1.Volume{
	Name: "run-buildkit",
	VolumeSource: corev1.VolumeSource{
		// expose the node's /var/run/buildkit into the buildkit pod
		HostPath: &corev1.HostPathVolumeSource{
			Path: "/var/run/buildkit",
			Type: &hostPathDirectory,
		},
	},
},
//...
corev1.VolumeMount{
	// mount it at /run/buildkit, where buildkitd creates otel-grpc.sock
	Name:             "run-buildkit",
	MountPath:        "/run/buildkit",
	MountPropagation: &mountPropagationBidirectional,
},
Then I started getting this error:
cloudfoundry/korifi-controllers | starting build (using kubectl buildkit): . -> trinity.common.repositories.cloud.sap/trinity/korifi-controllers:rand-1711010529163389687-228249185235254-cloudfoundry-korifi-controllers
cloudfoundry/korifi-api | #1 [internal] booting buildkit
cloudfoundry/korifi-controllers | #1 [internal] booting buildkit
cloudfoundry/korifi-api | #1 waiting for 1 pods to be ready for buildkit
cloudfoundry/korifi-controllers | #1 waiting for 1 pods to be ready for buildkit
cloudfoundry/korifi-controllers | #1 0.005 Warning failed to create configmap configmaps "buildkit" already exists - retrying...
cloudfoundry/korifi-api | #1 0.540 Normal buildkit-6dd8f4bc7d SuccessfulCreate Created pod: buildkit-6dd8f4bc7d-czbrk
cloudfoundry/korifi-api | #1 0.544 Normal buildkit-6dd8f4bc7d-czbrk Scheduled Successfully assigned default/buildkit-6dd8f4bc7d-czbrk to trinity-control-plane
cloudfoundry/korifi-api | #1 0.544 Warning buildkit-6dd8f4bc7d-czbrk FailedMount MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
cloudfoundry/korifi-api | #1 0.544 Warning initial attempt to deploy configured for the docker runtime failed, retrying with containerd
cloudfoundry/korifi-controllers | #1 0.901 Normal buildkit-59dcd4b999 SuccessfulCreate Created pod: buildkit-59dcd4b999-m6wq6
cloudfoundry/korifi-controllers | #1 0.905 Normal buildkit-59dcd4b999-m6wq6 Scheduled Successfully assigned default/buildkit-59dcd4b999-m6wq6 to trinity-control-plane
cloudfoundry/korifi-controllers | #1 0.905 Warning buildkit-59dcd4b999-m6wq6 FailedMount MountVolume.SetUp failed for volume "run-buildkit" : hostPath type check failed: /var/run/buildkit is not a directory
cloudfoundry/korifi-api | #1 1.407 Normal buildkit-59dcd4b999 SuccessfulCreate Created pod: buildkit-59dcd4b999-m6wq6
cloudfoundry/korifi-api |
cloudfoundry/korifi-api | #1 1.412 Normal buildkit-59dcd4b999-m6wq6 Scheduled Successfully assigned default/buildkit-59dcd4b999-m6wq6 to trinity-control-plane
cloudfoundry/korifi-api | #1 1.412 Warning buildkit-59dcd4b999-m6wq6 FailedMount MountVolume.SetUp failed for volume "run-buildkit" : hostPath type check failed: /var/run/buildkit is not a directory
When I shell into the kind cluster's docker container I cannot find an OTEL socket anywhere (indeed, /var/run/buildkit does not exist).
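One way to get past that FailedMount error would be to let the kubelet create the directory. A variant of the Volume from the patch above, assuming hostPathDirectoryOrCreate is defined as corev1.HostPathDirectoryOrCreate; note this only fixes the hostPath type check, buildkitd still has to actually create the socket under /run/buildkit:

// Drop-in replacement for the Volume in the patch above; DirectoryOrCreate
// makes the kubelet create /var/run/buildkit on the kind node if it is
// missing instead of failing the hostPath type check.
corev1.Volume{
	Name: "run-buildkit",
	VolumeSource: corev1.VolumeSource{
		HostPath: &corev1.HostPathVolumeSource{
			Path: "/var/run/buildkit",
			Type: &hostPathDirectoryOrCreate,
		},
	},
},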
Another interesting finding is that when I run the same build against a remote Kubernetes cluster (with the original, untampered version of the buildkit cli for kubectl) the build succeeds. When I shell into the buildkit pod I can see that /run/buildkit/otel-grpc.sock exists. So it looks like there is some difference between kind and non-kind clusters with regard to the OTEL socket.
@tonistiigi are you sure that the cause is running containerd in a separate container? If so, wouldn't it also reproduce on remote clusters? For us it only happens on kind.
Closing, since this does not appear to be a buildkit error directly; it is rather related to the outdated buildkit-cli-for-kubectl tool and how it configures volumes when deploying buildkit.