buildkit-cli-for-kubectl

k3d: unable to start built container

Open · juliostanley opened this issue on Nov 18, 2020 · 13 comments

What steps did you take and what happened

  • Took a Dockerfile from some random project
  • Built it just fine with kubectl build . -t test (caching seems to be working well)
  • Attempted to run it with kubectl run -i --tty test --image=test -- sh
  • Got ImagePullBackOff
  • The image does not appear to be there, unlike in the GIF animation on the README
    • It didn't send the tar? (the export steps are the last entries in the output; see the builder logs below)

What did you expect to happen

The container starts on the single-node k8s cluster.

Environment Details:

  • kubectl buildkit 0.1.0
  • Kubernetes v1.18.8+k3s1
  • k3s via k3d, on Docker for Windows (wsl2)
  • containerd v1.3.3-k3s2

Builder Logs

[+] Building 1.3s (9/9) FINISHED
 => [internal] load .dockerignore                                                                            0.1s 
 => [internal] load build definition from Dockerfile                                                         0.1s 
 => => transferring dockerfile: 32B                                                                          0.0s 
 => [internal] load metadata for docker.io/library/ubuntu:18.04                                              0.8s 
 => [1/5] FROM docker.io/library/ubuntu:18.04@sha256:646942475da61b4ce9cc5b3fadb42642ea90e5d0de46111458e100  0.0s 
 => => resolve docker.io/library/ubuntu:18.04@sha256:646942475da61b4ce9cc5b3fadb42642ea90e5d0de46111458e100  0.0s 
 => CACHED [2/5] RUN apt-get update; apt-get install wget -y                                                 0.0s 
 => CACHED [3/5] RUN  wget https://bin.equinox.io/c/VdrWdbjqyF/cloudflared-stable-linux-amd64.deb            0.0s 
 => CACHED [4/5] RUN  apt-get install ./cloudflared-stable-linux-amd64.deb                                   0.0s 
 => CACHED [5/5] RUN  useradd -s /usr/sbin/nologin -r -M cloudflared;rm -rf cloudflared-stable-linux-amd64.  0.0s 
 => exporting to image                                                                                       0.3s 
 => => exporting layers                                                                                      0.1s 
 => => exporting manifest sha256:eaf698c4058f45b5e9ba84ba994b269f6289ec5a5ba7b7b949c6b6cb0c6ec27d            0.0s 
 => => exporting config sha256:d9349451d21751b2bb3600b1ac00ecc5e4d53056fad7a01732aff58bf2fd5eeb              0.1s 

Dockerfile: N/A; this happens with any simple Dockerfile.

Vote on this request

This is an invitation to the community to vote on issues. Use the "smiley face" up to the right of this comment to vote.

  • :+1: "I would like to see this bug fixed as soon as possible"
  • :-1: "There are more important bugs to focus on right now"

juliostanley avatar Nov 18 '20 18:11 juliostanley

Where can I increase the log level, or any tips on debugging?

juliostanley avatar Nov 18 '20 18:11 juliostanley

Can you rebuild but tag it something like test:mytest? I think this may be running into a problem with the semantics of how kubectl run works, where it will always attempt to pull the image if it's untagged or tagged latest.
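For example (a sketch, where test:mytest is just an arbitrary non-latest tag and the pull policy is forced explicitly to be safe):

kubectl build . -t test:mytest
# with a non-latest tag the default pull policy should already be IfNotPresent,
# but it can also be set explicitly on kubectl run
kubectl run -i --tty test --image=test:mytest --image-pull-policy=IfNotPresent -- sh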

pdevine avatar Nov 18 '20 18:11 pdevine

For the default buildkit pod that was created, I ran kubectl logs buildkit-5b5d76d554-smdpc.

The logs are:

time="2020-11-18T16:32:50Z" level=info msg="auto snapshotter: using overlayfs"
time="2020-11-18T16:32:50Z" level=warning msg="using host network as the default"
time="2020-11-18T16:32:50Z" level=info msg="found worker \"lm0xod5ma4sm55gh6k87rtfpn\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:buildkit-5b5d76d554-smdpc org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/amd64 linux/arm64 linux/riscv64 linux/ppc64le 
linux/s390x linux/386 linux/arm/v7 linux/arm/v6]"
time="2020-11-18T16:32:50Z" level=warning msg="skipping containerd worker, as \"/run/containerd/containerd.sock\" does not exist"
time="2020-11-18T16:32:50Z" level=info msg="found 1 workers, default=\"lm0xod5ma4sm55gh6k87rtfpn\""
time="2020-11-18T16:32:50Z" level=warning msg="currently, only the default worker can be used."
time="2020-11-18T16:32:50Z" level=info msg="running server on /run/buildkit/buildkitd.sock"

I looked at the k3s node, which has the containerd sock in a different location than expected:

⤷ lab  master > docker exec -it 72d sh
/ # ls -la /run/k3s/containerd
total 0
drwx--x--x 5 0 0 140 Nov 18 16:14 .
drwx--x--x 3 0 0  60 Nov 18 16:14 ..
srw-rw---- 1 0 0   0 Nov 18 16:14 containerd.sock
srw-rw---- 1 0 0   0 Nov 18 16:14 containerd.sock.ttrpc
drwxr-xr-x 4 0 0  80 Nov 18 16:14 io.containerd.grpc.v1.cri
drwx--x--x 2 0 0  40 Nov 18 16:14 io.containerd.runtime.v1.linux
drwx--x--x 3 0 0  60 Nov 18 16:14 io.containerd.runtime.v2.task

juliostanley avatar Nov 18 '20 18:11 juliostanley

Gave it a try with a tag, but still the same effect:


kubectl build . -t test:test -f .\Dockerfile.test

 => [internal] load .dockerignore                                                                            0.1s 
 => => transferring context: 2B                                                                              0.0s 
 => [internal] load build definition from Dockerfile.test                                                    0.1s 
 => => transferring dockerfile: 36B                                                                          0.0s 
 => [internal] load metadata for docker.io/library/alpine:latest                                             1.1s 
 => [1/2] FROM docker.io/library/alpine@sha256:c0e9560cda118f9ec63ddefb4a173a2b2a0347082d7dff7dc14272e7841a  0.0s 
 => => resolve docker.io/library/alpine@sha256:c0e9560cda118f9ec63ddefb4a173a2b2a0347082d7dff7dc14272e7841a  0.0s 
 => CACHED [2/2] RUN echo hi                                                                                 0.0s 
 => exporting to image                                                                                       0.1s 
 => => exporting layers                                                                                      0.0s 
 => => exporting manifest sha256:4d46a8b555ef3122328fdb79c2de6fe60f4b7126413912eea2dd8012a594efbc            0.1s 
 => => exporting config sha256:819a744a827582dee9e9e319081273666fbc26e8349df5ae6f605c003d1a4adb              0.0s

kubectl run -i --tty test --image=test:test -- sh


kubectl get po test -o yaml

....
spec:
  containers:
  - args:
    - sh
    image: test:test
    imagePullPolicy: IfNotPresent
    name: test
    resources: {}
    stdin: true
    stdinOnce: true
....

kubectl get po

test                        0/1     ImagePullBackOff   0          2m11s

juliostanley avatar Nov 18 '20 18:11 juliostanley

Does the image get loaded back into docker when you run docker images?

pdevine avatar Nov 18 '20 18:11 pdevine

I have the same problem when using microk8s. The build works fine, but the image is not uploaded to containerd. I tried setting the containerd.sock location:

kubectl buildkit create --runtime containerd --containerd-sock=/var/snap/microk8s/common/run/containerd.sock;

But after this, even building started to fail.

MarcusAhlfors avatar Nov 18 '20 18:11 MarcusAhlfors

@MarcusAhlfors could you file a separate issue for microk8s? We've tested it out on a lot of platforms, but clearly not enough! :-D

pdevine avatar Nov 18 '20 18:11 pdevine

I am running k3s, which uses containerd.

No, the image does not get loaded into containerd.
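(For reference, one way to double-check from the host is to list images through containerd inside the k3d node; a sketch, assuming the default k3d node container name and that ctr is available in the node image, which k3s ships:)

# list images in the CRI namespace of k3s's containerd
docker exec -it k3d-k3s-default-server-0 \
  ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io images ls | grep test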


If I follow the flags used by @MarcusAhlfors and run kubectl buildkit create --runtime containerd --containerd-sock=/run/k3s/containerd/containerd.sock, I get the following error:

[+] Building 0.0s (0/1)
 => [internal] booting buildkit                                                                             23.1s 
 => => # Normal buildkit-54df6654c9 SuccessfulCreate Created pod: buildkit-54df6654c9-n4rpl
 => => # Normal buildkit-54df6654c9-n4rpl Scheduled Successfully assigned default/buildkit-54df6654c9-n4rpl to k3 
 => => # d-k3s-default-server-0
 => => # Warning buildkit-54df6654c9-n4rpl FailedMount MountVolume.SetUp failed for volume "var-lib-containerd" : 
 => => #  hostPath type check failed: /var/lib/containerd is not a directory
 => => waiting for 1 pods to be ready 

Pod events:

  Warning  FailedMount  95s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[var-lib-containerd], unattached volumes=[var-lib-containerd run-containerd var-log tmp default-token-s7jfq buildkitd-config containerd-sock var-lib-buildkit]: timed out waiting for the condition

juliostanley avatar Nov 18 '20 18:11 juliostanley

Regarding the previous error, I think the mount should be configurable. I modified the buildkit deployment while it was in the failed state (a rough patch equivalent is sketched after this list):

  • replicas to 0
  • var-lib-containerd to hostPath /var/lib/rancher/k3s/agent/containerd
  • replicas to 1
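The patch sketch (assuming the deployment is named buildkit; strategic merge keys the volume by its name):

kubectl scale deployment buildkit --replicas=0
# point the var-lib-containerd hostPath at the k3s containerd data root
kubectl patch deployment buildkit --type=strategic -p '{"spec":{"template":{"spec":{"volumes":[{"name":"var-lib-containerd","hostPath":{"path":"/var/lib/rancher/k3s/agent/containerd","type":"Directory"}}]}}}}'
kubectl scale deployment buildkit --replicas=1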

But I ended up with this error in the pod events:

Error: failed to generate container "e3c0f6613345a1a9d493a557951288c03e83ed22753aa52c51bd4d3b6388fcc8" spec: path "/tmp" is mounted on "/" but it is not a shared mount

This seems similar to https://github.com/rancher/k3d/issues/206, and is caused by the Bidirectional mountPropagation setting needed for the mounts to be picked up.
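(For what it's worth, the usual workaround for this class of "not a shared mount" error in nested-node setups is to remount / as a shared mount inside the node container; a sketch, assuming the default k3d node container name, and untested here:)

docker exec k3d-k3s-default-server-0 mount --make-rshared /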

I guess maybe k3d isn't a good environment for kubectl-buildkit? Suggestions?

juliostanley avatar Nov 18 '20 19:11 juliostanley

As a workaround you can --push the built image to a registry and then pull it back, but that's definitely not ideal. We'll need to get an environment set up to see if we can replicate the problem.
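Something along these lines (a sketch; the registry address is a placeholder):

kubectl build . -t registry.example.com/test:test --push
kubectl run -i --tty test --image=registry.example.com/test:test -- sh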

pdevine avatar Nov 18 '20 19:11 pdevine

@juliostanley you mention in the opening comment Docker for Windows (wsl2) and the containerd runtime. I'm trying to wrap my head around what moving parts are in your environment (maybe some of the auto-detection logic is getting confused).

Is your kubelet configured to use containerd or dockerd? (I'm assuming containerd, but please confirm)

Is dockerd also running inside your system, and is there a /var/run/docker.sock visible inside the wsl2 distro that's running the kubelet?

If there is a dockerd, and you're using containerd for kubernetes, it's possible the builder is auto-selecting dockerd incorrectly, assuming that's your runtime, then storing images there, which are not visible to kubernetes via containerd.

If that's what's going on, it should be possible to work around this auto-detection glitch with kubectl buildkit create --runtime containerd --containerd-sock ..., forcing the right runtime and containerd socket path. The trick will be figuring out what that path is inside the environment that is running the kubelet process. If you can find your kubelet config, hopefully it lists the path to the containerd socket for reference.
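A generic way to hunt for it (a sketch, assuming the default k3d node container name):

# look for containerd sockets inside whatever is running the kubelet
docker exec -it k3d-k3s-default-server-0 sh -c 'find /run /var/run -name "containerd.sock" 2>/dev/null'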

dhiltgen avatar Nov 18 '20 21:11 dhiltgen

@dhiltgen Yeah, it may sound a little confusing, and that's actually part of the issue, due to the need for Bidirectional mounts.


So here is what I noticed (based on my previous comments)

  • I am using k3d https://k3d.io/
  • k3d uses Docker (in this case Docker for Windows with WSL2) to spin up "nodes", or a single "node", for the k8s cluster; these run k3s, which comes with containerd (it's not a Docker-in-Docker situation). The sock and lib locations for containerd are on non-standard paths.
    • This is the command to create the k8s cluster: k3d cluster create
    • var-lib-containerd is at /var/lib/rancher/k3s/agent/containerd inside the docker container node
    • sock is at /run/k3s/containerd/containerd.sock
  • I can create buildkit with kubectl buildkit create --runtime containerd --containerd-sock=/run/k3s/containerd/containerd.sock
  • But it will still fail, for two reasons:
    • var-lib-containerd is assumed to be at /var/lib/containerd, which causes pod creation to fail due to a non-existent hostPath (the following bullet point applies even after I manually edit the buildkit deployment)
    • The hostPaths on the deployment are set to Bidirectional, which throws an error about them not being shared mounts (see https://github.com/rancher/k3d/issues/206 for reference; a different issue, but the same pattern: the pod fails to create the buildkit container due to the mounts)

Basically it seems like k3d is not a good environment for kubectl-buildkit, and the only option is to use a registry, as described by @pdevine, although that eliminates one of the use cases (avoiding transferring bytes up and down to a registry, and needing a registry at all).

Hope this clarifies the environment.

juliostanley avatar Nov 18 '20 21:11 juliostanley

Thanks for the clarification!

The way containerd works, the gRPC API requires the "client" to be "local"; it's not a network API like the Kubernetes or dockerd APIs. The client libraries require access to specific host paths so that files can be placed there for child containers to access, hence the bidirectional mounts. This is only needed when we're using containerd to facilitate the containers used during the image build.
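For illustration, the containerd-runtime mounts in the builder pod spec look roughly like this (a sketch, not the exact manifest; the paths shown are the stock defaults that k3d/k3s do not use):

volumeMounts:
- name: var-lib-containerd
  mountPath: /var/lib/containerd
  mountPropagation: Bidirectional
- name: containerd-sock
  mountPath: /run/containerd/containerd.sock
volumes:
- name: var-lib-containerd
  hostPath:
    path: /var/lib/containerd
    type: Directory
- name: containerd-sock
  hostPath:
    path: /run/containerd/containerd.sock
    type: Socket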

It sounds like k3d isn't going to work unless/until those mounts are refined upstream for the containerd runtime model.

It's possible #26 might wind up building out an alternative strategy that could be employed here. We might be able to approach this by using the ~rootless model (not building inside containerd) and then loading the images directly through a proxy, which I believe can load images purely over the containerd.sock without having to touch the filesystem.

dhiltgen avatar Nov 18 '20 22:11 dhiltgen