Can't create a kind cluster after deleting the cluster in a Docker-in-Docker VSCode devcontainer
What happened: I am trying to create a kind cluster in a VSCode devcontainer. I am working on Windows with Docker Desktop and have been using a Docker-in-Docker template.
When the container is first constructed I am able to create a cluster by running `kind create cluster` from a terminal within the container, and this works successfully.
However, if I delete the cluster and try to create it again, it fails.
This doesn't happen when I repeat the process on the host Windows machine; there it creates every time.
This is to be used in a script, so I need it to be repeatable: delete cluster, then create cluster (a script sketch follows the repro steps below).
Thanks in advance for any assistance.
What you expected to happen: A new cluster is created
How to reproduce it (as minimally and precisely as possible):
- Create a new devcontainer in VSCode from the "New Dev Container..." menu option, using the Docker in Docker template
- Add features for kind, kubectl, and node
- Create the devcontainer
- Open a terminal
- Run `kind create cluster`
- Run `kind delete cluster`
- Run `kind create cluster`
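As a script, the failing sequence boils down to this (a minimal sketch; it assumes `kind` is already on the PATH inside the devcontainer):

```bash
#!/usr/bin/env bash
set -euo pipefail

kind create cluster   # succeeds in a freshly built devcontainer
kind delete cluster   # succeeds
kind create cluster   # fails inside the devcontainer; works on the Windows host
```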
Anything else we need to know?:
Environment:
- Windows 11
- Docker Desktop 4.23.0
- Dev Container Features:
  "ghcr.io/devcontainers/features/node:1": {},
  "ghcr.io/mpriscella/features/kind:1": {},
  "ghcr.io/devcontainers-contrib/features/kubectl-asdf:2": {}

Docker Info from inside Dev Container:
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /home/vscode/.docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: 2.21.0-1
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 23.0.6+azure-2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
runc version: ccaecfcbc907d70a7aa870a6650887b901b25b82
init version:
Security Options:
seccomp
Profile: builtin
Kernel Version: 5.10.102.1-microsoft-standard-WSL2
Operating System: Debian GNU/Linux 11 (bullseye) (containerized)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 15.5GiB
Name: 76da64a73ada
ID: ee13f67f-b2b0-4995-8883-dd3c59c7f619
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
> I am trying to create a kind cluster in a VSCode devcontainer. I am working on Windows with Docker Desktop and have been using a Docker-in-Docker template.
We don't recommend this, and it may be a bug in the docker-in-docker environment.
Please avoid adding additional nesting; it's a real headache to debug.
@BenTheElder, I see that you've replied this way to a lot of similar issues lately, but I just want to say that using kind within an already containerized environment is a totally acceptable use case. Two important use cases:
- Dev containers. My team extensively leverages this technology to ensure repeatable dev environments that are uniform across all developers. In my team, we develop K8s operators in dev containers using kind.
- Containerized CI runners. It is really common to leverage containerized ephemeral runners to ensure repeatability of CI jobs, as well as to easily scale those runners horizontally on K8s. In my team, we run integration tests in our CI using kind for our in-house operators.
I understand that this adds complexity on your end and makes debugging more difficult, but I just want to make sure you're aware of the valid use cases of running kind in containerized environments. Those use cases won't go away.
@KieranJeffreySmart, see https://github.com/kubernetes-sigs/kind/issues/3283#issuecomment-1745616607. TL;DR: you likely need to enable cgroup v2 on the VM on which Docker runs for kind v0.20+ to work properly.
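For anyone checking their own setup, a quick sketch (the `stat` check works in any container on that VM; the `.wslconfig` switch is the commonly documented way to force cgroup v2 under WSL2, so treat it as an assumption for your Docker Desktop version):

```bash
# Print the filesystem type of the cgroup mount:
# "cgroup2fs" means cgroup v2, "tmpfs" means cgroup v1.
stat -fc %T /sys/fs/cgroup/

# To force cgroup v2 under WSL2, add this to %UserProfile%\.wslconfig
# on the Windows host, then run `wsl --shutdown` and restart Docker Desktop:
#   [wsl2]
#   kernelCommandLine = cgroup_no_v1=all
```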
I'm aware of the use cases, but we have limited bandwidth to provide support, and kind is available as a static Go binary; you can't containerize docker itself either.
We'll happily review proposed fixes from contributors but I just cannot justify spending my own time debugging this versus steering people towards more debuggable alternatives.
Kind is already running containers in containers, which is unfortunately insecure and error-prone but similarly useful; I highly recommend avoiding doing this again with another layer.
See also #303 for additional footguns running nested inside of another Kubernetes cluster.
For Windows specifically: #1529. Nobody has contributed to work on CI for Windows; aojea and I don't use Windows for development, so we depend on community contributions to keep the WSL2 docs up to date and to identify fixes for us to review, or sometimes to implement without being able to directly verify ourselves.
... let alone adding container nesting on Windows.
> ... let alone adding container nesting on Windows.
Quick note for the audience with no Windows exposure: containers/docker on Windows (except for actual Windows containers, which nobody uses) run in a Linux kernel and for the most part behave the same as if they were running on a bare-metal Linux box. Although it's convenient, you don't need to run Docker Desktop on Windows; regular Linux docker or podman will work fine inside WSL2. Therefore the issues with nesting containers are essentially the same as for stock Linux.
> Therefore the issues with nesting containers are essentially the same as for stock Linux.
We tell people to avoid running kind in docker-in-docker on Linux. It's generally not necessary (it's no more secure than just passing through the host dockerd socket, and it's more effort) and it creates a lot of additional problems. There are some use cases where it makes sense, but adding another layer of nested containers is very "here be dragons".
Also, the environment in WSL2 is different from Linux run elsewhere; e.g. it often has a custom init system, and we don't have easy access to reproduce and debug (or the time / inclination, really: there's so much to do, OSS developers could use Linux, and we don't use Windows ourselves, nor is it really supported for developing kubernetes/kubernetes: https://kind.sigs.k8s.io/docs/contributing/project-scope/).
(Differences in the init and kernel => different cgroup management => impact on containers)
init is out of scope here though, since we're running inside a container.
btw, it turns out nested kind works just fine now, provided the container has the necessary secret sauce. The stock docker:dind image is an example of such a thing, albeit Alpine-based, so... not for everyone. There is an Ubuntu equivalent image that also works: https://github.com/cruizba/ubuntu-dind
You can start that container, install kind (or k3d), and create a cluster. It can be used as an existence proof from which to generate your own image for CI and so on.
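A sketch of that existence proof (hedged: assumes an amd64 host, and the kind version/URL follow the standard static-binary install instructions, so bump as needed):

```bash
# Start a privileged docker-in-docker daemon container.
docker run -d --privileged --name dind docker:dind

# Give the inner dockerd a moment to come up, then install a static
# kind binary inside the container and create a cluster.
sleep 10
docker exec dind sh -c '
  wget -qO /usr/local/bin/kind \
    https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64 &&
  chmod +x /usr/local/bin/kind &&
  kind create cluster
'
```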
> init is out of scope here though, since we're running inside a container.
It's not; the init is responsible for setting up cgroups, among other things, and we're sharing that along with the rest of the kernel from the host, since we're using containers instead of VMs. Privileged containers like kind nodes are "leakier" than normal containers, but all containers are influenced by the host's init.
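You can see that influence from inside any container (a diagnostic sketch, nothing kind-specific):

```bash
cat /proc/1/comm        # the container's own PID 1 (not the host's init)
cat /proc/self/cgroup   # the cgroup path(s) the host side placed us in
mount | grep cgroup     # cgroup filesystems shared from the host kernel
```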
Well, I've tested stock WSL2 on x86 and it works. I'll try arm64 and report back...
> Well, I've tested stock WSL2 on x86 and it works. I'll try arm64 and report back...
Reporting back: ARM WSL2 doesn't work :(
Fwiw, the issue where `kind delete cluster` followed by `kind create cluster` fails when running in dind (the original problem reported above) occurs on regular x64 Ubuntu too (unrelated to WSL2).
This sort of problem is likely eliminated on cgroup v2 + cgroupns hosts. cgroup v1 is going into maintenance mode in Kubernetes (https://github.com/kubernetes/enhancements/pull/4572) and will soon be deprecated by various ecosystem projects (like OCI and systemd).
On cgroup v1 hosts we started forcing cgroupns=private on kind nodes, which may help with some of these problems.
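To check what a given setup is doing, both settings can be inspected directly (a sketch; `kind-control-plane` is the default node container name for a default cluster):

```bash
# cgroup version reported by the Docker daemon running the kind nodes:
docker info --format '{{.CgroupVersion}}'

# cgroup namespace mode of a kind node container
# ("private" is what kind forces on cgroup v1 hosts):
docker inspect -f '{{.HostConfig.CgroupnsMode}}' kind-control-plane
```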