document how to run kind in a kubernetes pod
NOTE: We do NOT recommend doing this if it is at all avoidable. We don't have another option so we do it ourselves, but it has many footguns.
xref: #284. Additionally, these mounts are known to be needed:
volumeMounts:
# not strictly necessary in all cases
- mountPath: /lib/modules
  name: modules
  readOnly: true
- mountPath: /sys/fs/cgroup
  name: cgroup
volumes:
- name: modules
  hostPath:
    path: /lib/modules
    type: Directory
- name: cgroup
  hostPath:
    path: /sys/fs/cgroup
    type: Directory
thanks to @maratoid
/kind documentation
/priority important-longterm
We probably need a new page in the user guide for this.
EDIT: Additionally, for any docker in docker usage the docker storage (typically /var/lib/docker) should be a volume. A lot of attempts at using kind in Kubernetes seem to miss this one. Typically an emptyDir is suitable for this.
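For example, a minimal sketch of that storage volume (the volume name here is illustrative, not from this thread):

# illustrative snippet: back /var/lib/docker with an emptyDir volume
volumeMounts:
- name: dind-storage        # hypothetical volume name
  mountPath: /var/lib/docker
volumes:
- name: dind-storage
  emptyDir: {}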
EDIT2: You also probably want to set the pod's DNS config to some upstream resolvers, so the inner cluster's pods do not try to talk to the outer cluster's DNS, which is probably on a clusterIP and not necessarily reachable.
dnsPolicy: "None"
dnsConfig:
  nameservers:
  - 1.1.1.1
  - 1.0.0.1
EDIT3: Loop devices are not namespaced; follow #1248 to find our current workaround.
/remove-lifecycle stale
this came up again in #677 and again today in another deployment.
/assign
See this comment about possible inotify watch limit issues on the host and a workaround: https://github.com/kubernetes-sigs/kind/issues/717#issuecomment-513070836
This issue may also apply to other Linux hosts (non-Kubernetes).
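If you do hit those inotify limits, a hedged sketch of raising them on the host node (the values are illustrative, not taken from the linked comment):

# illustrative: raise inotify limits on the host node
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512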
For future reference, here's a working pod spec for running kind in a pod. (Add your own image.)
(cc @BenTheElder - is this a sane pod spec for kind?)
That being said, there should also be documentation for:
- why kind needs the volume mounts and what impact they have on the underlying node infrastructure
- what happens when the pod is terminated before deleting the cluster (in the context of https://github.com/kubernetes-sigs/kind/issues/658#issuecomment-505704699)
- configuring garbage collection for unused images to avoid node disk pressure (https://github.com/kubernetes-sigs/kind/pull/663)
- anything else?
apiVersion: v1
kind: Pod
metadata:
  name: dind-k8s
spec:
  containers:
  - name: dind
    image: <image>
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /lib/modules
      name: modules
      readOnly: true
    - mountPath: /sys/fs/cgroup
      name: cgroup
    - name: dind-storage
      mountPath: /var/lib/docker
  volumes:
  - name: modules
    hostPath:
      path: /lib/modules
      type: Directory
  - name: cgroup
    hostPath:
      path: /sys/fs/cgroup
      type: Directory
  - name: dind-storage
    emptyDir: {}
Make sure you do kind delete cluster! See https://github.com/kubernetes-sigs/kind/issues/759
That's pretty sane. As @howardjohn notes, please make sure you clean up the top level containers in that pod (i.e. kind delete cluster in an exit trap or similar). DNS may also give you issues.
why kind needs the volume mounts and what impact they have on the underlying node infrastructure
- /lib/modules is not strictly necessary, but a number of things want to probe these contents, and it's harmless to mount them. For clarity I would make this mount read-only. No impact.
- cgroups are mounted because cgroupsv1 containers don't exactly nest. if we were just doing docker in docker we wouldn't need this.
what happens when the pod is terminated before deleting the cluster (in the context of #658 (comment))
It depends on your setup; with these mounts, IIRC the processes / containers can leak. Don't do this. Have an exit handler; deleting the containers should happen within the grace period.
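A minimal sketch of such an exit handler, assuming kind and docker are available inside the pod (the force-removal of leftover containers is an extra belt-and-braces step, echoing advice later in this thread):

#!/usr/bin/env bash
# best-effort cleanup on exit, including SIGTERM from pod termination
cleanup() {
  kind delete cluster || true
  # belt-and-braces: force-remove any containers that are still around
  docker rm -f $(docker ps -aq) 2>/dev/null || true
}
trap cleanup EXIT TERM

# ... run the tests against the kind cluster here ...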
configuring garbage collection for unused images to avoid node disk pressure (#663)
You shouldn't need this in CI, kind clusters should be ephemeral. Please, please use them ephemerally. There are a number of ways kind is not optimized for production long lived clusters. For temporary clusters used during a test this is a non-issue.
Also note that turning on disk eviction risks your pods being evicted based on the disk usage of the host. There's a reason this is off by default. Eventually we will ship an alternative to make long lived clusters better, but for now it's best to not depend on long lived clusters or image GC.
anything else?
DNS (see above). Your outer cluster's in-cluster DNS servers are typically on a clusterIP which won't necessarily be visible to the containers in the inner cluster. Ideally configure the "host machine" Pod's DNS to your preferred upstream DNS provider (see above).
@BenTheElder thank you for pointing me in this issue - I am trying to see how we would fit @radu-matei's example into the testing automation we are introducing for our kubernetes project. Right now we want to trigger the creation of the cluster and the commands within that cluster from within a pod. I've tried creating a container that has docker and kind installed.
I've tried creating a pod with the instructions provided above, but I still can't seem to run the kind create cluster command provided - I get the error:
root@k8s-builder-7b5cc87566-fnz5b:/work# kind create cluster
Error: could not list clusters: failed to list nodes: exit status 1
For testing I am currently creating the container, running kubectl exec into it and running kind create cluster.
The current pod specification I have is the following:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: k8s-builder
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: k8s-101
    spec:
      containers:
      - name: k8s-docker-builder
        image: seldonio/core-builder:0.4
        imagePullPolicy: Always
        command:
        - tail
        args:
        - -f
        - /dev/null
        volumeMounts:
        - mountPath: /lib/modules
          name: modules
          readOnly: true
        - mountPath: /sys/fs/cgroup
          name: cgroup
        - name: dind-storage
          mountPath: /var/lib/docker
        securityContext:
          privileged: true
      volumes:
      - name: modules
        hostPath:
          path: /lib/modules
          type: Directory
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
          type: Directory
      - name: dind-storage
        emptyDir: {}
For explicitness, the way that I am installing Kind in the Dockerfile is as follows:
# Installing KIND
RUN wget https://github.com/kubernetes-sigs/kind/releases/download/v0.5.1/kind-linux-amd64 && \
chmod +x kind-linux-amd64 && \
mv ./kind-linux-amd64 /bin/kind
For explicitness, the way that I am installing Kubectl in the Dockerfile is as follows:
# Installing Kubectl
RUN wget https://storage.googleapis.com/kubernetes-release/release/v1.16.2/bin/linux/amd64/kubectl && \
chmod +x ./kubectl && \
mv ./kubectl /bin
For explicitness, the way that I am installing Docker in the Dockerfile is as follows:
# install docker
RUN \
apt-get update && \
apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg2 \
software-properties-common && \
curl -fsSL https://download.docker.com/linux/$(. /etc/os-release; echo "$ID")/gpg | apt-key add - && \
add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
$(lsb_release -cs) \
stable" && \
apt-get update && \
apt-get install -y docker-ce
What should I take into account to make sure this works?
@axsaucedo can you verify that you started docker successfully? Failing to list clusters means docker ps does not work.
That is correct, currently I am getting the usual Cannot connect to the Docker daemon at unix:///var/run/docker.sock. What would be the way to make it work in the pod without mounting the Node's socket? Is there a cleaner/better way to do this?
@BenTheElder I was able to successfully create a kind cluster by starting an internal docker service inside of the pod, which is a fantastic step forward, but I am not sure whether this is the intended use. I did have a look at the response you made in #997 where you pointed to wrapper.sh, which actually does start the service itself, so I assume that is the correct/expected usage?
For the sake of explicitness, here is the comment you provided in #997 (very useful): https://github.com/kubernetes-sigs/kind/issues/997#issuecomment-545102002
Yes -- you need to start docker. For our CI image we handle this in the entrypoint.
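A hedged sketch of what such an entrypoint can look like - this is not the exact wrapper.sh, and the log path and timeout are illustrative:

#!/usr/bin/env bash
# illustrative entrypoint: start the docker daemon, wait for it to be ready, then hand off
dockerd > /var/log/dockerd.log 2>&1 &

# wait until the daemon answers before running anything that needs docker
for _ in $(seq 1 30); do
  docker info >/dev/null 2>&1 && break
  sleep 1
done

exec "$@"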
Note that we appeared to experience a leak in the istio CI; it is important that you ensure that on exit all containers are deleted. kind delete cluster should be sufficient, but we also recommend force removing all docker containers.
This image is a WIP of what kind's own Kubernetes-based CI will be using. https://github.com/kubernetes/test-infra/tree/master/images/krte
Note this part https://github.com/kubernetes/test-infra/blob/4696b77f4ee7cfffe8e86a8b8e84c797d6846bfd/images/krte/wrapper.sh#L125
Thank you very much @BenTheElder - we have managed to successfully run our e2e tests using krte as a base example, really appreciate the guidance (currently sitting at https://github.com/SeldonIO/seldon-core/pull/994). It's pretty mind blowing that it's now possible to run kubernetes in kubernetes to test kubernetes components in containerised kubernetes 🤯
Using the pod spec posted above, I am still getting an error from containerd:
Dec 11 04:31:22 kind-control-plane containerd[48]: time="2019-12-11T04:31:22.617994096Z" level=error msg="copy shim log" error="reading from a closed fifo"
Dec 11 04:31:22 kind-control-plane containerd[48]: time="2019-12-11T04:31:22.626806247Z" level=error msg="copy shim log" error="reading from a closed fifo"
Dec 11 04:31:22 kind-control-plane containerd[48]: time="2019-12-11T04:31:22.643253105Z" level=error msg="copy shim log" error="reading from a closed fifo"
Dec 11 04:31:22 kind-control-plane containerd[48]: time="2019-12-11T04:31:22.644244344Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-controller-manager-kind-control-plane,Uid:7a42efc8ddc98f327b58e75d0d6078b7,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: io.containerd.runc.v1: failed to adjust OOM score for shim: set shim OOM score: write /proc/368/oom_score_adj: invalid argument\n: exit status 1: unknown"
Dec 11 04:31:22 kind-control-plane containerd[48]: time="2019-12-11T04:31:22.645505766Z" level=error msg="copy shim log" error="reading from a closed fifo"
Dec 11 04:31:22 kind-control-plane containerd[48]: time="2019-12-11T04:31:22.646301955Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-kind-control-plane,Uid:051f0a138da15840d511b8f1d90c5bbf,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: io.containerd.runc.v1: failed to adjust OOM score for shim: set shim OOM score: write /proc/345/oom_score_adj: invalid argument\n: exit status 1: unknown"
This is the main issue also reported by kubelet while trying to start kube-apiserver.
Kind version: master
Local:
Client: Docker Engine - Community
  Version:       19.03.5
  API version:   1.40
  Go version:    go1.12.12
  Git commit:    633a0ea
  Built:         Wed Nov 13 07:22:34 2019
  OS/Arch:       darwin/amd64
  Experimental:  true
any help greatly appreciated
Hello,
I read the thread and applied all your recommendations. I added the cluster deletion to my kind docker image.
Just to be sure, and to add another protection (for an eventual kill -9), do you have a /sys/fs/cgroup clean up script? Not sure if it's technically possible, I guess not, but maybe I missed something.
We do not; we don't expect a kill -9 to occur. We certainly do not do so manually, and that's not how Kubernetes terminates pods normally AFAIK.
If it did, though, our host machines are regularly replaced (k8s upgrades and node auto-repair in GKE involve replacing the node VMs entirely).
However, this problem is one of the reasons I do NOT recommend doing this if you can avoid it. On a host like, say, CircleCI, Google Cloud Build, or Travis, you will not have this problem, as the underlying VM only exists for the test invocation.
IF your CI infrastructure must be Kubernetes based (instead of just your app infra), privileged containers and kind can let you run Kubernetes end-to-end tests, but it is not without issues.
For some reason, docker can send a SIGKILL after the grace period.
I run kind on a CI (Jenkins) which runs on GKE. It's not a big issue to lose Jenkins while we wait for the new pod on another worker.
Thanks for your reply.
To be clearer: we will have cleaned up before the grace period normally. We trap SIGTERM with cleanup.
We moved our cgroup mount to read only a few months back and haven't had any issues. It removed any risk of things not cleaning up properly (I think? We still clean up and now our nodes restart often for other reasons, so maybe it doesn't and I just don't notice)
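For reference, that change amounts to marking the cgroup mount read-only in the pod spec above; a sketch of the relevant fragment:

volumeMounts:
- mountPath: /sys/fs/cgroup
  name: cgroup
  readOnly: true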
An additional note for why this can be problematic: /dev/loop* devices are NOT namespaced / are shared with the host. This is a problem if you're trying to do blockfs testing (like we do in kubernetes). AIUI minikube does not support block devices at all but, if for some reason you're trying to test block devices with local clusters, you're going to need to work around this by preallocating sufficient block devices.
https://github.com/kubernetes/test-infra/blob/dfe2d0f383c8f6df6cc2e53ca253d048e18dcfe2/prow/cluster/create-loop-devs_daemonset.yaml
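I believe the linked DaemonSet boils down to preallocating loop devices on every node; a hedged sketch of that idea (the device count is illustrative and the real script may differ):

#!/usr/bin/env bash
# illustrative: preallocate loop devices on the host so nested clusters don't run out
# loop devices use block major number 7; the minor number selects /dev/loopN
for i in $(seq 0 99); do
  if [ ! -e "/dev/loop$i" ]; then
    mknod "/dev/loop$i" b 7 "$i"
  fi
done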
FYI, wrote a blog post about this to share our experiences: https://d2iq.com/blog/running-kind-inside-a-kubernetes-cluster-for-continuous-integration
Could the Dockerfile for jieyu/kind-cluster-buster:v0.1.0 be open sourced by any chance?
@deiwin it’s open sourced. There’s a link in the blog post to the repo. https://github.com/jieyu/docker-images
Oh, didn't notice that. Thank you!
@jieyu thanks for spending the time to refactor into an OSS repo and write that comprehensible blog post! We also have a production cluster running with KIND in kubernetes, but we'll be looking to refactor it using some of your techniques. I have a question: how come none of your scripts actually delete the KIND cluster? From previous posts/implementations that Ben has covered, one of the main points of emphasis is to ensure the KIND cluster is deleted, otherwise there may be dangling resources. In our implementation we do remove the KIND cluster, and then we run service docker stop; however it sometimes hangs, and we were thinking of just running the KIND delete without the service docker stop, hence why I am also curious about your implementation. Thanks again!
@axsaucedo I believe that one of the main reasons that previous implementations require deleting the KIND cluster is to make sure cgroups are cleaned up (thus not leaked) on the host cgroup filesystem. The way we solved it is to place the docker daemon's root cgroup nested underneath the corresponding pod cgroup (i.e., https://github.com/jieyu/docker-images/blob/master/dind/entrypoint.sh#L64). Thus, when the pod (with KIND running inside it) is terminated by Kubernetes, all the associated resources are cleaned up, including the cgroups used by the KIND cluster.
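A hedged sketch of that idea, not the exact entrypoint.sh: read the pod's own cgroup path from /proc/self/cgroup and hand it to dockerd as the cgroup parent, so everything the inner docker creates lives under the pod's cgroup (the controller chosen and the parsing are illustrative):

#!/usr/bin/env bash
# illustrative: nest the inner docker's containers under the pod's own cgroup (cgroups v1 layout assumed)
# lines in /proc/self/cgroup look like "<id>:<controller>:<path>"
POD_CGROUP="$(grep -m1 ':memory:' /proc/self/cgroup | cut -d: -f3)"

# containers started by this daemon are then created under the pod's cgroup
dockerd --cgroup-parent="${POD_CGROUP}/docker" &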
There might be other "shared" global kernel resources used by KIND that are not properly "nested" under the pod (e.g., devices), which means that they might get leaked if the KIND cluster is not cleaned up properly in the pod. However, we don't have such workloads in our CI, thus no need to worry about those in our case.
Right, I see @jieyu, that makes sense. Ok, fair enough, that's quite a solid approach. We'll be looking to refactor our implementation using the approach you outlined in the blog post as a base. Thanks again for this!
There might be other "shared" global kernel resources used by KIND that are not properly "nested" under the pod (e.g., devices), which means that they might get leaked if the KIND cluster is not cleaned up properly in the pod. However, we don't have such workloads in our CI, thus no need to worry about those in our case.
Right, in Kubernetes's CI (and others) this is not the case. I still strongly recommend at least best-effort attempting to shut things down gracefully. I also strongly suggest reconsidering trying to run Kubernetes inside Kubernetes for anything serious.