holoinsight
cadvisor init failure in kubernetes
Describe this problem
The cadvisor pods fail to run:
[root@xxx ~]# kubectl get po -n holoinsight-example
NAME READY STATUS RESTARTS AGE
cadvisor-kpxbl 0/1 CrashLoopBackOff 3 (35s ago) 90s
cadvisor-zwc4q 0/1 CrashLoopBackOff 3 (16s ago) 90s
ceresdb-0 1/1 Running 0 91s
clusteragent-0 1/1 Running 0 91s
daemonagent-7xk4d 1/1 Running 0 90s
daemonagent-8n5gg 1/1 Running 0 90s
holoinsight-server-example-0 0/1 Running 0 91s
mongo-0 1/1 Running 0 91s
mysql-0 0/1 Running 0 90s
Describing the pod (cadvisor-kpxbl):
[root@host-10-19-37-88 ~]# kubectl describe po cadvisor-kpxbl -n holoinsight-example
...
Containers:
cadvisor:
Container ID: docker://7a3b2aab591d147b4dbf9e804e7b1837817696e50cd540ce1f63aff1ca27dac1
Image: gcr.io/cadvisor/cadvisor:v0.44.0
Image ID: docker-pullable://gcr.io/cadvisor/cadvisor@sha256:ef1e224267584fc9cb8d189867f178598443c122d9068686f9c3898c735b711f
Port: 8080/TCP
Host Port: 0/TCP
Args:
--allow_dynamic_housekeeping=false
--housekeeping_interval=5s
--max_housekeeping_interval=5s
--storage_duration=2m
--enable_metrics=cpu,memory,network,tcp,disk,diskIO,cpuLoad
--enable_load_reader=true
--store_container_labels=false
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Message: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/69dee914b194de362188cb07318446b62fa3559fc5cb03a54c1169e0cf4bda4c/merged/run/secrets: read-only file system: unknown
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m8s default-scheduler Successfully assigned holoinsight-example/cadvisor-kpxbl to host-10-19-37-88
Warning Failed 4m55s kubelet Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/ce6a7eebbaedec94eb7fdaf3a4f1427526613fb4cf7be485908299219d32ac4c/merged/run/secrets: read-only file system: unknown
Warning Failed 4m52s kubelet Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/944ffd264146660a5f9ede7638c677a1c47b98061838741cfb29bd3241c5babf/merged/run/secrets: read-only file system: unknown
Warning Failed 4m37s kubelet Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/f441c927545adff71fcbdc8c5056ebeaa2b441a112c321e8205b7bc2c5eadb0d/merged/run/secrets: read-only file system: unknown
Warning Failed 4m6s kubelet Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/bde54086f8560ea1fcbf9488fe25c83d8567d5bd0d9645ca5e59dd6c4940ffea/merged/run/secrets: read-only file system: unknown
Normal Pulled 3m21s (x5 over 5m1s) kubelet Container image "gcr.io/cadvisor/cadvisor:v0.44.0" already present on machine
Normal Created 3m20s (x5 over 5m) kubelet Created container cadvisor
Warning Failed 3m19s kubelet Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/8ba07a89c4755c857fbc1e11e48169dde0ac9c6a8aca0c4384c792d91e961f0a/merged/run/secrets: read-only file system: unknown
Warning BackOff 2m51s (x10 over 4m51s) kubelet Back-off restarting failed container
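The relevant failure is easier to pull out of `kubectl describe` output with a jsonpath query. A sketch below demonstrates the extraction against a trimmed pod status saved to a scratch file (so the snippet runs anywhere); the file contents are a hypothetical reduction of the status shown above, not real cluster output.

```shell
# Hypothetical trimmed pod status, mirroring the fields shown above
cat > /tmp/pod.json <<'EOF'
{"status": {"containerStatuses": [{"name": "cadvisor",
  "lastState": {"terminated": {"reason": "ContainerCannotRun",
    "message": "OCI runtime create failed: ... read-only file system: unknown"}}}]}}
EOF
# Extract just the last termination reason and message
python3 - <<'PY'
import json
status = json.load(open("/tmp/pod.json"))["status"]
term = status["containerStatuses"][0]["lastState"]["terminated"]
print(term["reason"])
print(term["message"])
PY
```

On the live cluster the equivalent one-liner would be `kubectl get po cadvisor-kpxbl -n holoinsight-example -o jsonpath='{.status.containerStatuses[0].lastState.terminated.message}'`.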
Steps to reproduce
Kubernetes version: 1.23; Docker version: 20.10.6; Linux kernel: 4.18.0-1.el7.elrepo.x86_64
Expected behavior
No response
Additional Information
No response
Is your k8s cluster a real cluster, or a minikube one?
I can't find an environment exactly like yours to reproduce the problem in the short term. Maybe you can try to modify the cadvisor.yaml (e.g. comment out some configuration) and redeploy it.
I used kubeadm to boot the cluster
I changed the directory where Docker keeps its runtime data; it is no longer under /var/. Could that be the cause?
[root@]# more /etc/docker/daemon.json
{
"data-root": "/data/docker",
"exec-opts": [
"native.cgroupdriver=systemd"
]
}
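To confirm which data root the daemon is actually using, `docker info` reports the effective value. A sketch, assuming the docker CLI is available on the node; the second half shows reading the same value out of daemon.json (demonstrated on a scratch copy of the file above, so the snippet is self-contained):

```shell
# On a live node (requires the docker CLI and a running daemon):
#   docker info --format '{{ .DockerRootDir }}'
# Offline, the same value can be read from daemon.json; the default is
# /var/lib/docker when the key is absent. Scratch copy of the file above:
cat > /tmp/daemon.json <<'EOF'
{
  "data-root": "/data/docker",
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
python3 -c 'import json; print(json.load(open("/tmp/daemon.json")).get("data-root", "/var/lib/docker"))'
```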
Is the original value of data-root '/var/lib/docker'? If so, maybe you need to change the cadvisor.yaml:
volumes:
...
- name: docker
hostPath:
path: /var/lib/docker
...
to
volumes:
...
- name: docker
hostPath:
path: /data/docker
...
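The path swap can be scripted with sed. A sketch, demonstrated on a scratch copy of the hostPath stanza so it runs anywhere; on a real deployment you would run the substitution against your actual cadvisor.yaml and then `kubectl apply` it:

```shell
# Scratch copy of the hostPath stanza from the thread
cat > /tmp/cadvisor-volumes.yaml <<'EOF'
volumes:
  - name: docker
    hostPath:
      path: /var/lib/docker
EOF
# Point the docker volume at the custom data-root
sed -i 's|path: /var/lib/docker|path: /data/docker|' /tmp/cadvisor-volumes.yaml
cat /tmp/cadvisor-volumes.yaml
```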
After changing readOnly to false, it runs successfully. The working manifest:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: cadvisor
namespace: holoinsight-example
spec:
selector:
matchLabels:
app: cadvisor
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
template:
metadata:
labels:
app: cadvisor
hi_common_version: '3'
spec:
restartPolicy: Always
containers:
- name: cadvisor
image: gcr.io/cadvisor/cadvisor:v0.44.0
args:
- --allow_dynamic_housekeeping=false
- --housekeeping_interval=5s
- --max_housekeeping_interval=5s
- --storage_duration=2m
- --enable_metrics=cpu,memory,network,tcp,disk,diskIO,cpuLoad
- --enable_load_reader=true
- --store_container_labels=false
volumeMounts:
- name: rootfs
mountPath: /rootfs
readOnly: false
- name: var-run
mountPath: /var/run
readOnly: false
- name: sys
mountPath: /sys
readOnly: true
- name: docker
mountPath: /var/lib/docker
readOnly: false
- name: disk
mountPath: /dev/disk
readOnly: true
ports:
- name: http
containerPort: 8080
protocol: TCP
resources:
requests:
cpu: "0"
memory: "0"
limits:
cpu: "0.25"
memory: "256Mi"
volumes:
- name: rootfs
hostPath:
path: /
- name: var-run
hostPath:
path: /var/run
- name: sys
hostPath:
path: /sys
- name: docker
hostPath:
path: /data/docker
- name: disk
hostPath:
path: /dev/disk
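After applying the manifest, the DaemonSet rollout can be verified. A sketch: the two kubectl commands are standard but need a live cluster, so the scriptable readiness check below is demonstrated against a hypothetical scratch pod list instead of real cluster output.

```shell
# On the live cluster (not runnable here):
#   kubectl -n holoinsight-example rollout status ds/cadvisor
#   kubectl -n holoinsight-example get po -l app=cadvisor
# A scriptable check that every cadvisor pod is Running, shown against a
# hypothetical pod list saved to a scratch file:
cat > /tmp/pods.json <<'EOF'
{"items": [
  {"metadata": {"name": "cadvisor-kpxbl"}, "status": {"phase": "Running"}},
  {"metadata": {"name": "cadvisor-zwc4q"}, "status": {"phase": "Running"}}
]}
EOF
python3 -c 'import json; items=json.load(open("/tmp/pods.json"))["items"]; print(all(p["status"]["phase"]=="Running" for p in items))'
```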
The volumeMounts config in the cadvisor yaml is copied from the cadvisor official repository without any changes.
And our internal deployments (on Aliyun k8s clusters) all succeed with this cadvisor config.
I think there is something particular about your k8s cluster that leads to the deployment failure.
If you would like to explore the root cause of this issue and contribute a corresponding solution, that would be quite welcome.
dragonTour is not alone. I'm seeing this same issue on EKS 1.24, which uses the containerd runtime.
cadvisor:
Container ID: containerd://80ad9ce8b85e077f50dd9c1bfd1e248801afa3126f94793b91bbdb5ea33acf29
Image: gcr.io/cadvisor/cadvisor:v0.49.1
Image ID: gcr.io/cadvisor/cadvisor@sha256:3cde6faf0791ebf7b41d6f8ae7145466fed712ea6f252c935294d2608b1af388
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: StartError
Message: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/882dfec1-613f-4a83-8705-424230f18271/volumes/kubernetes.io~projected/kube-api-access-phx22" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount": mkdir /run/containerd/io.containerd.runtime.v2.task/k8s.io/80ad9ce8b85e077f50dd9c1bfd1e248801afa3126f94793b91bbdb5ea33acf29/rootfs/run/secrets: read-only file system: unknown