cert-exporter
Ceph rbd is not unmapped automatically. Need to add an option to change volume mount propagation (mountPropagation).
Hi there.
I've started using cert-exporter and ran into a problem: when I delete a pod that uses a Ceph RBD volume, the RBD device is not unmounted/unmapped automatically from the k8s node, and the pod cannot be scheduled on another node. Kubelet logs:
Aug 11 09:17:42 kube-01 d8-kubelet-forker[14693]: E0811 09:17:42.907060 14694 nestedpendingoperations.go:301]
Operation for "{volumeName:kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-c011772acec9
podName: nodeName:}" failed. No retries permitted until 2021-08-11 09:17:43.407003409 +0300 MSK
m=+1698521.295086326 (durationBeforeRetry 500ms). Error: "UnmountDevice failed for volume \"pvc-323d06cc-947b-
4c64-aaa0-e2f8963d27e5\" (UniqueName: \"kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-
c011772acec9\") on node \"kube-01\" : rbd: failed to unmap device /dev/rbd10, error exit status 16, rbd output: [114 98 100 58
32 115 121 115 102 115 32 119 114 105 116 101 32 102 97 105 108 101 100 10 114 98 100 58 32 117 110 109 97 112 32 102 97 105
108 101 100 58 32 40 49 54 41 32 68 101 118 105 99 101 32 111 114 32 114 101 115 111 117 114 99 101 32 98 117 115 121 10]"
...
Aug 11 09:17:43 kube-01 d8-kubelet-forker[14693]: E0811 09:17:43.429490 14694 nestedpendingoperations.go:301]
Operation for "{volumeName:kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-c011772acec9
podName: nodeName:}" failed. No retries permitted until 2021-08-11 09:17:44.429433563 +0300 MSK
m=+1698522.317516538 (durationBeforeRetry 1s). Error: "UnmountDevice failed for volume \"pvc-323d06cc-947b-4c64-
aaa0-e2f8963d27e5\" (UniqueName: \"kubernetes.io/rbd/kube:kubernetes-dynamic-pvc-018409f2-e715-4452-b483-
c011772acec9\") on node \"kube-01\" : Unmount failed: exit status 32\nUnmounting arguments:
/var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-018409f2-e715-4452-b483-
c011772acec9\nOutput: umount: /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-
018409f2-e715-4452-b483-c011772acec9: not mounted\n\n"
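The `rbd output` field in the first log entry is printed as a list of decimal byte values rather than text. A minimal Python sketch for decoding it (the byte values are copied from the log above; the variable names are only for illustration):

```python
# Byte values copied from the "rbd output: [...]" field of the kubelet log.
raw = (
    "114 98 100 58 32 115 121 115 102 115 32 119 114 105 116 101 32 102 97 105 "
    "108 101 100 10 114 98 100 58 32 117 110 109 97 112 32 102 97 105 108 101 "
    "100 58 32 40 49 54 41 32 68 101 118 105 99 101 32 111 114 32 114 101 115 "
    "111 117 114 99 101 32 98 117 115 121 10"
)

# Each number is one ASCII byte; convert them back to text.
decoded = bytes(int(b) for b in raw.split()).decode("ascii")
print(decoded)
```

It decodes to "rbd: sysfs write failed" followed by "rbd: unmap failed: (16) Device or resource busy", i.e. something on the node still holds the device.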
After some research I found the root cause (thanks to this article: https://cloud.tencent.com/developer/article/1469532). It is the cert-exporter pod running on the same k8s node.
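The cert-exporter pod mounts /var/lib/kubelet from the host, and the volumes of pods that use Ceph RBD are mounted beneath it, at /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/<image>. With the default mount propagation, the cert-exporter container keeps its own copy of those mounts, so the device stays busy after the pod is deleted.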
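One way to check this on a node is to look at the mount table of the cert-exporter process. The following is only a sketch, not part of cert-exporter: `rbd_mounts` is a hypothetical helper that parses text in the `/proc/<pid>/mountinfo` format, and the sample line is made up for illustration.

```python
RBD_PREFIX = "/var/lib/kubelet/plugins/kubernetes.io/rbd/mounts"

def rbd_mounts(mountinfo_text, prefix=RBD_PREFIX):
    """Return (mount_point, optional_fields) for each mount below prefix."""
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or "-" not in fields:
            continue  # skip blank or malformed lines
        sep = fields.index("-")    # separator before the filesystem type
        mount_point = fields[4]
        optional = fields[6:sep]   # e.g. ["master:30"]; empty means private
        if mount_point.startswith(prefix + "/"):
            found.append((mount_point, optional))
    return found

# Made-up sample: an RBD mount as it might appear inside a container
# that mounted /var/lib/kubelet without mount propagation.
sample = (
    "2345 2100 252:160 / " + RBD_PREFIX + "/kube-image-example "
    "rw,relatime - ext4 /dev/rbd10 rw\n"
    "36 35 98:0 / /mnt rw shared:1 - ext3 /dev/root rw\n"
)

for mount_point, optional in rbd_mounts(sample):
    print(mount_point, optional or "private")
```

On a real node the input would come from `/proc/<pid>/mountinfo`, where `<pid>` is the cert-exporter process. An RBD mount listed there with no optional fields is a private copy that an unmount on the host will not remove.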
Related issue: https://github.com/kubernetes/kubernetes/issues/54214. Related PR in prometheus-node-exporter: https://github.com/helm/charts/pull/11194/files
The solution is to add an option to configure mountPropagation in the DaemonSet, like this:
volumeMounts:
  - mountPath: /var/lib/kubelet
    mountPropagation: HostToContainer
    name: kubelet
    readOnly: true