vsphere-csi-node-xxxxx are in CrashLoopBackOff
/kind bug
What steps did you take and what happened:
- I set up a kind bootstrap cluster to create a cluster with 1 control-plane node and 3 worker nodes in my vSphere account.
- I am using the Ubuntu 22.04 OVA published by VMware.
- After `kubectl apply`, I can see the VMs being created in my vSphere account.
- I installed Calico using these instructions, because the machines don't have full internet access from the on-premises environment (a rough sketch of the commands follows this list): https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises
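For context, the Calico on-premises install from those docs boils down to roughly the following. The `v3.26.1` version is illustrative (the docs pin their own), and in our air-gapped setup the manifests are mirrored locally before being applied:

```bash
# Install the Tigera operator and Calico CRDs on the provisioned cluster
# (version v3.26.1 is illustrative; use whatever the Calico docs pin).
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml \
  --kubeconfig=mcluster.kubeconfig

# Apply the Installation custom resource; in an air-gapped environment this
# manifest is downloaded elsewhere and copied onto this host first.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml \
  --kubeconfig=mcluster.kubeconfig
```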
What I see on the provisioned cluster:
- Some calico pods are in pending state
- Some coredns pods are in pending state
- vsphere-csi-controller-manager pod is in pending state
- vsphere-csi-node-xxxxx are in CrashLoopBackOff without much information
- There is NO log of what error occurred. I checked the logs of the CAPI and CAPV pods in the bootstrap cluster, and there are NO errors in the provisioned cluster's pods either (the commands I used are sketched below).
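For completeness, this is roughly how I looked for logs; the container name `vsphere-csi-node` comes from the DaemonSet pod spec shown further down:

```bash
# CAPI/CAPV controller logs in the bootstrap (kind) cluster
kubectl logs -n capi-system deploy/capi-controller-manager
kubectl logs -n capv-system deploy/capv-controller-manager

# Logs of the crashing container on the provisioned cluster;
# --previous shows output from the last terminated attempt (empty in my case).
kubectl logs vsphere-csi-node-dtvrg -n kube-system -c vsphere-csi-node \
  --previous --kubeconfig=mcluster.kubeconfig

# Recent events sometimes capture what the container logs don't.
kubectl get events -n kube-system --sort-by=.lastTimestamp \
  --kubeconfig=mcluster.kubeconfig
```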
What did you expect to happen:
I expected to see a cluster with all pods running.
Anything else you would like to add:
Below is some of the `kubectl` output for reference.
Here are some of the environment variables I have set (the workflow that consumes them is sketched just below):
```
# VSPHERE_TEMPLATE: "ubuntu-2204-kube-v1.27.3"
# CONTROL_PLANE_ENDPOINT_IP: "10.63.32.100"
# VIP_NETWORK_INTERFACE: "ens192"
# VSPHERE_TLS_THUMBPRINT: ""
# EXP_CLUSTER_RESOURCE_SET: true
# VSPHERE_SSH_AUTHORIZED_KEY: ""
# VSPHERE_STORAGE_POLICY: ""
# CPI_IMAGE_K8S_VERSION: "v1.27.3"
```
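These variables feed `clusterctl generate cluster`; my workflow was approximately the following (the flags are standard clusterctl, but treat this as a sketch rather than my exact command line):

```bash
# Install CAPI plus the vSphere provider into the kind bootstrap cluster.
clusterctl init --infrastructure vsphere

# Render the workload-cluster manifest from the variables above and apply it.
clusterctl generate cluster mcluster \
  --infrastructure vsphere \
  --kubernetes-version v1.27.3 \
  --control-plane-machine-count 1 \
  --worker-machine-count 3 > mcluster.yaml
kubectl apply -f mcluster.yaml

# Retrieve the kubeconfig used in the outputs below.
clusterctl get kubeconfig mcluster > mcluster.kubeconfig
```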
All bootstrap pods are running without errors.
```
ubuntu@frun10926:~/k8s$ kubectl get po -A -o wide
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS      AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-557b778d6b-qpxn7      1/1     Running   1 (24h ago)   2d22h   10.244.0.9    kind-control-plane   <none>           <none>
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-55d8f6b576-8hl5r   1/1     Running   1 (24h ago)   2d22h   10.244.0.10   kind-control-plane   <none>           <none>
capi-system                         capi-controller-manager-685454967c-tnmcj                         1/1     Running   3 (24h ago)   2d22h   10.244.0.8    kind-control-plane   <none>           <none>
capv-system                         capv-controller-manager-84d85cdcbd-cb2wp                         1/1     Running   3 (24h ago)   2d22h   10.244.0.11   kind-control-plane   <none>           <none>
cert-manager                        cert-manager-75d57c8d4b-7j4tk                                    1/1     Running   1 (24h ago)   2d22h   10.244.0.6    kind-control-plane   <none>           <none>
cert-manager                        cert-manager-cainjector-69d6f4d488-rvp67                         1/1     Running   2 (24h ago)   2d22h   10.244.0.5    kind-control-plane   <none>           <none>
cert-manager                        cert-manager-webhook-869b6c65c4-h6xdt                            1/1     Running   0             2d22h   10.244.0.7    kind-control-plane   <none>           <none>
kube-system                         coredns-5d78c9869d-djj9s                                         1/1     Running   0             2d22h   10.244.0.4    kind-control-plane   <none>           <none>
kube-system                         coredns-5d78c9869d-vltjl                                         1/1     Running   0             2d22h   10.244.0.3    kind-control-plane   <none>           <none>
kube-system                         etcd-kind-control-plane                                          1/1     Running   0             2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kindnet-zp6c5                                                    1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-apiserver-kind-control-plane                                1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-controller-manager-kind-control-plane                       1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-proxy-t2g5b                                                 1/1     Running   0             2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-scheduler-kind-control-plane                                1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
local-path-storage                  local-path-provisioner-6bc4bddd6b-rkwwm                          1/1     Running   0             2d22h   10.244.0.2    kind-control-plane   <none>           <none>
```
Here are the pods on the vSphere cluster that was provisioned using CAPI:
```
ubuntu@frun10926:~/k8s$ kubectl get po -A --kubeconfig=mcluster.kubeconfig -o wide
NAMESPACE         NAME                                        READY   STATUS             RESTARTS          AGE     IP                 NODE                        NOMINATED NODE   READINESS GATES
calico-system     calico-kube-controllers-5f9d445bb4-hp7rt    0/1     Pending            0                 2d20h   <none>             <none>                      <none>           <none>
calico-system     calico-node-6mrpv                           1/1     Running            0                 2d20h   10.63.32.83        mcluster-md-0-4kxmk-zplmd   <none>           <none>
calico-system     calico-node-dg42m                           1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
calico-system     calico-node-f6n9r                           1/1     Running            0                 2d20h   10.63.32.81        mcluster-md-0-4kxmk-wfscb   <none>           <none>
calico-system     calico-node-gtxcg                           1/1     Running            0                 2d20h   10.63.32.82        mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     calico-typha-5b866db66c-sdnpv               1/1     Running            0                 2d20h   10.63.32.81        mcluster-md-0-4kxmk-wfscb   <none>           <none>
calico-system     calico-typha-5b866db66c-trwlj               1/1     Running            0                 2d20h   10.63.32.82        mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     csi-node-driver-drblt                       2/2     Running            0                 2d20h   192.168.232.193    mcluster-klljm              <none>           <none>
calico-system     csi-node-driver-pbhvm                       2/2     Running            0                 2d20h   192.168.68.65      mcluster-md-0-4kxmk-zplmd   <none>           <none>
calico-system     csi-node-driver-vflj4                       2/2     Running            0                 2d20h   192.168.141.66     mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     csi-node-driver-wzmtr                       2/2     Running            0                 2d20h   192.168.83.65      mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       coredns-5d78c9869d-ckdjb                    0/1     Pending            0                 2d20h   <none>             <none>                      <none>           <none>
kube-system       coredns-5d78c9869d-vlpkw                    0/1     Pending            0                 2d20h   <none>             <none>                      <none>           <none>
kube-system       etcd-mcluster-klljm                         1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
kube-system       kube-apiserver-mcluster-klljm               1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
kube-system       kube-controller-manager-mcluster-klljm      1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
kube-system       kube-proxy-7dxb2                            1/1     Running            0                 2d20h   10.63.32.82        mcluster-md-0-4kxmk-gbcjj   <none>           <none>
kube-system       kube-proxy-gsgzz                            1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
kube-system       kube-proxy-mp98t                            1/1     Running            0                 2d20h   10.63.32.83        mcluster-md-0-4kxmk-zplmd   <none>           <none>
kube-system       kube-proxy-x97w4                            1/1     Running            0                 2d20h   10.63.32.81        mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       kube-scheduler-mcluster-klljm               1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
kube-system       kube-vip-mcluster-klljm                     1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
kube-system       vsphere-cloud-controller-manager-hzvzj      1/1     Running            0                 2d20h   10.63.32.84        mcluster-klljm              <none>           <none>
kube-system       vsphere-csi-controller-664c45f69b-6ddz4     0/5     Pending            0                 2d20h   <none>             <none>                      <none>           <none>
kube-system       vsphere-csi-node-dtvrg                      2/3     CrashLoopBackOff   809 (3m57s ago)   2d20h   192.168.141.65     mcluster-md-0-4kxmk-gbcjj   <none>           <none>
kube-system       vsphere-csi-node-jcpxj                      2/3     CrashLoopBackOff   810 (73s ago)     2d20h   192.168.232.194    mcluster-klljm              <none>           <none>
kube-system       vsphere-csi-node-lpjxj                      2/3     CrashLoopBackOff   809 (2m22s ago)   2d20h   192.168.83.66      mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       vsphere-csi-node-nkh6m                      2/3     CrashLoopBackOff   809 (3m35s ago)   2d20h   192.168.68.66      mcluster-md-0-4kxmk-zplmd   <none>           <none>
tigera-operator   tigera-operator-84cf9b6dbb-w6lkf            1/1     Running            0                 2d20h   10.63.32.83        mcluster-md-0-4kxmk-zplmd   <none>           <none>
```
Here is a sample `kubectl describe` for one of the vsphere-csi-node-xxxxx pods:
```
ubuntu@frun10926:~/k8s$ kubectl describe pod vsphere-csi-node-dtvrg -n kube-system --kubeconfig=mcluster.kubeconfig
Name:             vsphere-csi-node-dtvrg
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             mcluster-md-0-4kxmk-gbcjj/10.63.32.82
Start Time:       Fri, 24 Nov 2023 19:14:52 +0000
Labels:           app=vsphere-csi-node
                  controller-revision-hash=69967bd89d
                  pod-template-generation=1
                  role=vsphere-csi
Annotations:      cni.projectcalico.org/containerID: 0e30215c3f275ce821e98584c24cd139273c8c061af590ef5ddeb915b421e6ec
                  cni.projectcalico.org/podIP: 192.168.141.65/32
                  cni.projectcalico.org/podIPs: 192.168.141.65/32
Status:           Running
IP:               192.168.141.65
IPs:
  IP:           192.168.141.65
Controlled By:  DaemonSet/vsphere-csi-node
Containers:
  node-driver-registrar:
    Container ID:  containerd://075a9e6aa183294562e6edfbd55577f8eeca891c19cb43603973a1057d2f8125
    Image:         quay.io/k8scsi/csi-node-driver-registrar:v2.0.1
    Image ID:      quay.io/k8scsi/csi-node-driver-registrar@sha256:a104f0f0ec5fdd007a4a85ffad95a93cfb73dd7e86296d3cc7846fde505248d3
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Fri, 24 Nov 2023 19:31:30 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:               /csi/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
  vsphere-csi-node:
    Container ID:  containerd://b8ec60cc34ad576e31564f0d993b2b50440f8de2753f744c545cb772407ee654
    Image:         gcr.io/cloud-provider-vsphere/csi/release/driver:v3.1.2
    Image ID:      gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:471db9143b6daf2abdb656383f9d7ad34123a22c163c3f0e62dc8921048566bb
    Port:          9808/TCP
    Host Port:     0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 27 Nov 2023 15:56:46 +0000
      Finished:     Mon, 27 Nov 2023 15:56:46 +0000
    Ready:          False
    Restart Count:  807
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=3s period=5s #success=1 #failure=3
    Environment:
      CSI_ENDPOINT:               unix:///csi/csi.sock
      X_CSI_MODE:                 node
      X_CSI_SPEC_REQ_VALIDATION:  false
      VSPHERE_CSI_CONFIG:         /etc/cloud/csi-vsphere.conf
      LOGGER_LEVEL:               PRODUCTION
      X_CSI_LOG_LEVEL:            INFO
      NODE_NAME:                   (v1:spec.nodeName)
    Mounts:
      /csi from plugin-dir (rw)
      /dev from device-dir (rw)
      /etc/cloud from vsphere-config-volume (rw)
      /var/lib/kubelet from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
  liveness-probe:
    Container ID:  containerd://3ccf0d77472d57ac853a20305fd7862c97163b2509e40977cdc735e26b21665a
    Image:         quay.io/k8scsi/livenessprobe:v2.1.0
    Image ID:      quay.io/k8scsi/livenessprobe@sha256:04a9c4a49de1bd83d21e962122da2ac768f356119fb384660aa33d93183996c3
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/csi/csi.sock
    State:          Running
      Started:      Fri, 24 Nov 2023 19:31:54 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /csi from plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  vsphere-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  csi-vsphere-config
    Optional:    false
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi.vsphere.vmware.com/
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:  Directory
  device-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  kube-api-access-glb6m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                      From     Message
  ----     ------            ----                     ----     -------
  Warning  DNSConfigForming  28s (x20490 over 2d20h)  kubelet  Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.242.46.35 10.242.46.36 10.250.46.36
```
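Based on the above, two things I plan to check next. The commands are my best guess at the right places to look: the crashing container mounts its config from the `csi-vsphere-config` Secret (I'm assuming its data key is `csi-vsphere.conf`), and the DNSConfigForming warning suggests the nodes have more than the 3 nameservers Linux resolvers support:

```bash
# 1) The CSI driver config the crashing container mounts at /etc/cloud/csi-vsphere.conf
#    (assumes the Secret's data key is csi-vsphere.conf).
kubectl get secret csi-vsphere-config -n kube-system \
  -o jsonpath='{.data.csi-vsphere\.conf}' --kubeconfig=mcluster.kubeconfig | base64 -d

# 2) The node's resolver config; glibc only applies the first 3 nameservers.
ssh ubuntu@10.63.32.82 cat /etc/resolv.conf
```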
Environment:
- cluster-api-provider-vsphere version: 1.5.3
- Kubernetes version (use `kubectl version`): 1.27.3
- OS (e.g. from `/etc/os-release`): the Ubuntu 22.04 OVA image recommended for vSphere (with no changes to the OVA).