
vsphere-csi-node-xxxxx are in CrashLoopBackOff

dattebayo6716 opened this issue 1 year ago • 7 comments

/kind bug

What steps did you take and what happened:

  • Set up a kind bootstrap cluster to create a cluster with 1 control-plane node and 3 worker nodes on my vSphere account.
  • I am using the Ubuntu 22.04 OVA provided by VMware.
  • On kubectl apply I can see the VMs being created in my vSphere account (rough command sketch after this list).
  • I installed Calico following these instructions: https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises (the machines don't have full internet access from the on-premises environment).
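
For reference, this is roughly the sequence I ran; the flags and file names below are from memory and may differ slightly from my exact invocation:

kind create cluster                       # bootstrap/management cluster
clusterctl init --infrastructure vsphere  # installs the CAPI + CAPV controllers

# Render and apply the workload cluster manifest (cluster name matches mcluster.kubeconfig below)
clusterctl generate cluster mcluster \
  --kubernetes-version v1.27.3 \
  --control-plane-machine-count 1 \
  --worker-machine-count 3 > mcluster.yaml
kubectl apply -f mcluster.yaml

# Calico operator install from locally mirrored manifests (no full internet access on-prem)
kubectl create --kubeconfig=mcluster.kubeconfig -f tigera-operator.yaml
kubectl create --kubeconfig=mcluster.kubeconfig -f custom-resources.yaml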

What I see on the provisioned cluster

  1. Some Calico pods are in Pending state.
  2. Some CoreDNS pods are in Pending state.
  3. The vsphere-csi-controller pod is in Pending state.
  4. The vsphere-csi-node-xxxxx pods are in CrashLoopBackOff without much information.
  5. There is NO log indicating what error occurred. I checked the logs of the CAPI and CAPV pods in the bootstrap cluster, and there are NO errors in the provisioned cluster's pods either (the log checks are sketched below).
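
For item 5, the log checks were along these lines (pod and deployment names are taken from the outputs further down):

# Crashing container on the workload cluster, current and previous attempt
kubectl logs vsphere-csi-node-dtvrg -n kube-system -c vsphere-csi-node --kubeconfig=mcluster.kubeconfig
kubectl logs vsphere-csi-node-dtvrg -n kube-system -c vsphere-csi-node --previous --kubeconfig=mcluster.kubeconfig

# CAPI / CAPV controllers on the bootstrap cluster
kubectl logs -n capi-system deploy/capi-controller-manager
kubectl logs -n capv-system deploy/capv-controller-manager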

What did you expect to happen: I expected to see a cluster with all pods running.

Anything else you would like to add: Below is some of the kubectl output for reference.

Here are some of the environment variables I have set:

# VSPHERE_TEMPLATE: "ubuntu-2204-kube-v1.27.3"
# CONTROL_PLANE_ENDPOINT_IP: "10.63.32.100"
# VIP_NETWORK_INTERFACE: "ens192"
# VSPHERE_TLS_THUMBPRINT: ""
# EXP_CLUSTER_RESOURCE_SET: true  
# VSPHERE_SSH_AUTHORIZED_KEY: ""

# VSPHERE_STORAGE_POLICY: ""
# CPI_IMAGE_K8S_VERSION: "v1.27.3"
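
As far as I understand the generated template, EXP_CLUSTER_RESOURCE_SET: true means the CPI and CSI add-ons are pushed into the workload cluster via ClusterResourceSets. A sketch of how to inspect what got applied (resource names and namespace may differ depending on the template):

# ClusterResourceSets and their bindings, on the bootstrap cluster
kubectl get clusterresourcesets -A
kubectl get clusterresourcesetbindings -A

# ConfigMaps/Secrets the ClusterResourceSet applies (the CPI/CSI manifests live here in my generated manifest)
kubectl get configmaps,secrets -n default | grep -i -e cpi -e csi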

All bootstrap pods are running without errors.

ubuntu@frun10926:~/k8s$ kubectl get po -A -o wide
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS      AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-557b778d6b-qpxn7       1/1     Running   1 (24h ago)   2d22h   10.244.0.9    kind-control-plane   <none>           <none>
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-55d8f6b576-8hl5r   1/1     Running   1 (24h ago)   2d22h   10.244.0.10   kind-control-plane   <none>           <none>
capi-system                         capi-controller-manager-685454967c-tnmcj                         1/1     Running   3 (24h ago)   2d22h   10.244.0.8    kind-control-plane   <none>           <none>
capv-system                         capv-controller-manager-84d85cdcbd-cb2wp                         1/1     Running   3 (24h ago)   2d22h   10.244.0.11   kind-control-plane   <none>           <none>
cert-manager                        cert-manager-75d57c8d4b-7j4tk                                    1/1     Running   1 (24h ago)   2d22h   10.244.0.6    kind-control-plane   <none>           <none>
cert-manager                        cert-manager-cainjector-69d6f4d488-rvp67                         1/1     Running   2 (24h ago)   2d22h   10.244.0.5    kind-control-plane   <none>           <none>
cert-manager                        cert-manager-webhook-869b6c65c4-h6xdt                            1/1     Running   0             2d22h   10.244.0.7    kind-control-plane   <none>           <none>
kube-system                         coredns-5d78c9869d-djj9s                                         1/1     Running   0             2d22h   10.244.0.4    kind-control-plane   <none>           <none>
kube-system                         coredns-5d78c9869d-vltjl                                         1/1     Running   0             2d22h   10.244.0.3    kind-control-plane   <none>           <none>
kube-system                         etcd-kind-control-plane                                          1/1     Running   0             2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kindnet-zp6c5                                                    1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-apiserver-kind-control-plane                                1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-controller-manager-kind-control-plane                       1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-proxy-t2g5b                                                 1/1     Running   0             2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-scheduler-kind-control-plane                                1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
local-path-storage                  local-path-provisioner-6bc4bddd6b-rkwwm                          1/1     Running   0             2d22h   10.244.0.2    kind-control-plane   <none>           <none>

Here are the pods on the vSphere cluster that was provisioned using CAPI

ubuntu@frun10926:~/k8s$ kubectl get po -A --kubeconfig=mcluster.kubeconfig -o wide
NAMESPACE         NAME                                       READY   STATUS             RESTARTS          AGE     IP                NODE                        NOMINATED NODE   READINESS GATES
calico-system     calico-kube-controllers-5f9d445bb4-hp7rt   0/1     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
calico-system     calico-node-6mrpv                          1/1     Running            0                 2d20h   10.63.32.83       mcluster-md-0-4kxmk-zplmd   <none>           <none>
calico-system     calico-node-dg42m                          1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
calico-system     calico-node-f6n9r                          1/1     Running            0                 2d20h   10.63.32.81       mcluster-md-0-4kxmk-wfscb   <none>           <none>
calico-system     calico-node-gtxcg                          1/1     Running            0                 2d20h   10.63.32.82       mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     calico-typha-5b866db66c-sdnpv              1/1     Running            0                 2d20h   10.63.32.81       mcluster-md-0-4kxmk-wfscb   <none>           <none>
calico-system     calico-typha-5b866db66c-trwlj              1/1     Running            0                 2d20h   10.63.32.82       mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     csi-node-driver-drblt                      2/2     Running            0                 2d20h   192.168.232.193   mcluster-klljm              <none>           <none>
calico-system     csi-node-driver-pbhvm                      2/2     Running            0                 2d20h   192.168.68.65     mcluster-md-0-4kxmk-zplmd   <none>           <none>
calico-system     csi-node-driver-vflj4                      2/2     Running            0                 2d20h   192.168.141.66    mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     csi-node-driver-wzmtr                      2/2     Running            0                 2d20h   192.168.83.65     mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       coredns-5d78c9869d-ckdjb                   0/1     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
kube-system       coredns-5d78c9869d-vlpkw                   0/1     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
kube-system       etcd-mcluster-klljm                        1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-apiserver-mcluster-klljm              1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-controller-manager-mcluster-klljm     1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-proxy-7dxb2                           1/1     Running            0                 2d20h   10.63.32.82       mcluster-md-0-4kxmk-gbcjj   <none>           <none>
kube-system       kube-proxy-gsgzz                           1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-proxy-mp98t                           1/1     Running            0                 2d20h   10.63.32.83       mcluster-md-0-4kxmk-zplmd   <none>           <none>
kube-system       kube-proxy-x97w4                           1/1     Running            0                 2d20h   10.63.32.81       mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       kube-scheduler-mcluster-klljm              1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-vip-mcluster-klljm                    1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       vsphere-cloud-controller-manager-hzvzj     1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       vsphere-csi-controller-664c45f69b-6ddz4    0/5     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
kube-system       vsphere-csi-node-dtvrg                     2/3     CrashLoopBackOff   809 (3m57s ago)   2d20h   192.168.141.65    mcluster-md-0-4kxmk-gbcjj   <none>           <none>
kube-system       vsphere-csi-node-jcpxj                     2/3     CrashLoopBackOff   810 (73s ago)     2d20h   192.168.232.194   mcluster-klljm              <none>           <none>
kube-system       vsphere-csi-node-lpjxj                     2/3     CrashLoopBackOff   809 (2m22s ago)   2d20h   192.168.83.66     mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       vsphere-csi-node-nkh6m                     2/3     CrashLoopBackOff   809 (3m35s ago)   2d20h   192.168.68.66     mcluster-md-0-4kxmk-zplmd   <none>           <none>
tigera-operator   tigera-operator-84cf9b6dbb-w6lkf           1/1     Running            0                 2d20h   10.63.32.83       mcluster-md-0-4kxmk-zplmd   <none>           <none>
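
For the Pending pods (calico-kube-controllers, coredns, vsphere-csi-controller), I can attach the scheduler events and node status if useful; those checks would be roughly:

kubectl describe pod vsphere-csi-controller-664c45f69b-6ddz4 -n kube-system --kubeconfig=mcluster.kubeconfig
kubectl get nodes -o wide --kubeconfig=mcluster.kubeconfig
kubectl describe node mcluster-klljm --kubeconfig=mcluster.kubeconfig | grep -i -A3 taints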

Here is a sample kubectl describe for one of the vsphere-csi-node pods:

ubuntu@frun10926:~/k8s$ kubectl describe pod  vsphere-csi-node-dtvrg -n kube-system --kubeconfig=mcluster.kubeconfig
Name:             vsphere-csi-node-dtvrg
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             mcluster-md-0-4kxmk-gbcjj/10.63.32.82
Start Time:       Fri, 24 Nov 2023 19:14:52 +0000
Labels:           app=vsphere-csi-node
                  controller-revision-hash=69967bd89d
                  pod-template-generation=1
                  role=vsphere-csi
Annotations:      cni.projectcalico.org/containerID: 0e30215c3f275ce821e98584c24cd139273c8c061af590ef5ddeb915b421e6ec
                  cni.projectcalico.org/podIP: 192.168.141.65/32
                  cni.projectcalico.org/podIPs: 192.168.141.65/32
Status:           Running
IP:               192.168.141.65
IPs:
  IP:           192.168.141.65
Controlled By:  DaemonSet/vsphere-csi-node
Containers:
  node-driver-registrar:
    Container ID:  containerd://075a9e6aa183294562e6edfbd55577f8eeca891c19cb43603973a1057d2f8125
    Image:         quay.io/k8scsi/csi-node-driver-registrar:v2.0.1
    Image ID:      quay.io/k8scsi/csi-node-driver-registrar@sha256:a104f0f0ec5fdd007a4a85ffad95a93cfb73dd7e86296d3cc7846fde505248d3
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Fri, 24 Nov 2023 19:31:30 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:               /csi/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
  vsphere-csi-node:
    Container ID:   containerd://b8ec60cc34ad576e31564f0d993b2b50440f8de2753f744c545cb772407ee654
    Image:          gcr.io/cloud-provider-vsphere/csi/release/driver:v3.1.2
    Image ID:       gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:471db9143b6daf2abdb656383f9d7ad34123a22c163c3f0e62dc8921048566bb
    Port:           9808/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 27 Nov 2023 15:56:46 +0000
      Finished:     Mon, 27 Nov 2023 15:56:46 +0000
    Ready:          False
    Restart Count:  807
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=3s period=5s #success=1 #failure=3
    Environment:
      CSI_ENDPOINT:               unix:///csi/csi.sock
      X_CSI_MODE:                 node
      X_CSI_SPEC_REQ_VALIDATION:  false
      VSPHERE_CSI_CONFIG:         /etc/cloud/csi-vsphere.conf
      LOGGER_LEVEL:               PRODUCTION
      X_CSI_LOG_LEVEL:            INFO
      NODE_NAME:                   (v1:spec.nodeName)
    Mounts:
      /csi from plugin-dir (rw)
      /dev from device-dir (rw)
      /etc/cloud from vsphere-config-volume (rw)
      /var/lib/kubelet from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
  liveness-probe:
    Container ID:  containerd://3ccf0d77472d57ac853a20305fd7862c97163b2509e40977cdc735e26b21665a
    Image:         quay.io/k8scsi/livenessprobe:v2.1.0
    Image ID:      quay.io/k8scsi/livenessprobe@sha256:04a9c4a49de1bd83d21e962122da2ac768f356119fb384660aa33d93183996c3
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/csi/csi.sock
    State:          Running
      Started:      Fri, 24 Nov 2023 19:31:54 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /csi from plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  vsphere-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  csi-vsphere-config
    Optional:    false
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi.vsphere.vmware.com/
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:  Directory
  device-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  kube-api-access-glb6m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                      From     Message
  ----     ------            ----                     ----     -------
  Warning  DNSConfigForming  28s (x20490 over 2d20h)  kubelet  Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.242.46.35 10.242.46.36 10.250.46.36
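
The only event kubelet keeps reporting is the DNSConfigForming warning above. If it helps, I can also pull the CSI config the driver mounts and any other events for this pod; those checks would be roughly as follows (names taken from the describe output above, vCenter credentials redacted):

# Secret behind the vsphere-config-volume mount (/etc/cloud/csi-vsphere.conf)
kubectl get secret csi-vsphere-config -n kube-system -o yaml --kubeconfig=mcluster.kubeconfig

# All events recorded for this pod, in case anything beyond DNSConfigForming shows up
kubectl get events -n kube-system --field-selector involvedObject.name=vsphere-csi-node-dtvrg --kubeconfig=mcluster.kubeconfig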

Environment:

  • Cluster-api-provider-vsphere version: 1.5.3
  • Kubernetes version (from kubectl version): 1.27.3
  • OS (e.g. from /etc/os-release): the Ubuntu 22.04 OVA image recommended for vSphere, with no changes to the OVA.

dattebayo6716 · Nov 27 '23 19:11