Chart not deploying any Pods

Open dkirrane opened this issue 1 year ago • 1 comments

What happened?

My cluster is deployed with kubeadm and cloud-provider: external is set.

I can see that the coredns Pods are stuck in Pending:

kube-system    coredns-74ff55c5b-kgw6x                   0/1     Pending   0          10m
kube-system    coredns-74ff55c5b-nh9jx                   0/1     Pending   0          10m

The Nodes are also tainted:

NodeName          TaintKey                                         TaintValue   TaintEffect
my-master-1       node.cloudprovider.kubernetes.io/uninitialized   true         NoSchedule
my-minion-1       node.cloudprovider.kubernetes.io/uninitialized   true         NoSchedule

After deploying the Helm chart, I can see the ConfigMap:

# Global properties in this section will be used for all specified vCenters unless overriden in VirtualCenter section.
global:
  port: 443
  # set insecure-flag to true if the vCenter uses a self-signed cert
  insecureFlag: true
  # settings for using k8s secret
  secretName: vsphere-cloud-secret
  secretNamespace: kube-system

# vcenter section
vcenter:
  my.company.com:
    server: my.company.com
    user: [email protected]
    password: myPassword123!
    datacenters:
      - MYDC

# labels for regions and zones
labels:
  region: k8s-region
  zone: k8s-zone

A describe on the DaemonSet does not show any issues. However, no Pods come up and the Nodes stay tainted:

kubectl -n kube-system get DaemonSet vsphere-cpi
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                AGE
vsphere-cpi   0         0         0       0            0           node-role.kubernetes.io/control-plane=true   5m7s

helm list and helm status also look good:

helm list -n kube-system
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
vsphere-cpi     kube-system     1               2022-10-12 14:30:50.768007456 +0000 UTC deployed        vsphere-cpi-1.24.2      1.24.2

helm -n kube-system status vsphere-cpi
NAME: vsphere-cpi
LAST DEPLOYED: Wed Oct 12 14:30:50 2022
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing vsphere-cpi.

Your release is named vsphere-cpi.

To learn more about the release, try:

  $ helm status vsphere-cpi
  $ helm get all vsphere-cpi

What else can I check to debug this issue?

What did you expect to happen?

vsphere-cloud-controller-manager Pods to come up

How can we reproduce it (as minimally and precisely as possible)?

Followed steps as per docs https://cloud-provider-vsphere.sigs.k8s.io/tutorials/kubernetes-on-vsphere-with-helm.html

Anything else we need to know (please consider providing level 4 or above logs of CPI)?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:27:39Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:23:01Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:26:37Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider or hardware configuration

VMware vSphere 7.0.3.00700

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

$ uname -a
Linux master-1 5.15.0-50-generic #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux




dkirrane, Oct 12 '22 15:10

I found that the issue is the nodeSelector on the chart's DaemonSet:

node-role.kubernetes.io/control-plane: "true"

whereas the node created by kubeadm init has this label

node-role.kubernetes.io/control-plane: ''

When I remove "true" from the DaemonSet nodeSelector, the Pod comes up.
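
The mismatch follows from how a nodeSelector is evaluated: every selector key must be present on the node with exactly the same string value, so "true" does not match the empty string that kubeadm sets. A minimal Python sketch of that rule (an illustration of the matching semantics only, not the scheduler's code; `node_selector_matches` is a name made up for this example):

```python
from typing import Mapping

KEY = "node-role.kubernetes.io/control-plane"


def node_selector_matches(node_labels: Mapping[str, str],
                          node_selector: Mapping[str, str]) -> bool:
    """Return True only if every selector key is on the node with an identical value."""
    return all(
        key in node_labels and node_labels[key] == value
        for key, value in node_selector.items()
    )


kubeadm_node = {KEY: ""}        # label as set by kubeadm init
chart_selector = {KEY: "true"}  # nodeSelector shipped in the chart's DaemonSet
fixed_selector = {KEY: ""}      # nodeSelector after removing "true"

print(node_selector_matches(kubeadm_node, chart_selector))  # False
print(node_selector_matches(kubeadm_node, fixed_selector))  # True
```

With no node matching the selector, the DaemonSet has nothing to schedule onto, which is consistent with the DESIRED count of 0 shown earlier.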

dkirrane, Oct 12 '22 16:10

Yeah, the problem is that Rancher uses the label node-role.kubernetes.io/control-plane: "true", while kubeadm uses node-role.kubernetes.io/control-plane: ''. You can change the nodeSelector when you deploy the chart by modifying the nodeSelector in values.yaml.

I'll change the node selector to nodeAffinity instead:

      #! use affinity instead of node selector since node selector only accepts single value
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
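
To illustrate why this rule covers both labelling styles: the entries under nodeSelectorTerms are ORed, the expressions inside one term are ANDed, and the Exists operator checks only that the key is present, whatever its value. A minimal Python sketch of that evaluation, assuming only the Exists operator needs to be handled (the function names and sample nodes are hypothetical, chosen for this example):

```python
from typing import Mapping, Sequence


def expression_matches(labels: Mapping[str, str], expr: Mapping[str, str]) -> bool:
    """Evaluate one matchExpression; only the Exists operator is sketched here."""
    if expr["operator"] == "Exists":
        return expr["key"] in labels  # the label value is ignored
    raise NotImplementedError(f"operator {expr['operator']!r} is not covered by this sketch")


def affinity_matches(labels: Mapping[str, str], terms: Sequence[Mapping]) -> bool:
    """Terms are ORed; the expressions within a single term are ANDed."""
    return any(
        all(expression_matches(labels, expr) for expr in term["matchExpressions"])
        for term in terms
    )


# The two terms from the affinity rule above.
TERMS = [
    {"matchExpressions": [{"key": "node-role.kubernetes.io/control-plane", "operator": "Exists"}]},
    {"matchExpressions": [{"key": "node-role.kubernetes.io/master", "operator": "Exists"}]},
]

kubeadm_node = {"node-role.kubernetes.io/control-plane": ""}
rancher_node = {"node-role.kubernetes.io/control-plane": "true"}
worker_node = {"kubernetes.io/hostname": "my-minion-1"}

print(affinity_matches(kubeadm_node, TERMS))  # True
print(affinity_matches(rancher_node, TERMS))  # True
print(affinity_matches(worker_node, TERMS))   # False
```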

lubronzhan, Oct 18 '22 22:10

The newest 1.25.0 chart uses an affinity rule instead of a node selector, so the issue should be gone.

lubronzhan, Oct 20 '22 22:10