
failed to get CsiNodeTopology for the node: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.

Open hirendave47 opened this issue 2 years ago • 39 comments

/kind bug

What happened: Trying to install vSphere CSI driver v2.7.0 on an RKE2 cluster v1.24.10+rke2r1.

$ cat /etc/rancher/rke2/config.yaml
cloud-provider-name: external

$ cat csi-vsphere.conf
[Global]
cluster-id = "${CLUSTER_NAME}"
cluster-distribution = "Kubernetes"

[VirtualCenter "172.16.16.110"]
insecure-flag = "true"
user = "[email protected]"
password = "password12345"
port = "443"
datacenters = "datacenter1"
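For context, this conf file is normally loaded into the cluster as a Secret in the driver namespace before the driver is deployed; a sketch of that step, with the Secret name and namespace as used in the standard vanilla install docs:

kubectl create namespace vmware-system-csi
kubectl create secret generic vsphere-config-secret \
    --from-file=csi-vsphere.conf \
    --namespace=vmware-system-csi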

root@urnpk8sm60:~# kubectl --namespace=vmware-system-csi get all
NAME                                          READY   STATUS             RESTARTS      AGE
pod/vsphere-csi-controller-7589ccbcf8-6w7pw   0/7     Pending            0             3m2s
pod/vsphere-csi-controller-7589ccbcf8-phl5c   0/7     Pending            0             3m2s
pod/vsphere-csi-controller-7589ccbcf8-wwwfc   0/7     Pending            0             3m2s
pod/vsphere-csi-node-6vljg                    2/3     CrashLoopBackOff   4 (79s ago)   3m2s
pod/vsphere-csi-node-dpnh9                    2/3     CrashLoopBackOff   5 (7s ago)    3m2s
pod/vsphere-csi-node-jd4wt                    2/3     CrashLoopBackOff   4 (78s ago)   3m2s
pod/vsphere-csi-node-wtlp7                    2/3     CrashLoopBackOff   4 (72s ago)   3m2s

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/vsphere-csi-controller   ClusterIP   10.43.162.210   <none>        2112/TCP,2113/TCP   3m2s

NAME                                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
daemonset.apps/vsphere-csi-node            4         4         0       4            0           kubernetes.io/os=linux     3m2s
daemonset.apps/vsphere-csi-node-windows    0         0         0       0            0           kubernetes.io/os=windows   3m2s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/vsphere-csi-controller   0/3     3            0           3m2s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/vsphere-csi-controller-7589ccbcf8   3         3         0       3m2s
root@urnpk8sm60:~#

root@urnpk8sm60:~# kubectl --namespace=vmware-system-csi logs pod/vsphere-csi-node-wtlp7
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I0315 11:27:48.852737       1 main.go:166] Version: v2.5.1
I0315 11:27:48.852835       1 main.go:167] Running node-driver-registrar in mode=registration
I0315 11:27:48.854993       1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0315 11:27:48.855119       1 connection.go:154] Connecting to unix:///csi/csi.sock
I0315 11:27:48.859495       1 main.go:198] Calling CSI driver to discover driver name
I0315 11:27:48.859554       1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0315 11:27:48.859566       1 connection.go:184] GRPC request: {}
I0315 11:27:48.875719       1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v2.7.0"}
I0315 11:27:48.876170       1 connection.go:187] GRPC error: <nil>
I0315 11:27:48.876774       1 main.go:208] CSI driver name: "csi.vsphere.vmware.com"
I0315 11:27:48.877323       1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0315 11:27:48.878695       1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0315 11:27:48.879412       1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0315 11:27:49.996391       1 main.go:102] Received GetInfo call: &InfoRequest{}
I0315 11:27:49.998477       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0315 11:27:50.069958       1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "urnpk8sm60". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0315 11:27:50.070058       1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "urnpk8sm60". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
root@urnpk8sm60:~#

After changing improved-volume-topology: 'true' to 'false' in vsphere-csi-driver.yaml, the vsphere-csi-node pods are running, but the vsphere-csi-controller pods are still Pending due to node affinity/selector:

Warning FailedScheduling 26s default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
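For reference, the improved-volume-topology flag flipped above lives in the driver's internal feature-states ConfigMap inside vsphere-csi-driver.yaml; a minimal sketch of the edited ConfigMap (other feature keys elided; the ConfigMap name matches the one logged by the driver later in this thread):

apiVersion: v1
kind: ConfigMap
metadata:
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: vmware-system-csi
data:
  # was "true"; with it set to false the node pods came up, per the report above
  improved-volume-topology: "false"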

What you expected to happen: The same steps work fine with vanilla Kubernetes but fail with RKE2.

Environment:

  • csi-vsphere version: v2.7.0
  • vsphere-cloud-controller-manager version: 1.24
  • Kubernetes version: v1.24.10+rke2r1
  • vSphere version: 7.0.3
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
  • Kernel (e.g. uname -a): 5.15.0-60-generic
  • Install tools: NA
  • Others: NA

hirendave47 avatar Mar 15 '23 12:03 hirendave47

We hit similar bugs with vSphere CSI driver 3.0. More details follow.

Starting from 3.0, the CNS topology feature flag is removed and can no longer be turned off (OSS change). Comparing the passing and failing logs, it looks like there may be a race condition during node registration. The logic has no retry, so the subsequent logic that depends on it fails.

vsphere-csi-node-mbdjf                                       1/2     CrashLoopBackOff 

Node driver registrar log

2023-03-31T12:28:22.245291043Z I0331 12:28:22.245133       1 main.go:102] Received GetInfo call: &InfoRequest{}
2023-03-31T12:28:22.245783165Z I0331 12:28:22.245690       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
2023-03-31T12:29:22.269924126Z I0331 12:29:22.269721       1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = timed out while waiting for topology labels to be updated in "c01057a88824-qual-323-0afbb584" CSINodeTopology instance.,}
2023-03-31T12:29:22.269969201Z E0331 12:29:22.269834       1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = timed out while waiting for topology labels to be updated in "c01057a88824-qual-323-0afbb584" CSINodeTopology instance., restarting registration container.
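One way to see whether the syncer ever filled in the topology labels is to query the CR directly; a sketch with stock kubectl (the resource name is assumed from the kind CSINodeTopology and group cns.vmware.com; the node name is taken from the log above):

kubectl get csinodetopologies.cns.vmware.com
kubectl get csinodetopologies.cns.vmware.com c01057a88824-qual-323-0afbb584 -o yaml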

vsphere-csi-driver log

2023-03-31T11:33:28.319787659Z {"level":"info","time":"2023-03-31T11:33:28.319728628Z","caller":"kubernetes/kubernetes.go:395","msg":"Setting client QPS to 100.000000 and Burst to 100.","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001"}
2023-03-31T11:33:28.345686928Z {"level":"info","time":"2023-03-31T11:33:28.344293416Z","caller":"k8sorchestrator/topology.go:727","msg":"Topology service initiated successfully","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001"}
2023-03-31T11:33:28.372612710Z {"level":"info","time":"2023-03-31T11:33:28.37245886Z","caller":"k8sorchestrator/topology.go:895","msg":"Successfully created a CSINodeTopology instance for NodeName: \"c01057a88824-qual-323-0afbb584\"","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001"}
2023-03-31T11:34:28.375379515Z {"level":"error","time":"2023-03-31T11:34:28.374319044Z","caller":"k8sorchestrator/topology.go:837","msg":"timed out while waiting for topology labels to be updated in \"c01057a88824-qual-323-0afbb584\" CSINodeTopology instance.","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels\n\t/build/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:837\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo\n\t/build/pkg/csi/service/node.go:429\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6231\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:922"}
2023-03-31T11:34:30.358756419Z {"level":"info","time":"2023-03-31T11:34:30.358584895Z","caller":"service/node.go:338","msg":"NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"f21e5f14-20bf-4bb0-a70b-2322ee91fa34"}
2023-03-31T11:34:30.372726452Z {"level":"info","time":"2023-03-31T11:34:30.37258219Z","caller":"k8sorchestrator/topology.go:892","msg":"CSINodeTopology instance already exists for NodeName: \"c01057a88824-qual-323-0afbb584\"","TraceId":"f21e5f14-20bf-4bb0-a70b-2322ee91fa34"}
2023-03-31T11:35:30.375537038Z {"level":"error","time":"2023-03-31T11:35:30.374942664Z","caller":"k8sorchestrator/topology.go:837","msg":"timed out while waiting for topology labels to be updated in \"c01057a88824-qual-323-0afbb584\" CSINodeTopology instance.","TraceId":"f21e5f14-20bf-4bb0-a70b-2322ee91fa34","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels\n\t/build/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:837\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo\n\t/build/pkg/csi/service/node.go:429\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6231\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:922"}

Some logs from vsphere-csi-controller:

  1. Passed test. Passing log: notice that "Successfully registered VC mtv-qual-vc03.anthos:443" happened first; all of the following nodes/VMs can then be found:
2023-03-30T03:27:10.857292378Z {"level":"info","time":"2023-03-30T03:27:10.857253925Z","caller":"vsphere/virtualcentermanager.go:123","msg":"Successfully registered VC mtv-qual-vc03.anthos:443"}
2023-03-30T03:27:10.857373990Z {"level":"info","time":"2023-03-30T03:27:10.857306174Z","caller":"vsphere/virtualcenter.go:283","msg":"VirtualCenter.connect() creating new client"}
2023-03-30T03:27:10.860343329Z {"level":"info","time":"2023-03-30T03:27:10.860255209Z","caller":"node/manager.go:128","msg":"Discovering the node vm using uuid: \"4211ec99-cb15-a5cd-3193-49bb2883a3fb\"","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.860354679Z {"level":"info","time":"2023-03-30T03:27:10.860281279Z","caller":"vsphere/virtualmachine.go:159","msg":"Initiating asynchronous datacenter listing with uuid 4211ec99-cb15-a5cd-3193-49bb2883a3fb","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.883477322Z {"level":"info","time":"2023-03-30T03:27:10.883355912Z","caller":"k8sorchestrator/k8sorchestrator.go:644","msg":"configMapAdded: Internal feature state values from \"internal-feature-states.csi.vsphere.vmware.com\" stored successfully: map[async-query-volume:true block-volume-snapshot:true csi-migration:true csi-windows-support:true improved-csi-idempotency:true online-volume-extend:true trigger-csi-fullsync:false]","TraceId":"bd419e1b-0881-4e2e-b9d4-b90f744c9a1b"}
2023-03-30T03:27:10.914441786Z {"level":"info","time":"2023-03-30T03:27:10.914259728Z","caller":"vsphere/virtualcenter.go:202","msg":"New session ID for 'VSPHERE.LOCAL\\herc-32210b8fe0bd' = 52611e95-9457-8d31-c11e-9edbca63b82e"}
2023-03-30T03:27:10.914464578Z {"level":"info","time":"2023-03-30T03:27:10.914314422Z","caller":"vsphere/virtualcenter.go:291","msg":"VirtualCenter.connect() successfully created new client"}
2023-03-30T03:27:10.914468743Z {"level":"info","time":"2023-03-30T03:27:10.914338151Z","caller":"vsphere/virtualcenter.go:606","msg":"vCenterInstance initialized"}
2023-03-30T03:27:10.914549752Z {"level":"info","time":"2023-03-30T03:27:10.914476906Z","caller":"volume/manager.go:193","msg":"Initializing new defaultManager..."}
2023-03-30T03:27:10.914677990Z {"level":"info","time":"2023-03-30T03:27:10.914582011Z","caller":"syncer/metadatasyncer.go:417","msg":"Adding watch on path: \"/etc/cloud\""}
2023-03-30T03:27:10.914813271Z {"level":"info","time":"2023-03-30T03:27:10.914732425Z","caller":"volume/manager.go:190","msg":"Retrieving existing defaultManager..."}
2023-03-30T03:27:10.917155580Z {"level":"info","time":"2023-03-30T03:27:10.917045953Z","caller":"kubernetes/kubernetes.go:79","msg":"k8s client using kubeconfig from /etc/kubernetes/kubeconfig.conf"}
2023-03-30T03:27:10.917932242Z {"level":"info","time":"2023-03-30T03:27:10.917828978Z","caller":"kubernetes/kubernetes.go:395","msg":"Setting client QPS to 100.000000 and Burst to 100."}
2023-03-30T03:27:10.924330377Z {"level":"info","time":"2023-03-30T03:27:10.924234917Z","caller":"vsphere/datacenter.go:154","msg":"Publishing datacenter Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.924393747Z {"level":"info","time":"2023-03-30T03:27:10.924310978Z","caller":"vsphere/virtualmachine.go:196","msg":"AsyncGetAllDatacenters with uuid 4211ec99-cb15-a5cd-3193-49bb2883a3fb sent a dc Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933106473Z {"level":"info","time":"2023-03-30T03:27:10.933012951Z","caller":"vsphere/virtualmachine.go:210","msg":"Found VM VirtualMachine:vm-623336 [VirtualCenterHost: mtv-qual-vc03.anthos, UUID: 4211ec99-cb15-a5cd-3193-49bb2883a3fb, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]] given uuid 4211ec99-cb15-a5cd-3193-49bb2883a3fb on DC Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933121137Z {"level":"info","time":"2023-03-30T03:27:10.93304699Z","caller":"vsphere/virtualmachine.go:221","msg":"Returning VM VirtualMachine:vm-623336 [VirtualCenterHost: mtv-qual-vc03.anthos, UUID: 4211ec99-cb15-a5cd-3193-49bb2883a3fb, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]] for UUID 4211ec99-cb15-a5cd-3193-49bb2883a3fb","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933125305Z {"level":"info","time":"2023-03-30T03:27:10.933063049Z","caller":"node/manager.go:151","msg":"Successfully discovered node with nodeUUID 4211ec99-cb15-a5cd-3193-49bb2883a3fb in vm VirtualMachine:vm-623336 [VirtualCenterHost: mtv-qual-vc03.anthos, UUID: 4211ec99-cb15-a5cd-3193-49bb2883a3fb, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933128956Z {"level":"info","time":"2023-03-30T03:27:10.933073905Z","caller":"node/manager.go:134","msg":"Successfully discovered node: \"32210b8fe0bd-qual-private306-15003d10\" with nodeUUID \"4211ec99-cb15-a5cd-3193-49bb2883a3fb\"","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933136796Z {"level":"info","time":"2023-03-30T03:27:10.933081969Z","caller":"node/manager.go:136","msg":"Successfully registered node: \"32210b8fe0bd-qual-private306-15003d10\" with nodeUUID \"4211ec99-cb15-a5cd-3193-49bb2883a3fb\"","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"
  2. Failed test (more logs: https://gist.github.com/jingxu97/cc013868270f4d05497a7aba2b59221c). Failing log: before the VC is registered, there are "VM not found" errors first. After the VC is registered, the rest of the nodes can be found:
2023-03-31T11:33:19.880026310Z {"level":"info","time":"2023-03-31T11:33:19.691025909Z","caller":"vsphere/virtualcentermanager.go:74","msg":"Initializing defaultVirtualCenterManager..."}
2023-03-31T11:33:19.880028344Z {"level":"info","time":"2023-03-31T11:33:19.69104239Z","caller":"vsphere/virtualcentermanager.go:76","msg":"Successfully initialized defaultVirtualCenterManager"}
2023-03-31T11:33:19.880031590Z {"level":"error","time":"2023-03-31T11:33:19.691146997Z","caller":"vsphere/virtualmachine.go:227","msg":"Returning VM not found err for UUID 420efa50-4b8b-f108-347a-0a8f21fdb714","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualMachineByUUID\n\t/build/pkg/common/cns-lib/vsphere/virtualmachine.go:227\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:145\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880046207Z {"level":"error","time":"2023-03-31T11:33:19.691213382Z","caller":"node/manager.go:147","msg":"Couldn't find VM instance with nodeUUID 420efa50-4b8b-f108-347a-0a8f21fdb714, failed to discover with err: virtual machine wasn't found","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:147\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880051146Z {"level":"error","time":"2023-03-31T11:33:19.691256924Z","caller":"node/manager.go:131","msg":"failed to discover VM with uuid: \"420efa50-4b8b-f108-347a-0a8f21fdb714\" for node: \"c01057a88824-qual-323-0afbb5d7\"","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:131\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880056226Z {"level":"warn","time":"2023-03-31T11:33:19.691282823Z","caller":"node/nodes.go:72","msg":"failed to register node:\"c01057a88824-qual-323-0afbb5d7\". err=virtual machine wasn't found","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1"}
2023-03-31T11:33:19.880058280Z {"level":"info","time":"2023-03-31T11:33:19.691529217Z","caller":"node/manager.go:128","msg":"Discovering the node vm using uuid: \"420e47c7-ba23-d2a8-6656-526432f8313b\"","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be"}
2023-03-31T11:33:19.880060013Z {"level":"info","time":"2023-03-31T11:33:19.691588529Z","caller":"vsphere/virtualmachine.go:159","msg":"Initiating asynchronous datacenter listing with uuid 420e47c7-ba23-d2a8-6656-526432f8313b","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be"}
2023-03-31T11:33:19.880066836Z {"level":"error","time":"2023-03-31T11:33:19.691638263Z","caller":"vsphere/virtualmachine.go:227","msg":"Returning VM not found err for UUID 420e47c7-ba23-d2a8-6656-526432f8313b","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualMachineByUUID\n\t/build/pkg/common/cns-lib/vsphere/virtualmachine.go:227\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:145\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880069351Z {"level":"error","time":"2023-03-31T11:33:19.691715308Z","caller":"node/manager.go:147","msg":"Couldn't find VM instance with nodeUUID 420e47c7-ba23-d2a8-6656-526432f8313b, failed to discover with err: virtual machine wasn't found","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:147\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880080141Z {"level":"error","time":"2023-03-31T11:33:19.691763489Z","caller":"node/manager.go:131","msg":"failed to discover VM with uuid: \"420e47c7-ba23-d2a8-6656-526432f8313b\" for node: \"c01057a88824-qual-323-0afbb584\"","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:131\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880082445Z {"level":"warn","time":"2023-03-31T11:33:19.691811169Z","caller":"node/nodes.go:72","msg":"failed to register node:\"c01057a88824-qual-323-0afbb584\". err=virtual machine wasn't found","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be"}
2023-03-31T11:33:19.880084559Z {"level":"info","time":"2023-03-31T11:33:19.692076979Z","caller":"vsphere/virtualcentermanager.go:123","msg":"Successfully registered VC atl-qual-vc06.anthos:443"}
2023-03-31T11:33:19.880086253Z {"level":"info","time":"2023-03-31T11:33:19.692285863Z","caller":"node/manager.go:128","msg":"Discovering the node vm using uuid: \"420e8daf-1b9e-e49f-e9f9-2c59ab10c59f\"","TraceId":"c42c3ab6-ed72-4328-9ec1-ea34824bdbd4"}
2023-03-31T11:33:19.880088266Z {"level":"info","time":"2023-03-31T11:33:19.692375572Z","caller":"vsphere/virtualmachine.go:159","msg":"Initiating asynchronous datacenter listing with uuid 420e8daf-1b9e-e49f-e9f9-2c59ab10c59f","TraceId":"c42c3ab6-ed72-4328-9ec1-ea34824bdbd4"}
2023-03-31T11:33:19.880090110Z {"level":"info","time":"2023-03-31T11:33:19.692324796Z","caller":"vsphere/virtualcenter.go:283","msg":"VirtualCenter.connect() creating new client"}


jingxu97 avatar Apr 03 '23 16:04 jingxu97

/cc @divyenpatel @xing-yang @msau42 @gnufied @jsafrane

jingxu97 avatar Apr 03 '23 16:04 jingxu97

Similar issue opened before https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1661

jingxu97 avatar Apr 03 '23 17:04 jingxu97

Before the driver can create CR instances, the CsiNodeTopology CRD must be registered. Once this CRD has been successfully registered, the CSI node DaemonSet's pods will be able to create CsiNodeTopology instances for their nodes, and node discovery should then take place. If the CRD registration has not happened yet, you may see the node DaemonSet pods in a CrashLoopBackOff state.
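A quick way to confirm that the CRD registration has happened (resource name assumed from the kind and the cns.vmware.com API group):

kubectl get crd csinodetopologies.cns.vmware.com
kubectl api-resources --api-group=cns.vmware.com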

divyenpatel avatar Apr 03 '23 19:04 divyenpatel

Which component registers the CsiNodeTopology CRD? And how do we make sure the CRD is successfully registered before the CSI node DaemonSet pod creates the instance?

jingxu97 avatar Apr 03 '23 19:04 jingxu97

The syncer component in the driver registers the CRD. Refer to https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/f762a45e4db2b59e36e80ee943bc42a84f4980cc/pkg/syncer/cnsoperator/manager/init.go#L205-L212

divyenpatel avatar Apr 03 '23 19:04 divyenpatel

How do we make sure the CRD is successfully registered before the CSI node DaemonSet pod creates the instance?

You can install the CSI controller Pod first, let all required CRDs register and wait for complete initialization, and then deploy the CSI node DaemonSet pods.
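A sketch of that ordering with stock kubectl, assuming the standard object names from the vanilla manifest; the wait on the CRD is the key step:

# deploy (or keep) the controller first and let it initialize
kubectl -n vmware-system-csi rollout status deployment/vsphere-csi-controller
# block until the syncer has registered the CRD
kubectl wait --for=condition=established crd/csinodetopologies.cns.vmware.com --timeout=120s
# only then roll out (or restart) the node DaemonSet
kubectl -n vmware-system-csi rollout restart daemonset/vsphere-csi-node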

divyenpatel avatar Apr 03 '23 19:04 divyenpatel

Please see more logs https://gist.github.com/jingxu97/cc013868270f4d05497a7aba2b59221c

From what I searched, some of the logic that discovers and registers the VM has no retry, which causes the subsequent "VM not found" error.

jingxu97 avatar Apr 03 '23 19:04 jingxu97

How do we make sure the CRD is successfully registered before the CSI node DaemonSet pod creates the instance?

You can install the CSI controller Pod first, let all required CRDs register and wait for complete initialization, and then deploy the CSI node DaemonSet pods.

Could we add retry logic instead of having a strict ordering requirement? It is hard to enforce that ordering when deploying the controller and driver, I think.

jingxu97 avatar Apr 03 '23 19:04 jingxu97

Guys, I have the same problem; however, I am using version 3.0.0. I have the pods in CrashLoopBackOff:

vsphere-csi-controller-68c65dbdd5-cb9jb   0/7     Pending            0             19m
vsphere-csi-controller-68c65dbdd5-whswk   0/7     Pending            0             19m
vsphere-csi-node-9qlc6                    2/3     CrashLoopBackOff   5 (28s ago)   3m40s
vsphere-csi-node-h9hkq                    2/3     CrashLoopBackOff   5 (30s ago)   3m40s
vsphere-csi-node-nbvfp                    2/3     CrashLoopBackOff   5 (45s ago)   3m40s

and going into the logs in one of the pods I get this:

Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I0403 22:57:51.418542       1 main.go:167] Version: v2.7.0
I0403 22:57:51.418588       1 main.go:168] Running node-driver-registrar in mode=registration
I0403 22:57:51.419473       1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0403 22:57:51.419515       1 connection.go:154] Connecting to unix:///csi/csi.sock
I0403 22:57:51.420762       1 main.go:199] Calling CSI driver to discover driver name
I0403 22:57:51.420772       1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0403 22:57:51.420776       1 connection.go:184] GRPC request: {}
I0403 22:57:51.424195       1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v3.0.0"}
I0403 22:57:51.424239       1 connection.go:187] GRPC error: <nil>
I0403 22:57:51.424247       1 main.go:209] CSI driver name: "csi.vsphere.vmware.com"
I0403 22:57:51.424312       1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0403 22:57:51.424466       1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0403 22:57:51.424537       1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0403 22:57:52.522333       1 main.go:102] Received GetInfo call: &InfoRequest{}
I0403 22:57:52.522670       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0403 22:57:52.533985       1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "k8s-worker02". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0403 22:57:52.534009       1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "k8s-worker02". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.

I'm following this guide step by step: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-54BB79D2-B13F-4673-8CC2-63A772D17B3C.html

My env consists of:

  • k8s cluster 1.26.3
  • 1 master node, 2 worker nodes
  • ESXi 7.0.3
  • vCenter 7.0.3

gabrieletosca avatar Apr 03 '23 23:04 gabrieletosca

@gabrieletosca I see you have pending vsphere-csi-controller Pods. Can you get the CSI controller Pod up and running, and then check the CSI node DaemonSet Pod status?

divyenpatel avatar Apr 04 '23 04:04 divyenpatel

@gabrieletosca I see you have pending vsphere-csi-controller Pods. Can you get the CSI controller Pod up and running, and then check the CSI node DaemonSet Pod status?

Unfortunately I can't... this is the output of kubectl describe pods vsphere-csi-controller-68c65dbdd5-cb9jb --namespace=vmware-system-csi:

0/3 nodes are available: 3 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

and this is the output of kubectl describe nodes | egrep "Taints:|Name:":

Name:               k8s-master01
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Name:               k8s-worker01
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
Name:               k8s-worker02
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

I see that the guide says to taint only the master (https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-4E47B9F1-B250-4B36-8FEC-8F45E6529D23.html), but this link (https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-0AB6E692-AA47-4B6A-8CEA-38B754E16567.html) says to do it on all nodes... I also tried removing the taint from the workers, but it still doesn't work.
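For what it's worth, the node.cloudprovider.kubernetes.io/uninitialized taint is normally removed by the cloud controller manager (CPI) once it initializes each node, so a taint that stays in place usually means the CPI isn't running or can't reach vCenter. A couple of diagnostic commands (the manual removal is a diagnostic only, not the proper fix):

# check whether the cloud controller manager is actually running
kubectl -n kube-system get pods | grep -i cloud-controller
# clear the taint by hand on one worker, just to unblock scheduling
kubectl taint nodes k8s-worker01 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-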

gabrieletosca avatar Apr 04 '23 09:04 gabrieletosca

@jingxu97 During our debugging session we observed that some of the feature gates were disabled in the v3.0.0 release you were using.

Do you see this issue getting resolved on Anthos setup after enabling all required feature gates for the release v3.0.0?

divyenpatel avatar Apr 07 '23 19:04 divyenpatel

I am running into the same issue with vanilla (kubeadm) kubernetes version 1.26.3 when installing the csi-driver version 3.0 (using this manifest: https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.0.0/manifests/vanilla/vsphere-csi-driver.yaml):

I0424 16:09:37.991893       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0424 16:09:38.030150       1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "omni-kube-controlplane-1". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0424 16:09:38.030228       1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "omni-kube-controlplane-1". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.

From what I understand reading this thread, there's no workaround for this in the meantime, right?

lethargosapatheia avatar Apr 24 '23 16:04 lethargosapatheia

I tried installing only the csi controller deployment first, but I've come across these logs (that might have existed initially too):

vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.945536568Z","caller":"vsphere/virtualcenter.go:171","msg":"failed to create new client with err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).NewClient\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:171\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:284\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:259\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.945727725Z","caller":"vsphere/virtualcenter.go:285","msg":"failed to create govmomi client with err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:285\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:259\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.946527487Z","caller":"vsphere/virtualcenter.go:287","msg":"failed to connect to vCenter using CA file: \"\"","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:287\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:259\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.947057249Z","caller":"vsphere/virtualcenter.go:261","msg":"Cannot connect to vCenter with err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:261\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.947548438Z","caller":"vsphere/virtualcenter.go:647","msg":"failed to connect to VirtualCenter host: \"vmcenter.example.com\". Err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:647\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.947945134Z","caller":"vanilla/controller.go:236","msg":"failed to get vCenterInstance for vCenter \"vmcenter.example.com\"err=Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:236\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.948286142Z","caller":"service/driver.go:189","msg":"failed to init controller. Error: failed to get vCenterInstance for vCenter \"vmcenter.example.com\"err=Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"734c882c-aa75-4f16-be5c-21146bdf9e4b","TraceId":"e0546129-339a-4946-a9cc-195f5b5549b3","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:189\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"info","time":"2023-04-24T16:38:27.948620496Z","caller":"service/driver.go:109","msg":"Configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"VANILLA\" and mode: \"controller\"","TraceId":"734c882c-aa75-4f16-be5c-21146bdf9e4b","TraceId":"e0546129-339a-4946-a9cc-195f5b5549b3"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.948939301Z","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +failed to get vCenterInstance for vCenter \"vmcenter.example.com\"err=Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"734c882c-aa75-4f16-be5c-21146bdf9e4b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:203\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Stream closed EOF for vmware-system-csi/vsphere-csi-controller-68c65dbdd5-z6g85 (vsphere-csi-controller)
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.08636366Z","caller":"logger/logger.go:41","msg":"Setting default log level to :\"PRODUCTION\""}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.090380782Z","caller":"syncer/main.go:86","msg":"Version : v3.0.0","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.090656443Z","caller":"syncer/main.go:103","msg":"Starting container with operation mode: METADATA_SYNC","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.090779546Z","caller":"kubernetes/kubernetes.go:86","msg":"k8s client using in-cluster config","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.091272287Z","caller":"kubernetes/kubernetes.go:395","msg":"Setting client QPS to 100.000000 and Burst to 100.","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.092658207Z","caller":"syncer/main.go:125","msg":"Starting the http server to expose Prometheus metrics..","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}

What stands out is:

dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused

From what I understand, the pod is trying to connect to a DNS server on localhost, which isn't responding. That doesn't make any sense to me, because it's supposed to be connecting to CoreDNS. The pod has its own IP, so it's using a separate network namespace. It's rather hard to follow the logic of it all.

lethargosapatheia avatar Apr 24 '23 16:04 lethargosapatheia

@lethargosapatheia Once you fix connectivity between the vSphere CSI controller Pod and the vCenter server, the "CSINodeTopology CRD not found" issue will be fixed.

divyenpatel avatar Apr 24 '23 16:04 divyenpatel

Basically, registration of the CRDs happens in the syncer container, and if it crashes before registering them, the node DaemonSet Pods will not be able to create the required CR instances.
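So a quick check when node pods report "no matches for kind CSINodeTopology" is whether the syncer container got far enough to register the CRDs; a sketch, with the container name as it appears in the controller logs above:

kubectl -n vmware-system-csi logs deployment/vsphere-csi-controller -c vsphere-syncer | grep -i crd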

divyenpatel avatar Apr 24 '23 17:04 divyenpatel

@divyenpatel You're right, something actually isn't working properly. I had tinkered with the cluster a little bit before, and the CoreDNS service doesn't respond correctly even though the pods themselves work. I'll have to have a look at that and get back once it all works. Thank you for your fast answer!

lethargosapatheia avatar Apr 24 '23 17:04 lethargosapatheia

Ok, I've actually mixed some things up. The thing is, there's no connectivity issue to the DNS. The service actually works fine. Entering the network namespace of a random container (csi-snapshotter) inside the same pod works perfectly:

nsenter -n -t 107665 dig A vmcenter.example.com @10.96.0.10
; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> A vmcenter.example.com @10.96.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18982
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 56dddadc2663ae20 (echoed)
;; QUESTION SECTION:
;vmcenter.example.com.		IN	A

;; ANSWER SECTION:
vmcenter.example.com.	30	IN	A	10.0.0.1

;; Query time: 4 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Mon Apr 24 17:42:53 UTC 2023
;; MSG SIZE  rcvd: 91

Is the container explicitly trying to connect to a different DNS server (localhost) than the one it's supposed to (coredns service)? I know it's a stupid question, but I don't get the error at all :)

I also see that the deployment assumes there are three control-plane nodes. I, however, have two control-plane nodes (and three etcd nodes), so I don't need more. Do you strictly need three controller pods to create a cluster, or would it work with two replicas too? Or should I maybe let them run on the worker nodes as well?

lethargosapatheia avatar Apr 24 '23 17:04 lethargosapatheia

Having looked again at that pod, I see that the bindmount to /etc/resolv.conf leads to a file on the host whose contents are:

nameserver 127.0.0.1

On a normal deployment, I should have something like:

nameserver 10.96.0.10
options ndots:5

On the exact same node I have a calico-kube-controller pod which also has the right resolver (10.96.0.10).
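For reference, a pod ends up with the host's /etc/resolv.conf when it runs with hostNetwork: true and a dnsPolicy of ClusterFirst (rather than ClusterFirstWithHostNet), or with dnsPolicy: Default; one way to check what this particular pod was given:

kubectl -n vmware-system-csi get pod vsphere-csi-controller-68c65dbdd5-z6g85 \
    -o jsonpath='{.spec.hostNetwork} {.spec.dnsPolicy}{"\n"}'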

lethargosapatheia avatar Apr 24 '23 18:04 lethargosapatheia

I've filed what I consider to be an issue here: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/2354. It doesn't look as though DNS behaves correctly inside the pod, but I'm not sure how this is happening.

lethargosapatheia avatar Apr 24 '23 19:04 lethargosapatheia

We will move the registration of the CRDs into the deployment YAML file so we do not have internal container dependencies while the Pod is coming up.

cc: @vdkotkar

divyenpatel avatar May 15 '23 22:05 divyenpatel

/assign @vdkotkar

divyenpatel avatar May 15 '23 22:05 divyenpatel

@divyenpatel: GitHub didn't allow me to assign the following users: vdkotkar.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign @vdkotkar

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar May 15 '23 22:05 k8s-ci-robot

/assign vdkotkar

vdkotkar avatar May 16 '23 04:05 vdkotkar

Hello Guys,

I am seeing the same problem as discussed in this issue. The same is the case with 3.0.0 and 3.0.2.

I also had a look at https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1661

I wanted to check whether there are other workarounds or a real fix for this, or whether I am making some mistake in configuring the vSphere CSI driver.

I am following: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-6DBD2645-FFCF-4076-80BE-AD44D7141521.html

More details =>

Pods are in CrashLoopBackOff:

vmware-system-csi   vsphere-csi-controller-5867b9fc45-5kft8                 0/7     Pending          
vmware-system-csi   vsphere-csi-controller-5867b9fc45-hc9c8                 0/7     Pending          
vmware-system-csi   vsphere-csi-controller-5867b9fc45-z7pqf                 0/7     Pending          
vmware-system-csi   vsphere-csi-node-8zlhk                                  2/3     CrashLoopBackOff 
vmware-system-csi   vsphere-csi-node-pr4bf                                  2/3     CrashLoopBackOff 
vmware-system-csi   vsphere-csi-node-t7m7b                                  2/3     CrashLoopBackOff

My setup is:

  • RKE2 Kubernetes 1.25.10
  • RHEL 8.4
  • vSphere 7.0.3 Build 19234570
kubectl get no
NAME      STATUS   ROLES                       AGE   VERSION
rke2vm1   Ready    control-plane,etcd,master   41m   v1.25.10+rke2r1
rke2vm2   Ready    control-plane,etcd,master   37m   v1.25.10+rke2r1
rke2vm3   Ready    control-plane,etcd,master   37m   v1.25.10+rke2r1
kubectl -n vmware-system-csi logs vsphere-csi-node-8zlhk
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I0809 15:57:23.745216       1 main.go:167] Version: v2.7.0
I0809 15:57:23.745254       1 main.go:168] Running node-driver-registrar in mode=registration
I0809 15:57:23.745845       1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0809 15:57:23.745873       1 connection.go:154] Connecting to unix:///csi/csi.sock
I0809 15:57:23.746532       1 main.go:199] Calling CSI driver to discover driver name
I0809 15:57:23.746545       1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0809 15:57:23.746553       1 connection.go:184] GRPC request: {}
I0809 15:57:23.751808       1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v3.0.2"}
I0809 15:57:23.751927       1 connection.go:187] GRPC error: <nil>
I0809 15:57:23.751939       1 main.go:209] CSI driver name: "csi.vsphere.vmware.com"
I0809 15:57:23.752785       1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0809 15:57:23.753406       1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0809 15:57:23.753607       1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0809 15:57:25.116945       1 main.go:102] Received GetInfo call: &InfoRequest{}
I0809 15:57:25.117192       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0809 15:57:25.134647       1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "rke2vm2". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0809 15:57:25.134679       1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "rke2vm2". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.

When trying CSI v3.0.2, I used this manifest as-is => https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.0.2/manifests/vanilla/vsphere-csi-driver.yaml

Any help please?

vu3oim avatar Aug 09 '23 16:08 vu3oim

Hi Venkat, I see that your controller pods are in Pending state, which is eventually causing the node pods to go into CrashLoopBackOff. Please check why your controller pods are Pending. You can paste the output of kubectl describe pod vsphere-csi-controller-5867b9fc45-5kft8 -n vmware-system-csi. Check whether some affinity or anti-affinity rules on the pod are causing this issue.

Also, in kubectl get nodes output, I can see only master nodes. Don't you have any worker nodes?

vdkotkar avatar Aug 10 '23 07:08 vdkotkar

Hi Vipul, thanks for the reply. I managed to get past that problem and finish successfully (vSphere CSI driver v3.0.2). I have a 3-node RKE2 cluster with no dedicated compute nodes; control plane and compute run on the same VMs.

But I had a question..

In this https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.0.2/manifests/vanilla/vsphere-csi-driver.yaml YAML, I see =>

      nodeSelector:
        node-role.kubernetes.io/control-plane: ""

Due to that, those pods were not coming up.

Labels on the nodes are like this =>

# kubectl get nodes --show-labels
NAME      STATUS   ROLES                       AGE   VERSION           LABELS
rke2vm1   Ready    control-plane,etcd,master   67m   v1.25.10+rke2r1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=mantaray
rke2vm2   Ready    control-plane,etcd,master   64m   v1.25.10+rke2r1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=mantaray
rke2vm3   Ready    control-plane,etcd,master   63m   v1.25.10+rke2r1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm3,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=mantaray

So, I changed to =>

      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"

Then all pods came up and everything is OK.

I wonder why the earlier selector did not select my nodes. I was thinking that, the selector value being empty, it must act as a wildcard? Any help?

Anyway, thanks for all the help.

vu3oim avatar Aug 11 '23 16:08 vu3oim

@vu3oim Only the key is needed in the node-role.kubernetes.io/control-plane label: https://kubernetes.io/docs/reference/labels-annotations-taints/#node-role-kubernetes-io-control-plane
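A nodeSelector is an exact key=value match, so "" only matches nodes whose label value is literally empty, as kubeadm sets it; RKE2 sets the value to "true", which is why the empty selector matched nothing. A value-agnostic alternative, if you would rather not edit the value, is node affinity with an Exists operator; a sketch:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists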

How are you deploying the k8s cluster?

divyenpatel avatar Aug 14 '23 17:08 divyenpatel

Hi Divyen, I am installing RKE2

RKE2 config file => /etc/rancher/rke2/config.yaml =>

token: mytoken
write-kubeconfig-mode: "0644"
cluster-cidr: "10.128.0.0/14,fd02::/48"
service-cidr: "172.30.0.0/16,fd03::/112"
tls-san:
  - rke2vm1
  - rke2vm2
  - rke2vm3
node-label:
  - "product=test"
disable-cloud-controller: "true"
debug: true

It's a 3-node K8s cluster (runs both control and compute).

# kubectl get nodes --show-labels
NAME      STATUS   ROLES                       AGE   VERSION           LABELS
rke2vm1   Ready    control-plane,etcd,master   67m   v1.25.10+rke2r1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=test
rke2vm2   Ready    control-plane,etcd,master   64m   v1.25.10+rke2r1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=test
rke2vm3   Ready    control-plane,etcd,master   63m   v1.25.10+rke2r1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm3,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=test

Then I just follow the vSphere CPI/CSI 3.0 documentation instructions.

Strangely, I needed to patch the vSphere CSI driver YAML (vsphere-csi-driver.yaml) with node-role.kubernetes.io/control-plane: "true" to get a successful installation of the vSphere CSI driver.

vu3oim avatar Aug 16 '23 07:08 vu3oim