vsphere-csi-driver
ERROR: failed to get shared datastores in topology after upgrading CSI from v2.2.1 to v2.5.2
/kind bug
What happened: We have a single vCenter instance, and our topology looks like this:
Datacenter: region=region-frc
- Cluster(Cluster_PROD01): zone=zone-prod01
- Cluster(Cluster_PROD02): zone=zone-prod02
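(A topology like this is typically modeled with vSphere tags on the datacenter and clusters. As a rough sketch, not from the report, assuming govc is configured and the category names match the k8s-region/k8s-zone labels in the cloud-config below, with default inventory paths:)

# create the tag categories and tags
govc tags.category.create -t Datacenter k8s-region
govc tags.category.create -t ClusterComputeResource k8s-zone
govc tags.create -c k8s-region region-frc
govc tags.create -c k8s-zone zone-prod01
govc tags.create -c k8s-zone zone-prod02
# attach the region tag to the datacenter and the zone tags to the clusters
govc tags.attach -c k8s-region region-frc /Datacenter
govc tags.attach -c k8s-zone zone-prod01 /Datacenter/host/Cluster_PROD01
govc tags.attach -c k8s-zone zone-prod02 /Datacenter/host/Cluster_PROD02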
After upgrading vsphere-csi-driver from v2.2.1 to v2.5.2, we have problems getting shared datastores in topology:
failed to provision volume with StorageClass "vsphere-csi-default": rpc error: code = Internal desc = failed to get shared datastores for topology requirement: requisite:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > preferred:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > . Error: <nil>
From the vsphere-csi-controller log:
2022-07-12T10:34:15.571Z INFO k8sorchestrator/topology.go:650 Obtained list of nodeVMs [] {"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
2022-07-12T10:34:15.571Z INFO k8sorchestrator/topology.go:661 Obtained shared datastores: [] {"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
2022-07-12T10:34:15.571Z ERROR vanilla/controller.go:512 failed to get shared datastores for topology requirement: requisite:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > preferred:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > . Error: <nil> {"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).createBlockVolume
/build/pkg/csi/service/vanilla/controller.go:512
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume.func1
/build/pkg/csi/service/vanilla/controller.go:854
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume
/build/pkg/csi/service/vanilla/controller.go:856
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler
/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5589
google.golang.org/grpc.(*Server).processUnaryRPC
/go/pkg/mod/google.golang.org/[email protected]/server.go:1024
google.golang.org/grpc.(*Server).handleStream
/go/pkg/mod/google.golang.org/[email protected]/server.go:1313
google.golang.org/grpc.(*Server).serveStreams.func1.1
/go/pkg/mod/google.golang.org/[email protected]/server.go:722
2022-07-12T10:34:15.571Z DEBUG vanilla/controller.go:857 createVolumeInternal: returns fault "csi.fault.Internal" {"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
2022-07-12T10:34:27.015Z DEBUG common/authmanager.go:135 auth manager: refreshDatastoreMapForBlockVolumes is triggered {"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.015Z DEBUG common/authmanager.go:151 auth manager: refreshDatastoreMapsForFileVolumes is triggered {"TraceId": "9aa6e51b-7bf1-42d0-ae55-f70d36882686"}
2022-07-12T10:34:27.037Z DEBUG common/authmanager.go:331 auth manager: file - dsURLs [ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/] dsInfos [Datastore: Datastore:datastore-8742, datastore URL: ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ Datastore: Datastore:datastore-8743, datastore URL: ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ Datastore: Datastore:datastore-8744, datastore URL: ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/]{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z DEBUG common/authmanager.go:346 auth manager: HasUserPrivilegeOnEntities returns [{{} Datastore:datastore-8742 [{{} Datastore.FileManagement true} {{} System.Read true}]} {{} Datastore:datastore-8743 [{{} Datastore.FileManagement true} {{} System.Read true}]} {{} Datastore:datastore-8744 [{{} Datastore.FileManagement true} {{} System.Read true}]}], when checking privileges [Datastore.FileManagement System.Read] on entities [Datastore:datastore-8742 Datastore:datastore-8743 Datastore:datastore-8744] for user [email protected] {"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z DEBUG common/authmanager.go:361 auth manager: datastore with URL HDS790-DEV01-Data01 and name ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ has privileges and is added to dsURLToInfoMap {"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z DEBUG common/authmanager.go:361 auth manager: datastore with URL HDS790-PROD02-Data01 and name ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ has privileges and is added to dsURLToInfoMap {"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z DEBUG common/authmanager.go:361 auth manager: datastore with URL HDS790-PROD01-Data01 and name ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/ has privileges and is added to dsURLToInfoMap {"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z DEBUG common/authmanager.go:141 auth manager: datastoreMapForBlockVolumes is updated to map[ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/:Datastore: Datastore:datastore-8742, datastore URL: ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/:Datastore: Datastore:datastore-8743, datastore URL: ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/:Datastore: Datastore:datastore-8744, datastore URL: ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/] {"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z DEBUG common/authmanager.go:246 No vSAN datastores found {"TraceId": "9aa6e51b-7bf1-42d0-ae55-f70d36882686"}
2022-07-12T10:34:27.039Z DEBUG common/authmanager.go:158 auth manager: newFsEnabledClusterToDsMap is updated to map[] {"TraceId": "9aa6e51b-7bf1-42d0-ae55-f70d36882686"}
What you expected to happen: Volume provisioning works as it did before the upgrade.
How to reproduce it (as minimally and precisely as possible):
Install vSphere CSI driver v2.5.2.
Set the cloud-config:

apiVersion: v1
data:
  vsphere.conf: |
    ---
    global:
      secretName: cpi-global-secret
      secretNamespace: kube-system
    vcenter:
      vcenterim:
        server: `your_vcenter`
        port: 443
        insecureFlag: true
        datacenters:
          - Datacenter
    labels:
      region: k8s-region
      zone: k8s-zone
    ...
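(Note that this cloud-config is consumed by the cloud controller manager; the CSI driver reads its own configuration from the vsphere-config-secret. A minimal sketch of the matching CSI config is below; it is not from the report, the placeholder values are illustrative, and the region/zone keys are the deprecated syntax, since superseded by topology-categories, that produces the failure-domain.beta.kubernetes.io keys seen in the error above:)

[Global]
cluster-id = "cluster1"          # placeholder, must be unique per cluster

[VirtualCenter "your_vcenter"]
insecure-flag = "true"
user = "<vcenter-user>"          # placeholder
password = "<password>"          # placeholder
port = "443"
datacenters = "Datacenter"

[Labels]
region = k8s-region
zone = k8s-zone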
Anything else we need to know?:
Environment:
- csi-vsphere version: v2.5.2
- vsphere-cloud-controller-manager version: 1.22.3
- Kubernetes version: 1.22.8
- vSphere version: 7.0 U2
- OS (e.g. from /etc/os-release): CentOS Linux release 7.9.2009 (Core)
- Kernel (e.g. uname -a): 5.4.190-1.el7.elrepo.x86_64
- Install tools: manifests + customization for zones
Permissions for the vCenter user:
Cns
- Searchable
Datastore
- Allocate space
- Browse datastore
- Low level file operations
Host
- Configuration
  - Storage partition configuration
vSphere Tagging
- Assign or Unassign vSphere Tag on Object
Resource
- Assign virtual machine to resource pool
Profile-driven storage
- Profile-driven storage view
Storage views
- View
Virtual machine
- Change Configuration
  - Add existing disk
  - Add new disk
  - Add or remove device
  - Change Settings
  - Remove disk
- Edit Inventory
  - Create from existing
  - Create new
  - Register
  - Remove
We experienced the same problem. We fixed this by restarting the csi-node and csi-controller.
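(For anyone hitting this, a sketch of that restart, assuming the default v2.5 manifests, which install into the vmware-system-csi namespace; older releases used kube-system:)

kubectl -n vmware-system-csi rollout restart deployment vsphere-csi-controller
kubectl -n vmware-system-csi rollout restart daemonset vsphere-csi-node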
Hi, I got the same behaviour: failed to get shared datastores. When I look at the log, CreateVolume is called with its AccessibilityRequirements parameter set to nil. Why are the topology constraints not used? My topology labels are fine and show up correctly through both csinodes and node labels. So where does CreateVolume get the AccessibilityRequirements parameter from?
{"level":"info","time":"2022-09-30T15:30:23.258657522Z","caller":"vanilla/controller.go:827","msg":"CreateVolume: called with args {Name:pvc-d6fb853c-51d8-4bcc-8a68-7184dbf1dc86 CapacityRange:required_bytes:1073741824 VolumeCapabilities:[mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"cca68d17-9ea3-4f15-a722-b806a9f2062a"}
{"level":"error","time":"2022-09-30T15:30:23.284090253Z","caller":"node/nodes.go:361","msg":"failed to get shared datastores for node VMs. Err: no shared datastores found for nodeVm: VirtualMachine:vm-67948
Thanks for the help.
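(For context, not from the thread: the driver does not compute AccessibilityRequirements itself. The csi-provisioner sidecar fills it in from the topology keys on the CSINode objects, and only when it runs with --feature-gates=Topology=true; if that gate is off or the CSINode carries no topology keys, the field arrives as nil. A hedged StorageClass sketch that makes the provisioner pass topology, reusing the beta keys and the class name from the error above as illustrative values:)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi-default   # name taken from the error above; illustrative
provisioner: csi.vsphere.vmware.com
volumeBindingMode: WaitForFirstConsumer   # topology then comes from the scheduled node
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - zone-prod01
    - zone-prod02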
@nightguide Could you provide the following information to further debug the issue:
- Output of kubectl get nodes --show-labels
- Output of kubectl describe csinodetopology
- Output of kubectl describe csinodes
Thank you!
@seb-835 Can you confirm whether you have followed the installation guide for setting up a topology-aware vSphere cluster, as described in https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-162E7582-723B-4A0F-A937-3ACE82EAFD31.html?
I am sorry, I forgot to mention that it was fixed. I was using the RKE2 vSphere deployment method, which was not managing topology correctly. I patched their manifest, and now everything works. Thanks for the time spent on this.
/assign @shalini-b
@nightguide @fplantinga-guida are you experiencing the same issue @seb-835 faced? Is this working for you? Can we close this issue? Please update.
We haven't experienced this issue again. You can close the issue.
@seb-835 could you please enlighten us and tell us what patches you made? TIA
You may have a look at https://github.com/rancher/rke2/issues/3398 and https://github.com/rancher/rke2/issues/3468. I hope this helps you find your issue.