ERROR "failed to get shared datastores" in topology after upgrading CSI from v2.2.1 to v2.5.2

Open nightguide opened this issue 2 years ago • 2 comments

/kind bug

What happened: We have a single vCenter instance, and our topology looks like this:

Datacenter: region=region-frc
  Cluster(Cluster_PROD01): zone=zone-prod01
  Cluster(Cluster_PROD02): zone=zone-prod02

After upgrading vsphere-csi-driver from v2.2.1 to v2.5.2, volume provisioning fails because the driver cannot get the shared datastores for the requested topology.

failed to provision volume with StorageClass "vsphere-csi-default": rpc error: code = Internal desc = failed to get shared datastores for topology requirement: requisite:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > preferred:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > . Error: <nil>
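
To make the error above easier to read, here is a minimal sketch that pulls the requested region and zone out of the topology requirement text. The function name `parse_topology_requirement` is made up for illustration, and the sketch assumes a single topology term per section, as in this message.

```python
import re

# One segment looks like: segments:<key:"..." value:"..." >
SEGMENT_RE = re.compile(r'segments:<key:"([^"]+)" value:"([^"]+)"')


def parse_topology_requirement(message: str) -> dict:
    """Split the error text into its requisite and preferred segments.

    Hypothetical helper: assumes one topology term per section, so
    repeated keys inside a section would overwrite each other.
    """
    # Everything after "preferred:<" belongs to the preferred section.
    head, _, preferred_text = message.partition("preferred:<")
    # Everything between "requisite:<" and "preferred:<" is requisite.
    _, _, requisite_text = head.partition("requisite:<")
    return {
        "requisite": dict(SEGMENT_RE.findall(requisite_text)),
        "preferred": dict(SEGMENT_RE.findall(preferred_text)),
    }
```

Run against the message from this issue, both sections resolve to region-frc / zone-prod02, so the request itself carries the expected topology and the lookup behind it is what comes back empty.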

Logs from vsphere-csi-controller:

2022-07-12T10:34:15.571Z	INFO	k8sorchestrator/topology.go:650	Obtained list of nodeVMs []	{"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
2022-07-12T10:34:15.571Z	INFO	k8sorchestrator/topology.go:661	Obtained shared datastores: []	{"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
2022-07-12T10:34:15.571Z	ERROR	vanilla/controller.go:512	failed to get shared datastores for topology requirement: requisite:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > preferred:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-frc" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-prod02" > > . Error: <nil>	{"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).createBlockVolume
	/build/pkg/csi/service/vanilla/controller.go:512
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume.func1
	/build/pkg/csi/service/vanilla/controller.go:854
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume
	/build/pkg/csi/service/vanilla/controller.go:856
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler
	/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5589
google.golang.org/grpc.(*Server).processUnaryRPC
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1024
google.golang.org/grpc.(*Server).handleStream
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1313
google.golang.org/grpc.(*Server).serveStreams.func1.1
	/go/pkg/mod/google.golang.org/[email protected]/server.go:722
2022-07-12T10:34:15.571Z	DEBUG	vanilla/controller.go:857	createVolumeInternal: returns fault "csi.fault.Internal"	{"TraceId": "0cf135a8-9ee7-4c2c-b42b-db4644edaf8c"}
2022-07-12T10:34:27.015Z	DEBUG	common/authmanager.go:135	auth manager: refreshDatastoreMapForBlockVolumes is triggered	{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.015Z	DEBUG	common/authmanager.go:151	auth manager: refreshDatastoreMapsForFileVolumes is triggered	{"TraceId": "9aa6e51b-7bf1-42d0-ae55-f70d36882686"}
2022-07-12T10:34:27.037Z	DEBUG	common/authmanager.go:331	auth manager: file - dsURLs [ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/] dsInfos [Datastore: Datastore:datastore-8742, datastore URL: ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ Datastore: Datastore:datastore-8743, datastore URL: ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ Datastore: Datastore:datastore-8744, datastore URL: ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/]{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z	DEBUG	common/authmanager.go:346	auth manager: HasUserPrivilegeOnEntities returns [{{} Datastore:datastore-8742 [{{} Datastore.FileManagement true} {{} System.Read true}]} {{} Datastore:datastore-8743 [{{} Datastore.FileManagement true} {{} System.Read true}]} {{} Datastore:datastore-8744 [{{} Datastore.FileManagement true} {{} System.Read true}]}], when checking privileges [Datastore.FileManagement System.Read] on entities [Datastore:datastore-8742 Datastore:datastore-8743 Datastore:datastore-8744] for user [email protected]	{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z	DEBUG	common/authmanager.go:361	auth manager: datastore with URL HDS790-DEV01-Data01 and name ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ has privileges and is added to dsURLToInfoMap	{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z	DEBUG	common/authmanager.go:361	auth manager: datastore with URL HDS790-PROD02-Data01 and name ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ has privileges and is added to dsURLToInfoMap	{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z	DEBUG	common/authmanager.go:361	auth manager: datastore with URL HDS790-PROD01-Data01 and name ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/ has privileges and is added to dsURLToInfoMap	{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z	DEBUG	common/authmanager.go:141	auth manager: datastoreMapForBlockVolumes is updated to map[ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/:Datastore: Datastore:datastore-8742, datastore URL: ds:///vmfs/volumes/62591e18-5c117e2a-146c-7ee41a800037/ ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/:Datastore: Datastore:datastore-8743, datastore URL: ds:///vmfs/volumes/62591f5a-e4fa8b04-901b-d2dd95d0001f/ ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/:Datastore: Datastore:datastore-8744, datastore URL: ds:///vmfs/volumes/625920cf-cbbec3de-26d5-7ee41a800097/]	{"TraceId": "ce5586ae-d4c8-4294-8d3b-efedb178d432"}
2022-07-12T10:34:27.039Z	DEBUG	common/authmanager.go:246	No vSAN datastores found	{"TraceId": "9aa6e51b-7bf1-42d0-ae55-f70d36882686"}
2022-07-12T10:34:27.039Z	DEBUG	common/authmanager.go:158	auth manager: newFsEnabledClusterToDsMap is updated to map[]	{"TraceId": "9aa6e51b-7bf1-42d0-ae55-f70d36882686"}
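
The telling lines in the log above are the two that report an empty list: no node VMs were found for the requested zone, so no shared datastores could be computed. A rough sketch for spotting those lines in a longer controller log follows. It assumes the tab-separated layout shown in the excerpt (timestamp, level, caller, message), and `find_empty_topology_results` is a hypothetical helper name.

```python
# Messages copied from the excerpt above; other driver versions may word them differently.
EMPTY_MARKERS = (
    "Obtained list of nodeVMs []",
    "Obtained shared datastores: []",
)


def find_empty_topology_results(log_text: str) -> list:
    """Return (timestamp, caller, message) for each empty-result log line."""
    hits = []
    for line in log_text.splitlines():
        fields = line.split("\t")
        # Stack-trace lines do not have the four leading columns; skip them.
        if len(fields) < 4:
            continue
        timestamp, _level, caller, message = fields[:4]
        if message.strip() in EMPTY_MARKERS:
            hits.append((timestamp, caller, message.strip()))
    return hits
```

If this returns hits right before the ERROR line, the datastore privileges reported by the auth manager are probably not the problem; the node-to-zone mapping is the first thing to look at.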

What you expected to happen: Volume provisioning works as it did before the upgrade.

How to reproduce it (as minimally and precisely as possible): Install vSphere CSI v2.5.2 and set the following cloud-config:

apiVersion: v1
data:
  vsphere.conf: |
    ---
    global:
      secretName: cpi-global-secret
      secretNamespace: kube-system

    vcenter:
      vcenterim:
        server: `your_vcenter`
        port: 443
        insecureFlag: true
        datacenters:
          - Datacenter

    labels:
      region: k8s-region
      zone: k8s-zone
    ...

Anything else we need to know?:

Environment:

  • csi-vsphere version: v2.5.2
  • vsphere-cloud-controller-manager version: 1.22.3
  • Kubernetes version: 1.22.8
  • vSphere version: 7.0 u2
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.9.2009 (Core)
  • Kernel (e.g. uname -a): 5.4.190-1.el7.elrepo.x86_64
  • Install tools: manifests + customization for zones

Permissions for vCenter user

Cns

  • Searchable

Datastore

  • Allocate space
  • Browse datastore
  • Low level file operations

Host

  • Configuration
    • Storage partition configuration

vSphere Tagging

  • Assign or Unassign vSphere Tag on Object

Resource

  • Assign virtual machine to resource pool

Profile-driven storage

  • Profile-driven storage view

Storage views

  • View

Virtual machine

  • Change Configuration
    • Add existing disk
    • Add new disk
    • Add or remove device
    • Change Settings
    • Remove disk
  • Edit Inventory
    • Create from existing
    • Create new
    • Register
    • Remove

nightguide avatar Jul 12 '22 10:07 nightguide

We experienced the same problem. We fixed this by restarting the csi-node and csi-controller.

fplantinga-guida avatar Jul 21 '22 07:07 fplantinga-guida

Hi, I see the same behaviour: "failed to get shared datastores".

Looking at the log, CreateVolume is called with AccessibilityRequirements set to nil. Why are the topology constraints not used? My topology labels are correct and show up as expected on the CSINode objects and in the node labels.

So where does CreateVolume get the AccessibilityRequirements parameter from?

{"level":"info","time":"2022-09-30T15:30:23.258657522Z","caller":"vanilla/controller.go:827","msg":"CreateVolume: called with args {Name:pvc-d6fb853c-51d8-4bcc-8a68-7184dbf1dc86 CapacityRange:required_bytes:1073741824  VolumeCapabilities:[mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"cca68d17-9ea3-4f15-a722-b806a9f2062a"}
{"level":"error","time":"2022-09-30T15:30:23.284090253Z","caller":"node/nodes.go:361","msg":"failed to get shared datastores for node VMs. Err: no shared datastores found for nodeVm: VirtualMachine:vm-67948 

Thanks for your help.
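
On the question of where AccessibilityRequirements comes from: in CSI deployments generally, the external-provisioner sidecar fills in that field, and it does so only when it runs with `--feature-gates=Topology=true`. A minimal sketch for checking a controller Deployment for that flag is below. It takes the Deployment as a dict (for example, parsed from `kubectl get deployment ... -o json`). The container name `csi-provisioner` and the helper name `topology_gate_enabled` are assumptions, and this is not necessarily the patch that was applied in this thread.

```python
def topology_gate_enabled(deployment: dict, container_name: str = "csi-provisioner") -> bool:
    """Report whether the provisioner sidecar has the Topology feature gate on.

    `container_name` is an assumption; adjust it to match your manifest.
    """
    containers = deployment["spec"]["template"]["spec"]["containers"]
    for container in containers:
        if container["name"] != container_name:
            continue
        for arg in container.get("args", []):
            # Gates are passed as a comma-separated list: --feature-gates=A=true,B=false
            if arg.startswith("--feature-gates="):
                gates = arg.split("=", 1)[1].split(",")
                if "Topology=true" in gates:
                    return True
        return False
    raise KeyError(f"container {container_name!r} not found in deployment")
```

If this returns False, a nil AccessibilityRequirements in the CreateVolume log line would be the expected result, whatever labels the nodes carry.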

seb-835 avatar Sep 30 '22 15:09 seb-835

@nightguide Could you provide the following information to further debug the issue:

  1. Output of kubectl get nodes --show-labels
  2. Output of kubectl describe csinodetopology
  3. Output of kubectl describe csinodes

Thank you!
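
Before pasting the label output, it can help to check it mechanically. The sketch below takes the JSON printed by `kubectl get nodes -o json` and lists the nodes that lack the expected topology label keys. The helper name `nodes_missing_labels` is hypothetical, and the keys to look for are a parameter because they depend on the driver version and configuration; the two used in the usage note are simply the ones that appear in the error message in this issue.

```python
import json


def nodes_missing_labels(nodes_json: str, required_keys: list) -> dict:
    """Map node name -> list of required label keys that the node lacks."""
    nodes = json.loads(nodes_json)["items"]
    missing = {}
    for node in nodes:
        metadata = node["metadata"]
        labels = metadata.get("labels", {})
        absent = [key for key in required_keys if key not in labels]
        if absent:
            missing[metadata["name"]] = absent
    return missing
```

For example, calling it with `["failure-domain.beta.kubernetes.io/region", "failure-domain.beta.kubernetes.io/zone"]` returns an empty dict when every node carries both labels.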

shalini-b avatar Oct 28 '22 04:10 shalini-b

Can you confirm whether you followed the installation guide for setting up a topology-aware vSphere cluster, described at https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-162E7582-723B-4A0F-A937-3ACE82EAFD31.html?

shalini-b avatar Oct 28 '22 04:10 shalini-b

Sorry, I forgot to mention that this was fixed. I was using the RKE2 vSphere deployment method, which was not managing topology correctly. I patched their manifest and now everything works. Thanks for the time spent on this.

seb-835 avatar Oct 28 '22 14:10 seb-835

/assign @shalini-b

gohilankit avatar Nov 03 '22 21:11 gohilankit

@nightguide @fplantinga-guida are you experiencing the same issue @seb-835 faced? Is this working for you? Can we close this issue? Please update.

divyenpatel avatar Dec 01 '22 20:12 divyenpatel

We haven't experienced this issue again. You can close the issue

fplantinga-guida avatar Dec 02 '22 07:12 fplantinga-guida

@seb-835 could you please enlighten us and tell us what patches you made? TIA

smartbit avatar Jan 12 '23 15:01 smartbit

You may want to look at https://github.com/rancher/rke2/issues/3398 and https://github.com/rancher/rke2/issues/3468. I hope they help you track down your issue.

seb-835 avatar Jan 12 '23 16:01 seb-835