vsphere-csi-driver icon indicating copy to clipboard operation
vsphere-csi-driver copied to clipboard

Fix volume provisioning when duplicate region or zone tags exist

Open dobsonj opened this issue 11 months ago • 1 comments

What this PR does / why we need it:

This PR allows volume provisioning to work in two scenarios:

A) Both the datacenter and the cluster have a (duplicate) region tag

datacenter1: region-A
  cluster1: region-A, zone-C
    host1: 

B) Both the cluster and the host have a (duplicate) zone tag

datacenter1: region-A
  cluster1: zone-C
    host1: zone-C

In both cases, the volume fails to be provisioned with:

Warning ProvisioningFailed 7m42s (x9 over 12m) csi.vsphere.vmware.com_vmware-vsphere-csi-driver-controller-f9bf8757c-7mhdl_4c338b41-cb91-400f-8034-eb1cf1452e34 failed to provision volume with StorageClass "thin-csi": rpc error: code = Internal desc = failed to create volume. Errors encountered: [No compatible datastores found for accessibility requirements [map[topology.csi.vmware.com/openshift-region:region-A topology.csi.vmware.com/openshift-zone:zone-C]] pertaining to vCenter ...

Although this cluster is tagged incorrectly, this scenario did work on previous versions of the driver, which means upgrading the driver could break volume provisioning on existing clusters where this previously worked.

I propose a simple solution: when analyzing the tags, prefer the entity highest in the hierarchy and ignore the duplicate tag lower in the hierarchy. If the datacenter has a region tag, that takes precedent over any cluster underneath it. Likewise, if the cluster has a zone tag, that takes precedent over any host in that cluster.

This can be done with two changes:

  1. Remove duplicate hosts in GetHostsForSegment

findCommonHostsforAllTopologyKeys only returns the host reference if the host is found exactly once in each segment. If the tag exists twice in the hierarchy, it fails and returns an empty slice (no common hosts) even though there is a common host in both region-A and zone-C. See this backtrace as an example. If we remove duplicate host entries before finding common hosts, it correctly returns the only common host for the region and zone tags.

  1. Ignore duplicate tags in GetTopologyLabels

After changing GetHostsForSegment, it is still possible for provisioning to fail during GetTopologyLabels because the CSINodeTopology reports Error: duplicate values detected for category openshift-zone as \\\"zone-C\\\" and \\\"zone-C\\\" in its status. Since GetTopologyLabels iterates up the hierarchy through ancestors starting from the NodeVM, it could update the value instead of throwing an error, which means the higher-level entity's value takes precedent.

Which issue this PR fixes:

fixes #2681

Testing done:

Manually tested volume provisioning with and without this change in the two scenarios listed above.

Special notes for your reviewer:

/cc @divyenpatel @shalini-b @gnufied

Release note:

Fix volume provisioning when duplicate region or zone tags exist

dobsonj avatar Mar 07 '24 00:03 dobsonj

/ok-to-test

divyenpatel avatar Mar 25 '24 23:03 divyenpatel

Hi @shalini-b @divyenpatel, do you feel comfortable accepting this change? Or do you have any additional feedback?

dobsonj avatar Apr 04 '24 20:04 dobsonj

/approve

shalini-b avatar Apr 05 '24 21:04 shalini-b

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dobsonj, shalini-b

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Apr 05 '24 21:04 k8s-ci-robot

/lgtm

jsafrane avatar Apr 08 '24 15:04 jsafrane