vsphere-csi-driver
vsphere-csi-driver copied to clipboard
Fix volume provisioning when duplicate region or zone tags exist
What this PR does / why we need it:
This PR allows volume provisioning to work in two scenarios:
A) Both the datacenter and the cluster have a (duplicate) region tag
datacenter1: region-A
cluster1: region-A, zone-C
host1:
B) Both the cluster and the host have a (duplicate) zone tag
datacenter1: region-A
cluster1: zone-C
host1: zone-C
In both cases, the volume fails to be provisioned with:
Warning ProvisioningFailed 7m42s (x9 over 12m) csi.vsphere.vmware.com_vmware-vsphere-csi-driver-controller-f9bf8757c-7mhdl_4c338b41-cb91-400f-8034-eb1cf1452e34 failed to provision volume with StorageClass "thin-csi": rpc error: code = Internal desc = failed to create volume. Errors encountered: [No compatible datastores found for accessibility requirements [map[topology.csi.vmware.com/openshift-region:region-A topology.csi.vmware.com/openshift-zone:zone-C]] pertaining to vCenter ...
Although this cluster is tagged incorrectly, this scenario did work on previous versions of the driver, which means upgrading the driver could break volume provisioning on existing clusters where this previously worked.
I propose a simple solution: when analyzing the tags, prefer the entity highest in the hierarchy and ignore the duplicate tag lower in the hierarchy. If the datacenter has a region tag, that takes precedent over any cluster underneath it. Likewise, if the cluster has a zone tag, that takes precedent over any host in that cluster.
This can be done with two changes:
- Remove duplicate hosts in GetHostsForSegment
findCommonHostsforAllTopologyKeys only returns the host reference if the host is found exactly once in each segment. If the tag exists twice in the hierarchy, it fails and returns an empty slice (no common hosts) even though there is a common host in both region-A and zone-C. See this backtrace as an example. If we remove duplicate host entries before finding common hosts, it correctly returns the only common host for the region and zone tags.
- Ignore duplicate tags in GetTopologyLabels
After changing GetHostsForSegment, it is still possible for provisioning to fail during GetTopologyLabels because the CSINodeTopology reports Error: duplicate values detected for category openshift-zone as \\\"zone-C\\\" and \\\"zone-C\\\"
in its status. Since GetTopologyLabels iterates up the hierarchy through ancestors starting from the NodeVM, it could update the value instead of throwing an error, which means the higher-level entity's value takes precedent.
Which issue this PR fixes:
fixes #2681
Testing done:
Manually tested volume provisioning with and without this change in the two scenarios listed above.
Special notes for your reviewer:
/cc @divyenpatel @shalini-b @gnufied
Release note:
Fix volume provisioning when duplicate region or zone tags exist
/ok-to-test
Hi @shalini-b @divyenpatel, do you feel comfortable accepting this change? Or do you have any additional feedback?
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dobsonj, shalini-b
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [shalini-b]
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
/lgtm