Support Zone-Aware Topologies in the KubeVirt CSI Driver
What this PR does / why we need it: This pull request introduces support for zone-aware topology in the KubeVirt CSI driver, allowing the driver to handle storage provisioning and attachment in multi-zone Kubernetes clusters. It integrates with Kubernetes topology-aware volume provisioning to ensure volumes are created in the same zone/region as the requesting node by:
- adding support for TopologyKeys to enable zone/region-specific volume scheduling and attachment
- modifying the Controller plugin to handle topology constraints during volume provisioning
- updating the Node plugin to include zone information in NodeGetInfo responses (see the sketch after this list)
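As a rough illustration of the NodeGetInfo part, here is a minimal sketch of how a Node plugin can advertise zone/region topology segments. It uses the CSI spec Go bindings; the struct, field names, and how the zone/region values are discovered are assumptions for illustration, not this PR's actual code. `topology.kubernetes.io/zone` and `topology.kubernetes.io/region` are the well-known Kubernetes labels.

```go
// Sketch only: advertising zone/region topology in NodeGetInfo.
// In the real driver the zone/region values would come from the
// infra cluster node's labels; here they are plain struct fields.
package service

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

const (
	zoneKey   = "topology.kubernetes.io/zone"   // well-known Kubernetes label
	regionKey = "topology.kubernetes.io/region" // well-known Kubernetes label
)

type nodeService struct {
	nodeID string
	zone   string
	region string
}

func (n *nodeService) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	segments := map[string]string{}
	if n.zone != "" {
		segments[zoneKey] = n.zone
	}
	if n.region != "" {
		segments[regionKey] = n.region
	}

	resp := &csi.NodeGetInfoResponse{NodeId: n.nodeID}
	if len(segments) > 0 {
		// The external-provisioner uses this to build the allowed
		// topologies for volumes provisioned for this node.
		resp.AccessibleTopology = &csi.Topology{Segments: segments}
	}
	return resp, nil
}
```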
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Release note:
Support zone and region-aware topologies in the KubeVirt CSI Driver
Hi @moadqassem. Thanks for your PR.
PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test all
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mhenriks for approval. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@awels I have fixed some test files that were failing and updated the PR to resolve a few conflicts. However, there were some other issues which I believe are not related to my PR:
I0114 23:43:26.714262 37496 create-pvc_test.go:608] Event: AttachVolume.Attach failed for volume "tenant-pv" : rpc error: code = Unknown desc = invalid volume name [FailedAttachVolume]
And
failed to provision volume with StorageClass "kubevirt": rpc error: code = InvalidArgument desc = non-block volume with RWX access mode is not supported
I also took a quick look into the testing process, and in order to add an e2e k8s test that verifies the zone and region are respected, I would have to change a few things in the kubevirtci cluster provider (or add a new provider; I haven't looked deeply, so I'm not sure). This is where it would need to be adjusted to add the labels that point to the allowed topology: https://github.com/kubevirt/kubevirtci/blob/a291de27bb596074c79729ea6f88533555f523fd/cluster-up/cluster/ephemeral-provider-common.sh#L90
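For illustration only, the kind of labels the topology tests look for could be applied with plain kubectl as below. This is not the actual kubevirtci provider change; the node names and zone/region values are placeholders.

```bash
# Illustration: the topology e2e tests expect the well-known zone/region
# labels on the tenant nodes. In kubevirtci this would be wired into the
# provider script linked above; names and values here are placeholders.
kubectl label node node01 topology.kubernetes.io/region=region-a
kubectl label node node01 topology.kubernetes.io/zone=zone-a-1
kubectl label node node02 topology.kubernetes.io/region=region-a
kubectl label node node02 topology.kubernetes.io/zone=zone-a-2
```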
Let me know what you think.
/test all
Sorry, I've been very busy with other stuff; I will try to take a look at this soon.
If you look at the testing, we do actually exclude a few tests from the k8s test suite. In particular the RWX filesystem one, since we don't support that.
I see. Alright, let me check those filtering criteria.
https://github.com/kubevirt/csi-driver/blob/main/hack/run-k8s-e2e.sh#L117-L128 is what we use; I believe the last skip is the one that skips the RWX filesystem test.
Okay, so I think I know what happened here: we made a change in how we identify which VM the volume is hotplugged into. Before, we used the VMI, but that proved to be problematic; now we look at the VM directly. I am not sure if that affects how you look up VMs, but it likely does.
Hmm, I don't think I am listing or trying to get any VM, as the code I wrote runs in the DS, which in general deals with the Node reference rather than the VM itself. The problem is that those tests are failing and I can't really work out what the issue was since, as you probably know, the error traces (such as log messages) might come from other controllers which are not tracked. I also noticed other PRs with the exact same failures. Do you know how we should proceed here?
/test all
@awels After taking a deeper look, it seems that these lines broke the VM lookup that fetches VM info from the CSINode object: https://github.com/kubevirt/csi-driver/pull/130/files#diff-6cb93ab3a359a5bd44c54dbb81ac13c78f2051b868087906408564274b5e94bbR337-R349. These additions changed how we fetch the VMs that the volume attachments should happen on, in terms of how the CSINode node ID is processed.
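For context, a rough sketch of the lookup path in question: the per-driver node ID can be read from the CSINode object as below, and the change in #130 altered how that ID is interpreted when resolving the backing VM. This is not the driver's actual implementation; the driver name constant and helper are assumptions.

```go
// Sketch only: reading the driver-specific node ID from a CSINode object
// before resolving the backing VM. Not the driver's actual code; the
// driver name and error handling here are assumptions.
package util

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const driverName = "csi.kubevirt.io" // assumed driver name

// nodeIDFromCSINode returns the node ID this driver registered for the
// given tenant node; the attach path then uses it to find the VM.
func nodeIDFromCSINode(ctx context.Context, client kubernetes.Interface, nodeName string) (string, error) {
	csiNode, err := client.StorageV1().CSINodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	for _, d := range csiNode.Spec.Drivers {
		if d.Name == driverName {
			return d.NodeID, nil
		}
	}
	return "", fmt.Errorf("driver %s not registered on node %s", driverName, nodeName)
}
```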
@awels can you please rerun the CI tests? I have tested my recent changes and was able to fix the problem.
/test all
/test pull-csi-driver-split-k8s-suite-k8s
PR looks good. One thing I forgot to ask about: did you enable the topology tests in the k8s test suite? You can modify https://github.com/kubevirt/csi-driver/blob/main/hack/test-driver.yaml and set topology: true, and that should be all that is needed to run the topology tests.
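For reference, the change would be flipping the topology capability in hack/test-driver.yaml. A rough sketch of the shape of that file follows; only the topology line matters here, and the other keys and values are illustrative rather than the file's actual contents.

```yaml
# Sketch of the relevant part of hack/test-driver.yaml.
StorageClass:
  FromName: true
DriverInfo:
  Name: csi.kubevirt.io
  Capabilities:
    persistence: true
    block: true
    topology: true   # enables the k8s topology test cases
```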
/test all
/test all
@moadqassem: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-csi-driver-e2e-k8s | 4b2711d22d9b43ce2feab12eb5bec82e8ed6c434 | link | true | /test pull-csi-driver-e2e-k8s |
| pull-csi-driver-split-e2e-k8s | 4b2711d22d9b43ce2feab12eb5bec82e8ed6c434 | link | true | /test pull-csi-driver-split-e2e-k8s |
| pull-csi-driver-split-k8s-suite-k8s | 4b2711d22d9b43ce2feab12eb5bec82e8ed6c434 | link | true | /test pull-csi-driver-split-k8s-suite-k8s |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Looks like we are having an issue with the tests when we enable topology here. I still have not had a chance to look at why.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale