cluster-api-provider-vsphere
VM disk placement isn't distributed across datastores when a datastore cluster is used
/kind bug
What steps did you take and what happened:
Env setup:
- datastore cluster with more than one datastore
- storage policy that targets the datastore cluster
- create multiple machines using the storage policy
What did you expect to happen:
- expected the machines' disks to be distributed across all the datastores.
What actually happened:
- a single datastore is repeatedly targeted for the disk.
Environment:
- Cluster-api-provider-vsphere version:
- Kubernetes version: (use kubectl version):
- OS (e.g. from /etc/os-release):
/label triage/needs-information
@gab-satchi: The label(s) /label triage/needs-information cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda
/label needs-information
@gab-satchi: The label(s) /label needs-information cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/close
@k8s-triage-robot: Closing this issue.
/reopen /remove-lifecycle rotten /lifecycle frozen
@srm09: Reopened this issue.
/milestone Next
/help
@srm09: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
/remove-lifecycle frozen /lifecycle active
/assign
/unassign
/assign I would like to work on it.
Some investigation about this issue:
- Briefly, CAPV checks whether the user specified a datastore or a storage policy (our case). With a storage policy, it randomly picks one of the compatible datastores and then creates the VM.
- Based on my observation, if CAPV chooses sharedVmfs-1, vSphere first creates a disk folder on sharedVmfs-1 but eventually moves all the disk files and the folder back to sharedVmfs-0. The end result matches the reported issue (everything is located on sharedVmfs-0), but CAPV does send the right request: there is an intermediate state on sharedVmfs-1 before everything is moved to sharedVmfs-0.
So this is not a simple bug; it involves several pieces of work:
- There's a known bug: when a DatastoreCluster is used, CAPV also treats the DatastoreCluster itself as a compatible datastore, which leads to unexpected behavior. This can be fixed by PR #1937.
- Instead of having CAPV choose a datastore randomly, we should delegate this to StorageResourceManager to leverage the DatastoreCluster natively. Here's a proposal in #1938.
- The distribution itself may need more investigation; it may be related to Storage DRS and some anti-affinity rules.
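As a rough illustration of the first item, the sketch below drops the datastore cluster entry from the candidate list before any pick is made. It is an assumption about the general shape of such a fix, not the content of PR #1937: `objectRef`, `onlyDatastores`, and `example-datastore-cluster` are hypothetical names, and the only vSphere fact relied on is that a datastore cluster is a managed object of type StoragePod while a plain datastore is of type Datastore.

```go
package main

import "fmt"

// objectRef is a hypothetical, simplified stand-in for a vSphere
// managed object reference (type plus display name).
type objectRef struct {
	Type string // "Datastore" or "StoragePod" (datastore cluster)
	Name string
}

// onlyDatastores keeps plain datastores and drops everything else,
// in particular the StoragePod entry that represents the datastore
// cluster itself.
func onlyDatastores(candidates []objectRef) []objectRef {
	out := make([]objectRef, 0, len(candidates))
	for _, c := range candidates {
		if c.Type == "Datastore" {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	candidates := []objectRef{
		{Type: "StoragePod", Name: "example-datastore-cluster"},
		{Type: "Datastore", Name: "sharedVmfs-0"},
		{Type: "Datastore", Name: "sharedVmfs-1"},
	}
	for _, ds := range onlyDatastores(candidates) {
		fmt.Println(ds.Name) // prints sharedVmfs-0, then sharedVmfs-1
	}
}
```

Filtering only addresses the first item; delegating placement to StorageResourceManager and the Storage DRS behavior are separate concerns that this sketch does not cover.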