cluster-api-provider-vsphere

VM disk placement isn't distributed across datastores when a datastore cluster is used

Open gab-satchi opened this issue 3 years ago • 18 comments

/kind bug

What steps did you take and what happened:

Env setup:

  • datastore cluster with more than one datastore
  • storage policy that targets the datastore cluster
  • create multiple machines using the storage policy

What did you expect to happen:

  • expected the machines' disks to be distributed across all the datastores.

What actually happened:

  • a single datastore is repeatedly targeted for the disk.

Environment:

  • Cluster-api-provider-vsphere version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

gab-satchi avatar Aug 09 '21 13:08 gab-satchi

/label triage/needs-information

gab-satchi avatar Aug 16 '21 18:08 gab-satchi

@gab-satchi: The label(s) /label triage/needs-information cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda

In response to this:

/label triage/needs-information

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 16 '21 18:08 k8s-ci-robot

/label needs-information

gab-satchi avatar Aug 16 '21 19:08 gab-satchi

@gab-satchi: The label(s) /label needs-information cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda

k8s-ci-robot avatar Aug 16 '21 19:08 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 14 '21 19:11 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Dec 14 '21 20:12 k8s-triage-robot

/close

k8s-triage-robot avatar Jan 13 '22 20:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

k8s-ci-robot avatar Jan 13 '22 20:01 k8s-ci-robot

/reopen
/remove-lifecycle rotten
/lifecycle frozen

srm09 avatar Jan 28 '22 23:01 srm09

@srm09: Reopened this issue.

k8s-ci-robot avatar Jan 28 '22 23:01 k8s-ci-robot

/milestone Next

srm09 avatar Jan 28 '22 23:01 srm09

/help

srm09 avatar Jan 31 '22 00:01 srm09

@srm09: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

k8s-ci-robot avatar Jan 31 '22 00:01 k8s-ci-robot

/remove-lifecycle frozen
/lifecycle active

srm09 avatar Feb 16 '23 21:02 srm09

/assign

srm09 avatar Feb 23 '23 02:02 srm09

/unassign

srm09 avatar Feb 23 '23 22:02 srm09

/assign

I would like to work on it.

zhanggbj avatar May 11 '23 08:05 zhanggbj

Some findings from investigating this issue:

  • In brief, CAPV checks whether the user specified a particular datastore or a storage policy (our case). With a storage policy, it randomly picks one of the compatible datastores and then creates the VM.
  • Based on my observation, if CAPV chooses sharedVmfs-1, vSphere first creates a disk folder on sharedVmfs-1 but eventually moves all the disk files and the folder back to sharedVmfs-0. The end result matches what the issue reports (everything ends up on sharedVmfs-0), but CAPV does in fact send the right request: there is an intermediate state on sharedVmfs-1 before everything is moved to sharedVmfs-0.

So this is not a simple bug. It involves several pieces of work:

  1. There is a known bug: when a DatastoreCluster is used, CAPV also treats the DatastoreCluster itself as a compatible datastore, which leads to unexpected behavior. This can be fixed by PR #1937.
  2. Instead of having CAPV choose a datastore at random, we should delegate the choice to StorageResourceManager so that the DatastoreCluster is leveraged natively. There is a proposal in #1938.
  3. The distribution itself may need more investigation; it may be related to Storage DRS and some anti-affinity rules.
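To make item 1 concrete, here is a minimal sketch of one way the datastore cluster could be excluded from the candidate list before a datastore is chosen. It is not the code from PR #1937. The `placementHub` type, its fields, the `onlyDatastores` helper, and the example IDs are all hypothetical; the only vSphere-specific assumption is that a datastore cluster is reported with the managed object type `StoragePod`, while individual datastores have type `Datastore`.

```go
package main

import "fmt"

// placementHub is a hypothetical stand-in for one entry in the list of
// storage objects reported as compatible with a storage policy.
type placementHub struct {
	Type string // managed object type, e.g. "Datastore" or "StoragePod"
	ID   string // managed object ID (example values below are made up)
}

// onlyDatastores keeps the entries that are individual datastores and
// drops everything else, such as the datastore cluster (StoragePod).
func onlyDatastores(hubs []placementHub) []placementHub {
	out := make([]placementHub, 0, len(hubs))
	for _, h := range hubs {
		if h.Type == "Datastore" {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	hubs := []placementHub{
		{Type: "StoragePod", ID: "group-p1"},
		{Type: "Datastore", ID: "datastore-10"},
		{Type: "Datastore", ID: "datastore-11"},
	}
	for _, h := range onlyDatastores(hubs) {
		fmt.Println(h.Type, h.ID)
	}
}
```

Filtering by type only addresses item 1. Under the proposal in item 2, CAPV would no longer pick a datastore itself, so a filter like this would matter only for as long as the random selection remains in place.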

zhanggbj avatar Jul 11 '23 07:07 zhanggbj