cluster-api-provider-vsphere
Proposal for Native Support of Datastore Clusters in CAPV for Optimal Storage Placement
/kind feature
Describe the solution you'd like
Background
Currently, when using a Storage Policy and a Datastore Cluster to clone a VM, CAPV fetches the compatible datastores and randomly selects one of them (see the code logic here). However, there is a bug that leads to unexpected behavior: CAPV also treats a Datastore Cluster itself as a compatible datastore. This issue has been reported in #1914 and #1853, and it can be quickly addressed by fixing the compatibility check in PR #1937 (a sketch of that kind of filter is shown below). A more significant concern, however, is that CAPV is not fully leveraging the capabilities of Datastore Clusters, which are designed to manage storage through features such as Storage DRS. Storage DRS can provide storage placement recommendations that take various constraints into account, including space usage, affinity rules, and datastore maintenance mode. The current random selection approach in CAPV neither ensures optimal datastore placement nor utilizes the capabilities of Datastore Clusters effectively.
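As a quick illustration of the fix direction (a sketch only; the helper name is hypothetical and this is not necessarily the exact change in #1937), the compatible placement hubs returned by the PBM check could be filtered so that a StoragePod (datastore cluster) is never picked as if it were a plain datastore:

// Hypothetical helper: pick a random datastore among the compatible placement
// hubs, skipping StoragePod (datastore cluster) entries.
import (
	"fmt"
	"math/rand"

	pbmtypes "github.com/vmware/govmomi/pbm/types"
)

func pickCompatibleDatastore(hubs []pbmtypes.PbmPlacementHub) (*pbmtypes.PbmPlacementHub, error) {
	var datastores []pbmtypes.PbmPlacementHub
	for _, hub := range hubs {
		if hub.HubType == "Datastore" { // skip "StoragePod" hubs
			datastores = append(datastores, hub)
		}
	}
	if len(datastores) == 0 {
		return nil, fmt.Errorf("no compatible datastores associated with the storage policy")
	}
	// Current CAPV behavior: pick one of the compatible datastores at random.
	return &datastores[rand.Intn(len(datastores))], nil
}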
Proposal
This feature request aims to enhance CAPV by adding native support for Datastore Clusters using the underlying govmomi object StorageResourceManager. This would enable CAPV to leverage the placement recommendations provided by Storage DRS based on the specified constraints and objectives. Based on these recommendations, CAPV can select one of the recommended datastores to clone the VM. This ensures that CAPV fully utilizes the capabilities of Datastore Clusters, enabling users to take full advantage of optimal storage placement while maintaining compatibility with storage policies.
Benefits
Improved Datastore Placement: By leveraging Datastore Clusters and Storage DRS placement recommendations, CAPV can ensure optimal placement of VM disks based on various constraints, resulting in better storage utilization and performance.
Implementation Details
- Get the Datastore Cluster based on the Storage Policy (one possible approach is sketched right after this list)
- Leverage the StorageResourceManager object to retrieve placement recommendations from the Datastore Cluster (see the snippet below)
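For the first item, one possible approach (a sketch only, assuming govmomi's pbm client is used the same way CAPV already uses it for datastores; the helper name findCompatibleDatastoreClusters is hypothetical) is to ask PBM which datastore clusters are compatible with the configured storage policy:

// Sketch: find datastore clusters (StoragePods) compatible with a storage policy.
// Uses "github.com/vmware/govmomi/pbm", pbmtypes "github.com/vmware/govmomi/pbm/types",
// "github.com/vmware/govmomi/find", and "github.com/vmware/govmomi/vim25".
func findCompatibleDatastoreClusters(ctx context.Context, c *vim25.Client, finder *find.Finder, policyName string) ([]pbmtypes.PbmPlacementHub, error) {
	pbmClient, err := pbm.NewClient(ctx, c)
	if err != nil {
		return nil, err
	}
	profileID, err := pbmClient.ProfileIDByName(ctx, policyName)
	if err != nil {
		return nil, err
	}

	// Treat every datastore cluster (StoragePod) as a candidate placement hub.
	pods, err := finder.DatastoreClusterList(ctx, "*")
	if err != nil {
		return nil, err
	}
	var hubs []pbmtypes.PbmPlacementHub
	for _, pod := range pods {
		hubs = append(hubs, pbmtypes.PbmPlacementHub{
			HubType: pod.Reference().Type, // "StoragePod"
			HubId:   pod.Reference().Value,
		})
	}

	// Ask PBM which hubs satisfy the policy's requirements.
	req := []pbmtypes.BasePbmPlacementRequirement{
		&pbmtypes.PbmPlacementCapabilityProfileRequirement{
			ProfileId: pbmtypes.PbmProfileId{UniqueId: profileID},
		},
	}
	res, err := pbmClient.CheckRequirements(ctx, hubs, nil, req)
	if err != nil {
		return nil, err
	}
	// Despite its name, CompatibleDatastores returns the hubs without compatibility errors.
	return res.CompatibleDatastores(), nil
}

With a compatible datastore cluster selected, the snippet below (for the second item) retrieves placement recommendations via StorageResourceManager: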
// Assumption: this snippet lives inside CAPV's clone logic, where ctx, folder,
// tpl (the template VM), spec (the VirtualMachineCloneSpec), and datastoreRef
// already exist; the datastore cluster name is hardcoded here for the experiment.
datastoreCluster, err := ctx.Session.Finder.DatastoreCluster(ctx, "DatastoreClusterZhg")
if err != nil {
	return errors.Wrapf(err, "unable to get datastore cluster for %q", ctx)
}
storagePodRef := types.NewReference(datastoreCluster.Reference())

// Build the pod selection spec pointing at the datastore cluster (storage pod).
podSelectionSpec := types.StorageDrsPodSelectionSpec{
	StoragePod: storagePodRef,
}

folderRef := folder.Reference()
vmRef := tpl.Reference()

// Build the placement spec for a clone operation.
storagePlacementSpec := types.StoragePlacementSpec{
	Folder:           &folderRef,
	Vm:               &vmRef,
	CloneName:        ctx.VSphereVM.Name,
	CloneSpec:        &spec,
	PodSelectionSpec: podSelectionSpec,
	Type:             string(types.StoragePlacementSpecPlacementTypeClone),
}

// Ask Storage DRS for placement recommendations.
storageResourceManager := object.NewStorageResourceManager(ctx.Session.Client.Client)
result, err := storageResourceManager.RecommendDatastores(ctx, storagePlacementSpec)
if err != nil {
	return errors.Wrapf(err, "unable to get recommended datastores for placement spec %+v", storagePlacementSpec)
}

// Get the recommendations.
recommendations := result.Recommendations
if len(recommendations) == 0 {
	return fmt.Errorf("no datastore-cluster recommendations")
}

// Use the destination of the first (highest ranked) recommendation.
datastoreRef = &recommendations[0].Action[0].(*types.StoragePlacementAction).Destination
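As an alternative to extracting the destination datastore from the recommendation, the recommendation key could also be applied directly so that Storage DRS performs the clone itself; this is only a sketch of that option, continuing from the variables above:

// Sketch (alternative): apply the first recommendation directly instead of
// extracting its destination datastore; Storage DRS then executes the clone.
task, err := storageResourceManager.ApplyStorageDrsRecommendation(ctx, []string{recommendations[0].Key})
if err != nil {
	return errors.Wrap(err, "unable to apply Storage DRS recommendation")
}
if err := task.Wait(ctx); err != nil {
	return errors.Wrap(err, "applying Storage DRS recommendation failed")
}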
Anything else you would like to add:
Discussion of DatastoreCluster and StorageResourceManager support in govmomi:
Environment:
- Cluster-api-provider-vsphere version:
- Kubernetes version: (use kubectl version):
- OS (e.g. from /etc/os-release):
@zhanggbj Q: is this already done?
Took a quick try with StorageResourceManager in CAPV. There is a problem with full-clone mode (VirtualMachineRelocateDiskMoveOptionsMoveAllDiskBackingsAndConsolidate); the failure needs more investigation. Please find more details in https://github.com/vmware/govmomi/issues/3138
Clone mode: When I try to get recommendDatastore with VirtualMachineCloneSpec.Location.DiskMoveType set to VirtualMachineRelocateDiskMoveOptionsMoveAllDiskBackingsAndAllowSharing or VirtualMachineRelocateDiskMoveOptionsCreateNewChildDiskBacking, it works well. However, I encountered an error when attempting the move option VirtualMachineRelocateDiskMoveOptionsMoveAllDiskBackingsAndConsolidate. The specific error message is: "err: ServerFaultCode: A specified parameter was not correct: diskMoveType". This behavior deviates from my expectations, since in our scenario this type is used as the vSphere full-clone mode.
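For context, the disk move type in that experiment is set on the clone spec's relocate spec before building the StoragePlacementSpec; a minimal sketch using the vim25/types constants mentioned above:

// Sketch: the disk move type that triggers the error is set on the clone spec
// before requesting recommendations (full-clone mode in our scenario).
spec.Location.DiskMoveType = string(types.VirtualMachineRelocateDiskMoveOptionsMoveAllDiskBackingsAndConsolidate)
// These alternatives worked in the same experiment:
//   types.VirtualMachineRelocateDiskMoveOptionsMoveAllDiskBackingsAndAllowSharing
//   types.VirtualMachineRelocateDiskMoveOptionsCreateNewChildDiskBacking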
CC @sbueringer
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.