gcp-compute-persistent-disk-csi-driver
hyperdisk-balanced topology issues
hyperdisk-balanced disks are not usable on most (?) VM types. Similarly, regular persistent disks are not usable on N4/C4 VMs. This makes scheduling Pods that use hyperdisk-balanced PVs challenging on clusters with mixed VM types, say N2 and N4.
Are there any guidelines on how to configure the CSI driver and StorageClasses so that PVCs scheduled to N4 VMs use hyperdisk-balanced disks and PVCs scheduled to N2 VMs use standard PDs?
Right now I can imagine putting all N4 machines into a single availability zone and making sure that no N2 VM runs there. I can then create two dedicated StorageClasses (see the sketch below):
- `hyperdisk`: with `allowedTopologies` targeting the availability zone with N4 machines and `type: hyperdisk-balanced`.
- `disk`: with `allowedTopologies` targeting all other AZs and `type: pd-standard`.
The scheduler is then able to choose the right nodes for Pods that use PVs provisioned from these StorageClasses. But it's quite cumbersome to set up.
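A rough sketch of those two StorageClasses, assuming N4 machines are confined to us-central1-a and the remaining zones are us-central1-b and us-central1-c (placeholder zone names; adjust to the real cluster layout):

```yaml
# Sketch only: zone names below are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk
provisioner: pd.csi.storage.gke.io
parameters:
  type: hyperdisk-balanced
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a            # the zone dedicated to N4 machines
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: disk
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-b            # zones without N4 machines
    - us-central1-c
```

With `WaitForFirstConsumer`, provisioning waits until the Pod is scheduled, so the zone restriction and the disk type stay consistent with the node the Pod lands on.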
It feels like there should be two separate CSI drivers, with separate topologies and attach limits.
We don't have a great solution for this. We're working on some ideas. The attach limit is a problem for sure. Using separate CSI drivers would fix it, but it starts getting silly in terms of node resource consumption, especially given that we need to reserve space for mount-time operations like fsck and mkfs that can consume a lot of memory for large volumes.
The problem is worse. Each hyperdisk type has different supported machine types and volume limits. So you would essentially need one CSI driver per disk type.
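For reference, the attach limit is surfaced to the scheduler through the CSINode object, which carries only a single allocatable volume count per registered driver name, so per-disk-type limits cannot be expressed with one registration. An illustrative example with made-up values:

```yaml
# Illustrative only: node name, project, and count are made up.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: example-n4-node
spec:
  drivers:
  - name: pd.csi.storage.gke.io
    nodeID: projects/example-project/zones/us-central1-a/instances/example-n4-node
    topologyKeys:
    - topology.gke.io/zone
    allocatable:
      count: 15                # one limit covering every disk type on the node
```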
One idea we did discuss in the past was to have the ability for a CSI driver to be registered with multiple names. It would require all the sidecars to be able to handle processing requests from multiple csi drivers. It would also require the user to explicitly use a different driver name in the storage class, which could also complicate things if we wanted to support transparently changing disk types.
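To make the user-facing side of that idea concrete, a StorageClass would reference the alternate registration name directly. The second provisioner name below is purely hypothetical; nothing like it exists in the driver today:

```yaml
# Hypothetical: the same driver registered under a second name for hyperdisks.
# Neither the name hyperdisk.csi.storage.gke.io nor this behavior exists yet.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: balanced-hyperdisk
provisioner: hyperdisk.csi.storage.gke.io   # hypothetical second registration name
parameters:
  type: hyperdisk-balanced
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-pd
provisioner: pd.csi.storage.gke.io          # existing driver name
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
```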
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Some ideas for the general problem are being discussed in this doc. When those get closer to reality we'll add issues & PRs in this repo.
/close
@mattcary: Closing this issue.
In response to this:
Some ideas for the general problem are being discussed in this doc. When those get closer to reality we'll add issues & PRs in this repo.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.