cluster-api
Install CSI driver by default in preparation for CSI Migration
1. Describe IN DETAIL the feature/behavior/change you would like to see.
CSI Migration is a Kubernetes feature that, when turned on, redirects in-tree plugin traffic to the corresponding CSI driver. It has been Beta in k8s since v1.17, but has not been turned on by default.
Recently, we decided to push this feature forward; according to our plan, it will be turned on by default in v1.22 for a lot of plugins.
It would be good if cluster-api could prepare for this upcoming change. Specifically, cluster-api should deploy the corresponding CSI driver by default for each cloud, since the driver is a prerequisite for CSI migration to work.
- GCP - GCE PD CSI Driver
- AWS - AWS EBS CSI Driver
- Azure - Azuredisk/Azurefile CSI Driver
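For context, CSI migration is controlled by feature gates on the control-plane components and the kubelet. A hedged sketch of how this might look in a kubeadm `ClusterConfiguration` (the AWS gate is used as an illustrative example; the exact gate names and defaults depend on the Kubernetes version):

```yaml
# Illustrative kubeadm fragment: enabling CSI migration feature gates for
# the in-tree AWS EBS plugin on the API server and controller manager.
# The kubelet on each node needs the same gates set for migration to work.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "CSIMigration=true,CSIMigrationAWS=true"
controllerManager:
  extraArgs:
    feature-gates: "CSIMigration=true,CSIMigrationAWS=true"
```

With these gates on, PersistentVolumes provisioned by the in-tree `kubernetes.io/aws-ebs` plugin are served by the `ebs.csi.aws.com` driver, which is why the driver must already be installed in the cluster.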
2. Background
/kind feature /cc @msau42
@CecileRobertMichon @nader-ziada @randomvariable @yastij
Setting milestone v0.4.0 / important-soon for now, but probably this should be discussed at the CAPI meeting /milestone v0.4.0 /priority important-soon /area
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
What's the status of this issue?
Might be we should raise priority on this topic:
@CecileRobertMichon @nader-ziada @randomvariable @yastij @gab-satchi opinions?
/remove-lifecycle stale
AWS issue is here https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1475 with corresponding PR for e2e.
@fabriziopandini I wonder if we have the same issue with ccm (not sure when the internal cloud provider is removed though)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Are there any action items to follow up here?
CAPI infra providers need to log issues and adapt to the change; otherwise, it can be a breaking change for users of those providers.
@yastij @randomvariable @CecileRobertMichon
/lifecycle frozen
- Changes are required starting from Kubernetes 1.23(-ish)
- How do we install (and lifecycle manage?) these types of addons? Should we look at the addons project, what about ClusterResourceSet?
- How do we upgrade current users?
One short-term solution is to include the CRS manifests as part of the provider infrastructure template.
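As a sketch of what that short-term solution could look like, a ClusterResourceSet could apply a ConfigMap containing the CSI driver manifests to any matching workload cluster. The names, labels, and API version below are illustrative, not taken from any provider template:

```yaml
# Hypothetical ClusterResourceSet: applies the CSI driver manifests
# (stored in a ConfigMap) to workload clusters labeled csi=enabled.
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: csi-driver
  namespace: default
spec:
  clusterSelector:
    matchLabels:
      csi: enabled        # opt-in label on the Cluster object
  resources:
    - name: csi-driver-manifests   # ConfigMap holding the driver YAML
      kind: ConfigMap
```

A cluster template would then only need to carry the ConfigMap and set the matching label on the Cluster object, keeping the driver installation declarative.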
/assign @CecileRobertMichon @yastij to collaborate on goals/non-goals/requirements
note: take a look at issue on runtime extensions
kops tracking issue: https://github.com/kubernetes/kops/issues/10777
might be worth getting some feedback from them on a zoom call.
@CecileRobertMichon @yastij trying to review this thread with a slightly expanded scope, including CPI and whatever add-on with a lifecycle we consider linked to the cluster lifecycle (CPI, CSI, CNI?). Is it possible to define the requirements for the lifecycle of those addons:
- when should they be installed, upgraded, deleted
- how does the configuration get passed to those addons? Are there configuration changes outside upgrade/downgrade?
- more?
@andyzhangx @feiskyer to help answer these questions for Azure CSI drivers and Azure CCM
They are not required yet, as in-tree drivers are still supported, but we suggest enabling them when upgrading the cluster to k8s v1.22+, since both CSI and CCM have been GA for a while and new features are only landing in the out-of-tree CSI and CCM implementations.
Refer to https://github.com/kubernetes-sigs/azuredisk-csi-driver#project-status-ga and https://github.com/kubernetes-sigs/cloud-provider-azure#current-status for the CSI/CCM version matrix.
And please make sure the CSI drivers can be disabled by a switch at cluster creation; we want to install the latest versions (or even the master branch) of the CSI drivers in some testing scenarios.
/milestone v1.2
/triage accepted
@CecileRobertMichon @yastij PTAL and update
This is an important one to implement. Perhaps we can discuss where this is at on next CAPI call?
This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.
You can:
- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Deprioritize it with `/priority important-longterm` or `/priority backlog`
- Close this issue with `/close`
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/triage accepted
@CecileRobertMichon @yastij PTAL and update
I don't think there is anything in scope for CAPI itself here; CAPI can't "install" CSI drivers by default since those are provider-specific. It's similar to CNI and CPI/CCM/CNM (Cloud Controller Manager for the out-of-tree cloud provider). These should be installed just like other critical cluster addons (maybe that's where https://github.com/kubernetes-sigs/cluster-api-addon-provider-helm can help), and it's up to each provider to have docs/reference templates (e.g. https://capz.sigs.k8s.io/topics/addons.html#storage-drivers).
Not sure there's much more we can do here @fabriziopandini, not without adding a way for providers to hook into CAPI to provide provider-specific instructions (is that something runtimeExtensions could help with?)
Agree it would be good to standardize the direction for the driver installation process, i.e. with CAAPH. If CAAPH doesn't meet the requirements for some reason, it would be good to know what's missing.
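To make the CAAPH direction concrete, a HelmChartProxy could target clusters by label and install a CSI driver chart on each match. The chart name, repo URL, label, and API version below are assumptions for illustration only:

```yaml
# Hypothetical CAAPH HelmChartProxy: installs the AWS EBS CSI driver
# chart into kube-system on workload clusters labeled csi=external.
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: HelmChartProxy
metadata:
  name: aws-ebs-csi-driver
spec:
  clusterSelector:
    matchLabels:
      csi: external              # opt-in label on the Cluster object
  repoURL: https://kubernetes-sigs.github.io/aws-ebs-csi-driver
  chartName: aws-ebs-csi-driver
  releaseName: aws-ebs-csi-driver
  namespace: kube-system
```

Compared to a ClusterResourceSet, this approach gets upgrades and configuration changes via Helm release management rather than one-shot manifest application.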
/close
There is not much to do in core CAPI. The CAPI office hours, where we are already giving room for updates on CAAPH and providers, could be the avenue to carry on this discussion; I will add a point to the next office hours agenda.
@fabriziopandini: Closing this issue.
In response to this:
/close
There is not much to do in core CAPI. The CAPI office hours, where we are already giving room for updates on CAAPH and providers, could be the avenue to carry on this discussion; I will add a point to the next office hours agenda.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.