cluster-api Install CSI driver by default in preparation of CSI Migration

1. Describe IN DETAIL the feature/behavior/change you would like to see.

CSI Migration is a Kubernetes feature that, when turned on, redirects in-tree plugin traffic to the corresponding CSI driver. It has been Beta in k8s since v1.17 but is not turned on by default.

Recently, we decided to push this feature forward, and according to our plan it will be turned on by default in v1.22 for many plugins.

It would be good if cluster-api could prepare for this upcoming change. Specifically, cluster-api should deploy the CSI driver for the corresponding cloud by default. The driver is a prerequisite for CSI migration to work.

GCP - GCE PD CSI Driver
AWS - AWS EBS CSI Driver
Azure - Azuredisk/Azurefile CSI Driver
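
For context, here is a minimal sketch of what opting in to migration by hand looks like, using AWS EBS as the example. The gate names are the upstream Kubernetes feature gates for CSI migration. Which components need them (at least kubelet and kube-controller-manager), and whether they still have to be set explicitly, depends on the Kubernetes version and should be checked against that version's documentation. The gates only redirect calls; the CSI driver itself still has to be deployed in the cluster.

```yaml
# KubeletConfiguration fragment (sketch): opt in to CSI migration for AWS EBS.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true      # umbrella gate for the migration machinery
  CSIMigrationAWS: true   # per-plugin gate for the in-tree AWS EBS plugin
  # Other per-plugin gates: CSIMigrationGCE, CSIMigrationAzureDisk, CSIMigrationAzureFile
```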

2. Background

/kind feature
/cc @msau42

Feb 09 '21 17:02 Jiawei0227

@CecileRobertMichon @nader-ziada @randomvariable @yastij

Setting milestone v0.4.0 / important-soon for now, but this should probably be discussed at the CAPI meeting.
/milestone v0.4.0
/priority important-soon
/area

Feb 10 '21 10:02 fabriziopandini

/lifecycle stale

May 11 '21 10:05 fejta-bot

What's the status of this issue?

May 11 '21 17:05 vincepri

Maybe we should raise the priority of this topic.

@CecileRobertMichon @nader-ziada @randomvariable @yastij @gab-satchi opinions?

Jun 01 '21 16:06 fabriziopandini

/remove-lifecycle stale

Jun 02 '21 17:06 vincepri

The AWS issue is here: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1475, with a corresponding PR for e2e.

Jun 02 '21 17:06 randomvariable

@fabriziopandini I wonder if we have the same issue with the CCM (not sure when the in-tree cloud provider will be removed, though).

Jun 10 '21 04:06 sbueringer

/lifecycle stale

Sep 08 '21 05:09 k8s-triage-robot

Are there any action items to follow up here?

Sep 08 '21 15:09 vincepri

CAPI infra providers need to log issues and adapt to the change. Otherwise it can be a breaking change for users of those providers.

Sep 08 '21 15:09 neolit123

@yastij @randomvariable @CecileRobertMichon

Sep 08 '21 16:09 vincepri

/lifecycle frozen

Sep 08 '21 17:09 vincepri

Changes are required starting from Kubernetes 1.23(-ish)
How do we install (and lifecycle manage?) these types of addons? Should we look at the addons project, what about ClusterResourceSet?
How do we upgrade current users?

One short-term solution is to include the CRS manifests as part of each provider's infrastructure template.
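
A rough sketch of what such a CRS could look like follows. All names, the namespace and the selector label are hypothetical, the ConfigMap payload is a placeholder for the provider's real driver manifests, and the `apiVersion` depends on the CAPI release in use.

```yaml
# Sketch only: names, labels and ConfigMap contents are hypothetical.
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: csi-driver
  namespace: default
spec:
  strategy: ApplyOnce              # applied once per matching cluster, not reconciled afterwards
  clusterSelector:
    matchLabels:
      csi: external                # hypothetical label set on the Cluster objects that should get the driver
  resources:
    - name: csi-driver-manifests   # ConfigMap in the same namespace as the ClusterResourceSet
      kind: ConfigMap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: csi-driver-manifests
  namespace: default
data:
  csi-driver.yaml: |
    # The provider's CSI driver manifests (controller, node DaemonSet, RBAC, CSIDriver, ...) go here.
```

Because `ApplyOnce` does not reconcile after the first apply, this covers installation only and leaves the upgrade question above open.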

Sep 08 '21 17:09 vincepri

/assign @CecileRobertMichon @yastij to collaborate on goals/non-goals/requirements

Sep 08 '21 17:09 vincepri

note: take a look at issue on runtime extensions

Sep 08 '21 18:09 CecileRobertMichon

kops tracking issue: https://github.com/kubernetes/kops/issues/10777

might be worth getting some feedback from them on a zoom call.

Sep 08 '21 18:09 neolit123

@CecileRobertMichon @yastij I'm trying to review this thread with a slightly expanded scope, including CPI and any addon whose lifecycle we consider linked to the cluster lifecycle (CPI, CSI, CNI?). Is it possible to define the requirements for the lifecycle of those addons:

When should they be installed, upgraded, and deleted?
How does the configuration get passed to those addons? Are there configuration changes outside of upgrades and downgrades?
More?

Oct 12 '21 16:10 fabriziopandini

@andyzhangx @feiskyer to help answer these questions for Azure CSI drivers and Azure CCM

Oct 12 '21 20:10 CecileRobertMichon

They are not required yet, as the in-tree drivers are still supported, but we suggest enabling them when upgrading the cluster to k8s v1.22+, since both CSI and CCM have been GA for a while and new features are only landing in the out-of-tree CSI and CCM implementations.

Refer to https://github.com/kubernetes-sigs/azuredisk-csi-driver#project-status-ga and https://github.com/kubernetes-sigs/cloud-provider-azure#current-status for the CSI/CCM version matrix.

Oct 13 '21 01:10 feiskyer

Also, make sure the CSI drivers can be disabled with a switch at cluster creation; in some testing scenarios we want to install the latest version (or even the master branch) of the CSI driver.

Oct 27 '21 01:10 andyzhangx

/milestone v1.2

Jan 31 '22 17:01 vincepri

/triage accepted

@CecileRobertMichon @yastij PTAL and update

Oct 03 '22 17:10 fabriziopandini

This is an important one to implement. Perhaps we can discuss where this stands on the next CAPI call?

Jan 04 '23 00:01 dtzar

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

/remove-triage accepted

Apr 04 '23 00:04 k8s-triage-robot

/triage accepted

@CecileRobertMichon @yastij PTAL and update

Apr 14 '23 16:04 fabriziopandini

I don't think there is anything in scope for CAPI itself here. CAPI can't "install" CSI drivers by default since those are provider specific. It's similar to CNI and CPI/CCM/CNM (Cloud Controller Manager for out-of-tree cloud providers). These should be installed just like other critical cluster addons (maybe that's where https://github.com/kubernetes-sigs/cluster-api-addon-provider-helm can help), and it is up to each provider to have docs/reference templates (e.g. https://capz.sigs.k8s.io/topics/addons.html#storage-drivers).

Not sure there's much more we can do here @fabriziopandini, not without adding a way for providers to hook into CAPI to provide provider-specific instructions (is that something runtimeExtensions could help with?).

Apr 14 '23 18:04 CecileRobertMichon

Agree it would be good to standardize the direction for the driver installation process - i.e. with CAAPH. If CAAPH doesn't meet the requirements for some reason, it would be good to know what's missing.
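
As a rough illustration of that direction, a CAAPH `HelmChartProxy` for the AWS EBS CSI driver might look like the sketch below. The selector label, release name and namespace are hypothetical, and the chart repository URL and chart name are assumptions to verify against the driver's own documentation.

```yaml
# Sketch only: label, release name and namespace are hypothetical;
# verify repoURL and chartName against the driver's documentation.
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: HelmChartProxy
metadata:
  name: aws-ebs-csi-driver
spec:
  clusterSelector:
    matchLabels:
      csi: aws-ebs                 # hypothetical label set on the Cluster objects that should get the driver
  repoURL: https://kubernetes-sigs.github.io/aws-ebs-csi-driver
  chartName: aws-ebs-csi-driver
  releaseName: aws-ebs-csi-driver
  namespace: kube-system
```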

Apr 17 '23 23:04 dtzar

/close

Closing, since there is not much to do in core CAPI. The CAPI office hours, where we already make room for updates on CAAPH and providers, could be the avenue to carry on this discussion; I will add a point to the next office hours agenda.

May 01 '23 13:05 fabriziopandini

@fabriziopandini: Closing this issue.

May 01 '23 13:05 k8s-ci-robot