cluster-api

Install CSI driver by default in preparation of CSI Migration

Open Jiawei0227 opened this issue 4 years ago • 22 comments

1. Describe IN DETAIL the feature/behavior/change you would like to see.

CSI Migration is a Kubernetes feature that, when turned on, redirects in-tree plugin traffic to the corresponding CSI driver. It has been Beta in k8s since v1.17 but is not turned on by default.

Recently, we decided to push this feature forward, and according to our plan it will be turned on by default in v1.22 for a lot of plugins.

It would be good if cluster-api could prepare for this upcoming change. Specifically, cluster-api should deploy the corresponding CSI drivers by default for the corresponding cloud. The driver is a prerequisite for CSI migration to work.

  • GCP - GCE PD CSI Driver
  • AWS - AWS EBS CSI Driver
  • Azure - Azuredisk/Azurefile CSI Driver
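
To illustrate why the driver has to be present before migration is switched on, here is a minimal sketch of the redirection in Python. The plugin and driver names are the upstream identifiers; the `resolve_driver` helper and its parameters are hypothetical and only model the behavior described above, not the actual Kubernetes implementation.

```python
# In-tree volume plugin name -> CSI driver name that CSI Migration redirects to.
IN_TREE_TO_CSI = {
    "kubernetes.io/gce-pd": "pd.csi.storage.gke.io",
    "kubernetes.io/aws-ebs": "ebs.csi.aws.com",
    "kubernetes.io/azure-disk": "disk.csi.azure.com",
    "kubernetes.io/azure-file": "file.csi.azure.com",
}


def resolve_driver(plugin_name, migration_enabled, installed_drivers):
    """Hypothetical helper: return the name of the driver that would serve a volume.

    With migration off (or for a plugin without a mapping) the in-tree
    plugin keeps handling the volume. With migration on, traffic goes to
    the CSI driver, which must already be installed in the cluster.
    """
    if not migration_enabled or plugin_name not in IN_TREE_TO_CSI:
        return plugin_name
    csi_driver = IN_TREE_TO_CSI[plugin_name]
    if csi_driver not in installed_drivers:
        raise RuntimeError(
            f"CSI migration is on for {plugin_name} but {csi_driver} is not installed"
        )
    return csi_driver
```

In this model, a cluster that enables migration without the matching driver fails on every volume operation for that plugin, which is the breakage this issue wants to avoid.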

2. Background

/kind feature
/cc @msau42

Jiawei0227 avatar Feb 09 '21 17:02 Jiawei0227

@CecileRobertMichon @nader-ziada @randomvariable @yastij

Setting milestone v0.4.0/important-soon for now, but this should probably be discussed at the CAPI meeting.
/milestone v0.4.0
/priority important-soon
/area

fabriziopandini avatar Feb 10 '21 10:02 fabriziopandini

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

fejta-bot avatar May 11 '21 10:05 fejta-bot

What's the status of this issue?

vincepri avatar May 11 '21 17:05 vincepri

Maybe we should raise the priority on this topic:

@CecileRobertMichon @nader-ziada @randomvariable @yastij @gab-satchi opinions?

fabriziopandini avatar Jun 01 '21 16:06 fabriziopandini

/remove-lifecycle stale

vincepri avatar Jun 02 '21 17:06 vincepri

AWS issue is here https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1475 with corresponding PR for e2e.

randomvariable avatar Jun 02 '21 17:06 randomvariable

@fabriziopandini I wonder if we have the same issue with ccm (not sure when the internal cloud provider is removed though)

sbueringer avatar Jun 10 '21 04:06 sbueringer

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 08 '21 05:09 k8s-triage-robot

Are there any action items to follow up here?

vincepri avatar Sep 08 '21 15:09 vincepri

CAPI infra providers need to log issues and adapt to the change. Otherwise it can be a breaking change for users of those providers.

neolit123 avatar Sep 08 '21 15:09 neolit123

@yastij @randomvariable @CecileRobertMichon

vincepri avatar Sep 08 '21 16:09 vincepri

/lifecycle frozen

vincepri avatar Sep 08 '21 17:09 vincepri

  • Changes are required starting from Kubernetes 1.23(-ish)
  • How do we install (and lifecycle manage?) these types of addons? Should we look at the addons project, what about ClusterResourceSet?
  • How do we upgrade current users?

One short term solution is to include the CRS manifests as part of provider infrastructure template.
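
A minimal sketch of what such a ClusterResourceSet could look like, built as a plain Python dict and printed as JSON. The `apiVersion`, the resource names, and the `csi-driver: enabled` opt-in label are assumptions for illustration only; the ConfigMap holding the provider's CSI driver manifests is assumed to already exist in the same namespace.

```python
import json


def build_csi_crs(name, namespace, configmap_name,
                  opt_in_label="csi-driver", opt_in_value="enabled"):
    """Build a ClusterResourceSet that applies a CSI driver ConfigMap.

    All names and the label are hypothetical. The API version is an
    assumption and depends on the installed Cluster API release.
    """
    return {
        "apiVersion": "addons.cluster.x-k8s.io/v1beta1",
        "kind": "ClusterResourceSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ApplyOnce: resources are applied when the cluster first matches.
            "strategy": "ApplyOnce",
            # Only clusters carrying this label receive the driver.
            "clusterSelector": {"matchLabels": {opt_in_label: opt_in_value}},
            "resources": [{"name": configmap_name, "kind": "ConfigMap"}],
        },
    }


def selects(crs, cluster_labels):
    """Return True if the set's matchLabels selector matches the cluster labels."""
    wanted = crs["spec"]["clusterSelector"]["matchLabels"]
    return all(cluster_labels.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    crs = build_csi_crs("csi-driver", "default", "csi-driver-manifests")
    print(json.dumps(crs, indent=2))
```

Because the set is selected by label, a cluster created without the label is simply skipped. This sketch does not address upgrades or lifecycle management of the driver, which the questions above leave open.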

vincepri avatar Sep 08 '21 17:09 vincepri

/assign @CecileRobertMichon @yastij to collaborate on goals/non-goals/requirements

vincepri avatar Sep 08 '21 17:09 vincepri

note: take a look at issue on runtime extensions

CecileRobertMichon avatar Sep 08 '21 18:09 CecileRobertMichon

kops tracking issue: https://github.com/kubernetes/kops/issues/10777

might be worth getting some feedback from them on a zoom call.

neolit123 avatar Sep 08 '21 18:09 neolit123

@CecileRobertMichon @yastij I'm trying to review this thread with a slightly expanded scope that includes CPI and any addon whose lifecycle we consider linked to the cluster lifecycle (CPI, CSI, CNI?). Is it possible to define the requirements for the lifecycle of those addons:

  • when should they be installed, upgraded, deleted?
  • how does the configuration get passed to those addons? Are there configuration changes outside of upgrade/downgrade?
  • more?

fabriziopandini avatar Oct 12 '21 16:10 fabriziopandini

@andyzhangx @feiskyer to help answer these questions for Azure CSI drivers and Azure CCM

CecileRobertMichon avatar Oct 12 '21 20:10 CecileRobertMichon

They are not required yet, as in-tree drivers are still supported, but we suggest enabling them when upgrading the cluster to k8s v1.22+, as both CSI and CCM have been GA for a while and new features are only landing in the out-of-tree CSI and CCM implementations.

Refer to https://github.com/kubernetes-sigs/azuredisk-csi-driver#project-status-ga and https://github.com/kubernetes-sigs/cloud-provider-azure#current-status for the CSI/CCM version matrix.

feiskyer avatar Oct 13 '21 01:10 feiskyer

Also, make sure the CSI drivers can be disabled with a switch at cluster creation: in some testing scenarios we want to install the latest versions (or even the master branch) of the CSI driver.

andyzhangx avatar Oct 27 '21 01:10 andyzhangx

/milestone v1.2

vincepri avatar Jan 31 '22 17:01 vincepri

/triage accepted

@CecileRobertMichon @yastij PTAL and update

fabriziopandini avatar Oct 03 '22 17:10 fabriziopandini

This is an important one to implement. Perhaps we can discuss where this is at on next CAPI call?

dtzar avatar Jan 04 '23 00:01 dtzar

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Apr 04 '23 00:04 k8s-triage-robot

/triage accepted

@CecileRobertMichon @yastij PTAL and update

fabriziopandini avatar Apr 14 '23 16:04 fabriziopandini

I don't think there is anything in scope for CAPI itself here, CAPI can't "install" CSI drivers by default since those are provider specific. It's similar to CNI and CPI/CCM/CNM (Cloud Controller Manager for out of tree cloud provider). These should be installed just like other critical cluster addons (maybe that's where https://github.com/kubernetes-sigs/cluster-api-addon-provider-helm can help) and up to each provider to have docs/reference templates (eg. https://capz.sigs.k8s.io/topics/addons.html#storage-drivers).

Not sure there's much more we can do here @fabriziopandini, not without adding a way for providers to hook into CAPI to provide provider-specific instructions (is that something runtimeExtensions could help with?)

CecileRobertMichon avatar Apr 14 '23 18:04 CecileRobertMichon

Agree it would be good to standardize the direction for the driver installation process - i.e. with CAAPH. If CAAPH doesn't meet the requirements for some reason, it would be good to know what's missing.

dtzar avatar Apr 17 '23 23:04 dtzar

/close

Closing based on the fact that there is not much to do in core CAPI. The CAPI office hours, where we are already giving room for updates to CAAPH and providers, could be the avenue to carry on this discussion; I will add a point to the next office hours agenda.

fabriziopandini avatar May 01 '23 13:05 fabriziopandini

@fabriziopandini: Closing this issue.

k8s-ci-robot avatar May 01 '23 13:05 k8s-ci-robot