Implement support for kubeadm v1beta4 API

Open fabriziopandini opened this issue 1 year ago • 6 comments

What would you like to be added (User Story)?

As a user, I want to be able to create clusters with kubeadm 1.31 (which will most probably use the v1beta4 API).

As a user, I want to be able to use the latest and greatest features introduced by the kubeadm v1beta4 API.

Detailed Description

Changes introduced by kubeadm v1beta4 that we might add to CABPK without breaking changes (caveat: these changes apply only to clusters with K8s >= 1.31; for older clusters they are a no-op)

  • ClusterConfiguration.Proxy.Disabled (note, this might correlate with the controlplane.cluster.x-k8s.io/skip-kube-proxy annotation)
  • ClusterConfiguration.DNS.Disabled (note, this might correlate with the controlplane.cluster.x-k8s.io/skip-coredns annotation)
  • ClusterConfiguration.EncryptionAlgorithm (note, exposing this flag might imply other changes in Cluster API certificate management)
  • ClusterConfiguration.CertificateValidityPeriod (note, exposing this flag might imply other changes in Cluster API certificate management)
  • ClusterConfiguration.CACertificateValidityPeriod (note, exposing this flag might imply other changes in Cluster API certificate management)
  • ClusterConfiguration.*.ExtraEnvs
  • Init/JoinConfiguration.NodeRegistrationOptions.ImagePullSerial
  • Init/JoinConfiguration.Timeouts. Note:
    • ClusterConfiguration.TimeoutForControlPlane is now Init/JoinConfiguration.Timeouts.ControlPlaneComponentHealthCheck
    • JoinConfiguration.Discovery.Timeout is now JoinConfiguration.Timeouts.TLSBootstrap
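The version caveat above (new fields only take effect for K8s >= 1.31) could be sketched as a small version gate. This is a minimal illustration, not the CABPK implementation: the function name `KubeadmAPIVersion` is hypothetical, the 1.31 threshold is taken from this issue, and kubeadm API versions older than v1beta3 are deliberately ignored.

```go
package main

import "fmt"

// KubeadmAPIVersion is a hypothetical helper that picks the kubeadm config
// API version for a Kubernetes version string such as "v1.31.0".
// Assumption (from this issue): v1beta4 is used starting with v1.31;
// everything older falls back to v1beta3 in this sketch.
func KubeadmAPIVersion(k8sVersion string) (string, error) {
	var major, minor int
	// Sscanf stops after the minor version; patch and suffixes are ignored.
	if _, err := fmt.Sscanf(k8sVersion, "v%d.%d", &major, &minor); err != nil {
		return "", fmt.Errorf("parsing version %q: %w", k8sVersion, err)
	}
	if major > 1 || (major == 1 && minor >= 31) {
		return "kubeadm.k8s.io/v1beta4", nil
	}
	return "kubeadm.k8s.io/v1beta3", nil
}

func main() {
	for _, v := range []string{"v1.30.4", "v1.31.0"} {
		apiVersion, err := KubeadmAPIVersion(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s uses %s\n", v, apiVersion)
	}
}
```

With a gate like this, fields that only exist in v1beta4 would simply be dropped when generating configuration for older clusters, which is the no-op behavior described above.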

Changes introduced by kubeadm v1beta4 that require CABPK breaking changes to be implemented

  • ClusterConfiguration.*.ExtraArgs allowing to set multiple values for the same key
  • Init/JoinConfiguration.NodeRegistrationOptions.KubeletExtraArgs allowing to set multiple values for the same key
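To illustrate why the extra args change is breaking: a map can hold only one value per key, while a list of name/value pairs can repeat a name. The sketch below assumes a local `Arg` type that mirrors the name/value shape used by kubeadm v1beta4; `ConvertExtraArgs` is a hypothetical helper, not the actual CABPK conversion code.

```go
package main

import (
	"fmt"
	"sort"
)

// Arg mirrors the name/value pair shape of kubeadm v1beta4 extra args.
// It is redeclared here only to keep the sketch self-contained.
type Arg struct {
	Name  string
	Value string
}

// ConvertExtraArgs (hypothetical) turns a map-style set of extra args into
// a list. Keys are sorted so the result is deterministic across runs.
func ConvertExtraArgs(in map[string]string) []Arg {
	if in == nil {
		return nil
	}
	keys := make([]string, 0, len(in))
	for k := range in {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]Arg, 0, len(keys))
	for _, k := range keys {
		out = append(out, Arg{Name: k, Value: in[k]})
	}
	return out
}

func main() {
	args := ConvertExtraArgs(map[string]string{
		"service-account-issuer": "https://issuer-a.example.com",
		"v":                      "2",
	})
	// The list form can carry the same name twice; the map form cannot.
	args = append(args, Arg{Name: "service-account-issuer", Value: "https://issuer-b.example.com"})
	for _, a := range args {
		fmt.Printf("--%s=%s\n", a.Name, a.Value)
	}
}
```

The conversion from map to list is lossless, but the reverse is not once a name repeats, which is why the API type itself has to change.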

Changes introduced by kubeadm v1beta4 that are not relevant to CABPK

  • Init/JoinConfiguration.DryRun (dry run makes sense only when using kubeadm from the CLI in interactive mode)
  • ResetConfiguration, UpgradeConfiguration (we are not using these commands in CABPK)

Anything else you would like to add?

Ref: https://github.com/kubernetes/kubernetes/pull/125029

Action Plan

Mandatory tasks to support Kubernetes v1.31:

  • [x] Implement conversions from CAPI v1beta1 types to kubeadm v1beta4 https://github.com/kubernetes-sigs/cluster-api/pull/10709
    • Special handling should be implemented for ClusterConfiguration.TimeoutForControlPlane and JoinConfiguration.Discovery.Timeout

Optional non breaking changes to be implemented ASAP:

  • [x] Before adding new fields, check potential impacts on things like https://github.com/kubernetes-sigs/cluster-api/blob/57dc2317bea6dea7cbc82535f8180afa518b7fcd/controlplane/kubeadm/internal/filters.go#L220, as well as on ClusterClass and topology reconcile https://github.com/kubernetes-sigs/cluster-api/pull/10846
  • [x] Add ClusterConfiguration.*.ExtraEnvs https://github.com/kubernetes-sigs/cluster-api/pull/10846
  • [x] Add Init/JoinConfiguration.NodeRegistrationOptions.ImagePullSerial https://github.com/kubernetes-sigs/cluster-api/pull/10846
  • [ ] Add Init/JoinConfiguration.Timeout
    • Important: Timeout.ControlPlaneComponentHealthCheck and Timeout.TLSBootstrap must not be added now to ensure a clean migration of ClusterConfiguration.TimeoutForControlPlane and JoinConfiguration.Discovery.Timeout when we introduce CAPI v1beta2 types

Changes deferred to when we review certificate management / renewal

  • [ ] Add ClusterConfiguration.CertificateValidityPeriod and ClusterConfiguration.CACertificateValidityPeriod

Changes deferred to when we review kubeadm/KCP addon management

  • [ ] Add ClusterConfiguration.Proxy.Disabled and ClusterConfiguration.DNS.Disabled

Changes deferred to when we implement https://github.com/kubernetes-sigs/cluster-api/issues/10077

  • [ ] Add ClusterConfiguration.EncryptionAlgorithm

Changes deferred to when we implement CAPI v1beta2 types

  • [ ] Refactor ClusterConfiguration.*.ExtraArgs and Init/JoinConfiguration.NodeRegistrationOptions.KubeletExtraArgs
  • [ ] Add Timeout.ControlPlaneComponentHealthCheck and Timeout.TLSBootstrap and remove ClusterConfiguration.TimeoutForControlPlane and JoinConfiguration.Discovery.Timeout

Label(s) to be applied

/kind feature

fabriziopandini avatar May 30 '24 19:05 fabriziopandini

/priority important-soon

Note: this priority assumes we can continue to work with the v1beta3 API; if this is not true, it must be bumped to critical-urgent.

fabriziopandini avatar May 30 '24 19:05 fabriziopandini

/reopen

sbueringer avatar Jun 17 '24 18:06 sbueringer

@sbueringer: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jun 17 '24 18:06 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 06 '24 20:10 k8s-triage-robot

/lifecycle frozen

sbueringer avatar Oct 07 '24 07:10 sbueringer

/lifecycle stale

k8s-triage-robot avatar Feb 11 '25 13:02 k8s-triage-robot

Add Init/JoinConfiguration.Timeout

Is it possible to work on the above task now?

Our cluster has a large etcd db, which causes the learner to fail to be promoted to follower, so we have disabled the kubeadm EtcdLearnerMode feature. However, this option was removed in v1.33 when the feature went GA. Therefore, we would like to use timeouts.etcdAPICall.

  • https://github.com/kubernetes/kubernetes/blob/v1.32.5/cmd/kubeadm/app/util/etcd/etcd.go#L574

superbrothers avatar May 21 '25 11:05 superbrothers

@fabriziopandini is just now taking a closer look at this issue and will implement at least part of it. (I'll let him respond on whether what you are asking for is covered there.)

sbueringer avatar May 21 '25 15:05 sbueringer

https://github.com/kubernetes-sigs/cluster-api/pull/12282 implements changes for extra args, timeouts and image pull policies

Other changes with broader impacts on other CAPI features are tracked in separate issues:

Certificate management / renewal --> https://github.com/kubernetes-sigs/cluster-api/issues/12289

Proxy and DNS installation and management --> https://github.com/kubernetes-sigs/cluster-api/issues/12288

Support more EncryptionAlgorithm for certificates --> https://github.com/kubernetes-sigs/cluster-api/issues/10077

fabriziopandini avatar May 24 '25 11:05 fabriziopandini