Upgrade to Karpenter 0.32 and v1beta1

Open rifelpet opened this issue 2 years ago • 12 comments

/kind feature

1. Describe IN DETAIL the feature/behavior/change you would like to see.

Karpenter v0.32 was released with a v1beta1 API that has significant changes from the v1alpha APIs.

There is a migration guide that covers the new CRDs.

2. Feel free to provide a design supporting your feature request.

It looks like all CRD API fields used in kops' template have a 1:1 translation to new fields. At the very least we'll need to enable pruning to clean up the old custom resources. I'm not sure if pruning will work for both custom resources and their CRDs.

The aws.enableENILimitedPodDensity that we currently set has been removed:

The aws.enablePodENI was dropped since Karpenter will now always assume that vpc.amazonaws.com/pod-eni resource exists. The aws.enableENILimitedPodDensity was dropped since you can now override the --max-pods value for kubelet in the spec.kubelet.maxPods for NodeClaims or NodeClaimTemplates

It's not clear what that means exactly, or whether kops becomes responsible for tracking the max pods for each instance type.
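To make the max-pods question concrete: if kops did have to track the limit itself, one reading is that it would compute the ENI-limited value per instance type and pass it as spec.kubelet.maxPods. The following is a minimal sketch only, assuming the AWS VPC CNI without prefix delegation and the usual formula of ENIs × (IPv4 addresses per ENI − 1) + 2. The `ENI_LIMITS` table and the function name are hypothetical and not part of kops or Karpenter.

```python
# Hypothetical lookup table: instance type -> (max ENIs, IPv4 addresses per ENI).
# The three entries follow the AWS per-instance-type ENI limits; a real
# implementation would need the full table from the EC2 API or AWS docs.
ENI_LIMITS = {
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
    "m5.xlarge": (4, 15),
}


def eni_limited_max_pods(instance_type: str) -> int:
    """Return the ENI-limited pod count for an instance type.

    Each ENI reserves one address for itself, and two pods (typically
    aws-node and kube-proxy) use host networking, hence the "+ 2".
    """
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    return enis * (ips_per_eni - 1) + 2


for itype in sorted(ENI_LIMITS):
    print(itype, eni_limited_max_pods(itype))
```

Clusters that use a different CNI or prefix delegation would need a different calculation, so this only illustrates the kind of per-instance-type bookkeeping in question.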

rifelpet avatar Dec 01 '23 03:12 rifelpet

Pruning the old CRDs will be a challenge because normally we skip CRD pruning:

https://github.com/kubernetes/kops/blob/master/upup/pkg/fi/cloudup/bootstrapchannelbuilder/pruning.go

rifelpet avatar Dec 07 '23 13:12 rifelpet

/kind office-hours

rifelpet avatar Dec 07 '23 13:12 rifelpet

Decision from office hours:

  1. Upgrade to v0.31.3 (the last pre-beta version that can be rolled back to if the beta upgrade fails) and cherry-pick it to kops 1.28.
  2. Upgrade to the latest Karpenter and include both alpha and beta CRDs. Mention in the kops 1.29 release notes that Karpenter users should first upgrade to the kops 1.28 release that includes v0.31.3 before upgrading to kops 1.29.

We can optionally migrate our manifest template to use the new custom resources or delay this until a later kops release. Karpenter claims it supports both alpha and beta APIs for now but will drop alpha at some point in the future.

rifelpet avatar Dec 07 '23 15:12 rifelpet

Kops depends on using externally-managed LaunchTemplates in Karpenter's AWSNodeTemplate (v1alpha1) or EC2NodeClass (v1beta1):

https://github.com/kubernetes/kops/blob/62e2d5ac7a979c365796f52801a61034fe1e9cbf/upup/models/cloudup/resources/addons/karpenter.sh/k8s-1.19.yaml.template#L1799-L1807

This allows kops to manage the LaunchTemplates based on instance group definitions and provide those to Karpenter.

Support for externally-managed LaunchTemplates has been removed so we'll need to decide how to proceed.

@olemarkus you had strong opinions about this originally, any ideas?

rifelpet avatar Dec 10 '23 01:12 rifelpet

Has it actually been removed now? Sad, since none of the limitations mentioned in the RFC apply to clusters maintained by installers such as kOps.

In order to support Karpenter-managed launch templates we need to inject the kOps user data into the Karpenter CRs and then leave all of the node lifecycle up to Karpenter. I am guessing that, for example, rolling updates need to ignore Karpenter IGs and instead rely on Karpenter's mechanisms for that.
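As a rough illustration of what injecting the user data could look like, here is a minimal sketch that builds a v1beta1 EC2NodeClass manifest carrying a pre-rendered bootstrap script. It assumes amiFamily Custom (so Karpenter passes userData through unmodified) and selects subnets and security groups by the KubernetesCluster tag. The function name, the tag choice, and all argument values are hypothetical; this is not how kops renders its templates.

```python
import json


def build_ec2_node_class(name, cluster_name, ami_id, node_role, user_data):
    """Build an EC2NodeClass (karpenter.k8s.aws/v1beta1) as a plain dict."""
    # Assumption: subnets and security groups are discoverable through a
    # KubernetesCluster=<cluster name> tag.
    selector = {"tags": {"KubernetesCluster": cluster_name}}
    return {
        "apiVersion": "karpenter.k8s.aws/v1beta1",
        "kind": "EC2NodeClass",
        "metadata": {"name": name},
        "spec": {
            # With amiFamily Custom the AMI must be selected explicitly.
            "amiFamily": "Custom",
            "amiSelectorTerms": [{"id": ami_id}],
            "role": node_role,
            "subnetSelectorTerms": [selector],
            "securityGroupSelectorTerms": [selector],
            # The bootstrap script that would otherwise live in the
            # kops-managed LaunchTemplate.
            "userData": user_data,
        },
    }


manifest = build_ec2_node_class(
    name="nodes-karpenter",                # placeholder instance group name
    cluster_name="example.k8s.local",      # placeholder cluster name
    ami_id="ami-0123456789abcdef0",        # placeholder AMI ID
    node_role="nodes.example.k8s.local",   # placeholder IAM role name
    user_data="#!/bin/bash\n# rendered bootstrap script goes here\n",
)
print(json.dumps(manifest, indent=2))
```

Anything else the LaunchTemplate carries today (block devices, metadata options, tags) would need an equivalent field on the EC2NodeClass, which this sketch leaves out.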

olemarkus avatar Dec 10 '23 08:12 olemarkus

> Has it actually been removed now? Sad, since none of the limitations mentioned in the RFC apply to clusters maintained by installers such as kOps.

The docs mention that only v0.32.X supports both apiVersions and CRDs:

Having different Kind names for v1alpha5 and v1beta1 allows them to coexist for the same Karpenter controller for v0.32.x.

All alpha references have been removed in v0.33.0: https://github.com/kubernetes-sigs/karpenter/pull/840
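The compatibility window described here can be summarized in a small sketch. It assumes only what is stated above (alpha only before v0.32, both in v0.32.x, beta only from v0.33.0) plus the kind renames from the migration guide. The helper names are hypothetical, and releases outside the 0.x series are deliberately rejected because they are not covered in this thread.

```python
# Kind renames between the alpha and v1beta1 APIs.
ALPHA_TO_BETA_KINDS = {
    "Provisioner": "NodePool",
    "AWSNodeTemplate": "EC2NodeClass",
    "Machine": "NodeClaim",
}


def parse_version(version: str) -> tuple:
    """Turn 'v0.32.1' or '0.32.1' into (0, 32, 1)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def supported_api_generations(karpenter_version: str) -> set:
    """Return which API generations a 0.x Karpenter release serves."""
    major, minor = parse_version(karpenter_version)[:2]
    if major != 0:
        raise ValueError("only the 0.x series is covered by this sketch")
    if minor < 32:
        return {"alpha"}
    if minor == 32:
        return {"alpha", "beta"}  # the only window where both coexist
    return {"beta"}


for v in ("v0.31.3", "v0.32.0", "v0.33.0"):
    print(v, sorted(supported_api_generations(v)))
```

Under these assumptions, a cluster has to pass through a v0.32.x release with its custom resources converted to the new kinds before it can move to v0.33.0 or later.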

> In order to support Karpenter-managed launch templates we need to inject the kOps user data into the Karpenter CRs and then leave all of the node lifecycle up to Karpenter. I am guessing that, for example, rolling updates need to ignore Karpenter IGs and instead rely on Karpenter's mechanisms for that.

That makes sense 👍🏻

rifelpet avatar Dec 14 '23 02:12 rifelpet

Any news or ETA when this will be supported?

evs-ops avatar Jan 01 '24 10:01 evs-ops

Do we already have a decision from the kops maintainers on whether it will eventually support Karpenter v0.33.0+? Or is it still under discussion whether it's even possible or worth it to handle all the changes needed after Karpenter dropped the unmanaged launch templates option?

douglasquintanilha avatar Mar 09 '24 00:03 douglasquintanilha

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 07 '24 01:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 07 '24 01:07 k8s-triage-robot

/remove-lifecycle rotten

rifelpet avatar Jul 07 '24 11:07 rifelpet