cluster-api-provider-aws

Add Karpenter support

Open Skarlso opened this issue 3 years ago • 17 comments

/kind feature

Describe the solution you'd like:

Add Karpenter support for node scaling.

Anything else you would like to add:

There are a couple of things to consider:

  • How do we install it?
    • Using Helm -> how do we keep maintaining it?
    • Using download-and-install -> requires a bunch of fine-tuning during installation, which Helm takes care of
  • What version do we support?
    • The Karpenter team guarantees no breaking changes between patch versions, but since it's pre-alpha it can break between minor versions. Should we care? Or should we leave it up to the users?
  • Do we provide a basic provisioner configuration, or do we install nothing? A basic configuration as of v0.13.0 looks like this:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type         # optional, set to on-demand by default, spot if both are listed
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000                               # optional, recommended to limit total provisioned CPUs
      memory: 1000Gi
  providerRef:                                # optional, recommended to use instead of `provider`
    name: default
  ttlSecondsAfterEmpty: 30                    # optional, but never scales down if not set
  ttlSecondsUntilExpired: 2592000             # optional, but never expires if not set
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:                             # required
    karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelector:                      # required, when not using launchTemplate
    karpenter.sh/discovery: ${CLUSTER_NAME}
  instanceProfile: MyInstanceProfile          # optional, if already set in controller args
  launchTemplate: MyLaunchTemplate            # optional, see Launch Template documentation
  tags:
    InternalAccountingTag: "1234"             # optional, add tags for your own use

Installing Karpenter could be easy, but it would require adding Helm as a dependency (using it as a library to install Karpenter). This would be a new dependency in CAPA.

Environment:

  • Cluster-api-provider-aws version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Skarlso avatar Jun 29 '22 07:06 Skarlso

/assign

Skarlso avatar Jun 29 '22 07:06 Skarlso

Oh yes please :+1:

/triage accepted /priority important-soon

richardcase avatar Jun 29 '22 07:06 richardcase

One other question (I'll add this to the description as well): do we want to add support for defining Provisioners by configuration? https://karpenter.sh/v0.13.1/aws/provisioning/

Or do we not care and let people configure Karpenter however they want, or do we provide a basic configuration for a default AWS provider?

Skarlso avatar Jul 01 '22 12:07 Skarlso

Should we create a Provisioner and AWSNodeTemplate to match MachineSet and MachineDeployment from the CAPA specs?

Do we need the user to tag resources with karpenter.sh/discovery, or will CAPA handle this?

Or should we add this support on the Karpenter side for Cluster API itself, like we have in cluster-autoscaler?

sadysnaat avatar Jul 05 '22 12:07 sadysnaat

Good questions! The Provisioner template sounds good to me. I'm not sure about the tagging. I think CAPA can apply it, right?

Skarlso avatar Jul 09 '22 15:07 Skarlso

CAPA will be able to apply whatever tags are needed :smile:

richardcase avatar Jul 11 '22 06:07 richardcase
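For illustration, the tagging discussed above could be expressed through CAPA's `additionalTags` field. This is only a sketch: the cluster name `my-cluster` and the region are placeholders, and the assumption is that tags set here propagate to the subnets and security groups that Karpenter's selectors (shown in the AWSNodeTemplate earlier) match on.

```yaml
# Hypothetical sketch: AWSCluster with additionalTags so that the
# karpenter.sh/discovery tag lands on CAPA-managed AWS resources,
# letting Karpenter's subnetSelector / securityGroupSelector find them.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: my-cluster                        # placeholder cluster name
spec:
  region: eu-west-1                       # placeholder region
  additionalTags:
    karpenter.sh/discovery: my-cluster    # matches the selectors in the AWSNodeTemplate
```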

Fantastic. Let's do that. :)

Skarlso avatar Jul 11 '22 09:07 Skarlso

So the question remains... how do we wish to implement installing Karpenter? Do we use Helm to do the trick? That would probably be the most straightforward option, instead of trying to download a release and setting up various things for it on our own, which might change from release to release...

Skarlso avatar Jul 11 '22 19:07 Skarlso

Do you think that CAPA would do the install of karpenter? Or would we leave that up to the user?

richardcase avatar Jul 12 '22 09:07 richardcase

What could CAPA do if not installing and maintaining Karpenter? Like, what would the integration do?

I thought the integration meant that we also install Karpenter itself and take care of upgrades if the user bumps its version. But I might be mistaken in that regard, or envisioned too much? :)

Otherwise, I guess CAPA could maintain the IAM resources, roles, tags and such, but that's not much to be honest, no?

Skarlso avatar Jul 12 '22 09:07 Skarlso

What could CAPA do if not installing and maintaining Karpenter? Like, what would the integration do?

Not sure to be honest, as I don't know much (if anything) about Karpenter. I wondered if there would be some custom work required in CAPA to make this work. This discussion mentioned some things like annotations etc.

We won't know for sure until we try.

I don't think we should be installing things into the tenant cluster... unless it comes as an EKS addon. CAPA users have different ways to manage the things that run in the cluster. There are also CRS (ClusterResourceSet) and the addons proposal, which can provide ways to install stuff into a cluster once it's been provisioned.

We could provide instructions such as a "getting started with CAPA & Karpenter" guide.

richardcase avatar Jul 13 '22 13:07 richardcase
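The ClusterResourceSet route mentioned above could be sketched roughly like this. It's a hedged example: the resource names, namespace, and the `karpenter: enabled` label are hypothetical, and it assumes the Karpenter manifests have already been rendered into a ConfigMap on the management cluster.

```yaml
# Hypothetical sketch: a ClusterResourceSet that applies Karpenter
# manifests (stored in a ConfigMap on the management cluster) to any
# workload cluster carrying an opt-in label, once it's provisioned.
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: karpenter-install
  namespace: default
spec:
  clusterSelector:
    matchLabels:
      karpenter: enabled            # opt-in label on the Cluster object
  resources:
    - name: karpenter-manifests     # ConfigMap holding the rendered Karpenter YAML
      kind: ConfigMap
```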

@richardcase Cool, that makes sense! :) I'll do some research and try out things, see how it works. :)

Skarlso avatar Jul 17 '22 08:07 Skarlso

Hang on, I think I misunderstood the integration between CAPA and Karpenter. But now I see. I thought CAPA should manage Karpenter from 0 onwards. But now I understand that it should be aware that Karpenter exists on the cluster and work with it? Or to be more precise, what does the integration here depict, what kind of relationship are we envisioning? :)

I still have to wrap my head around tenant and control plane clusters and not think like eksctl. :D

Skarlso avatar Jul 17 '22 08:07 Skarlso

Before anything else, I think we need to understand and discuss the details of the most valuable way to integrate this component with AWS clusters managed by CAPI and its primitives for scalable compute resources: MachineSet, MachineDeployment, MachinePool, Machine... and see it in action. This still seems to be an open question for the project: https://github.com/aws/karpenter/blob/main/designs/aws-launch-templates-options.md#capi-integration https://github.com/aws/karpenter/blob/main/pkg/cloudprovider/types.go#L36-L55

I see the discussion of managing the lifecycle of the component as a second step; docs and a manual install are probably good enough for starters to get things going.

enxebre avatar Jul 18 '22 09:07 enxebre

@enxebre Ok, so what would be the flow? The user creates a cluster and installs Karpenter, and once CAPA detects that Karpenter was installed, it will manage its components? But what does management mean in this case? Bringing the nodes under the CAPA umbrella as they are cycled into the cluster?

Skarlso avatar Jul 27 '22 19:07 Skarlso

Some discussion from the Karpenter folks: https://github.com/aws/karpenter-core/issues/747

Skarlso avatar Jul 27 '22 21:07 Skarlso

Basically, for now, we can try working with Karpenter, but there are a couple of things the Karpenter folks are still working on so that it plays nicely with CAPI.

Let's wait a bit to see what they will come up with so we can start playing with it.

Skarlso avatar Jul 28 '22 20:07 Skarlso

This work should be done in CAPI, not as a CAPA-specific thing that would then need to be deduped later.

@vincepri , @fabriziopandini , can we move this to the CAPI repo?

randomvariable avatar Sep 07 '22 14:09 randomvariable

Do you mean it's CAPI work because Karpenter is aiming to be a general autoscaler?

Skarlso avatar Sep 07 '22 15:09 Skarlso

@randomvariable - I agree it would be better if it worked with all providers, i.e. if it were CAPI-generic rather than CAPA-specific.

richardcase avatar Sep 08 '22 12:09 richardcase

Cool, I can open this issue in CAPI repo.

Skarlso avatar Sep 08 '22 12:09 Skarlso

Opened https://github.com/kubernetes-sigs/cluster-api/issues/7198 in CAPI.

Skarlso avatar Sep 08 '22 19:09 Skarlso

@Skarlso - for future reference, you can use the /transfer command to move issues to another repo :)

richardcase avatar Sep 16 '22 09:09 richardcase

Ah that's true! :D

Skarlso avatar Sep 16 '22 10:09 Skarlso

Closed in favour of the CAPI issue.

Skarlso avatar Nov 17 '22 16:11 Skarlso