
karpenter addon (v1.0.2): has misconfigured CRDs and installCRDs: false, doesn't work.

Open neoakris opened this issue 1 year ago • 1 comments

Describe the bug

After seeing helm chart release v1.0.2 on Karpenter's upstream releases page (https://github.com/aws/karpenter-provider-aws/releases), I tried the following, which failed and exposed 2 major bugs:

new blueprints.addons.EksPodIdentityAgentAddOn()
//^-- my karpenter config depends on this, so I also deployed it

new blueprints.addons.KarpenterAddOn({
    version: "1.0.2", //https://github.com/aws/karpenter-provider-aws/releases
    installCRDs: false, //temporarily needed for v1.0.2
    ec2NodeClassSpec: {
        amiFamily: "Bottlerocket",
        subnetSelectorTerms: [{ tags: { "Name": `${config.id}/${config.id}-vpc/PrivateSubnet*` } }],
        securityGroupSelectorTerms: [{ tags: { "aws:eks:cluster-name": `${config.id}` } }],
        detailedMonitoring: false,
        tags: config.tags,
    },
    nodePoolSpec: {
        requirements: [
            { key: 'topology.kubernetes.io/zone', operator: 'In', 
              values: [
                  `${config.vpc.availabilityZones[0]}`,
                  `${config.vpc.availabilityZones[1]}`,
                  `${config.vpc.availabilityZones[2]}`] },
            { key: 'kubernetes.io/arch', operator: 'In', values: ['amd64','arm64']},
            { key: 'karpenter.sh/capacity-type', operator: 'In', values: ['spot']}, //spot for lower-envs
        ],
        disruption: {           //WhenUnderutilized is more aggressive cost savings / slightly worse stability
            consolidationPolicy: "WhenUnderutilized", 
            //consolidateAfter: "30s", //<--not compatible with WhenUnderutilized
            expireAfter: "20m",
            budgets: [{nodes: "10%"}] 
        }
    },
    interruptionHandling: true,
    podIdentity: true,
    values: { //https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
        replicas: 1,
    }
})

Expected Behavior

  • karpenter to work
    • (and AWS to offer better support for products they founded. Karpenter and EKS Blueprints for CDK were both founded by AWS rather than by random members of the open source community. Karpenter has been at 1.0.0 for a while now, so it's surprising that this is still an issue.)

Current Behavior

What I originally tried (shown above) resulted in the following error:

Error from server: error when creating "/tmp/manifest.yaml"
conversion webhook for karpenter.sh/v1beta1, Kind=NodePool failed: 
Post: "https://karpenter.kube-system.svc:8443/conversaion/karpenter.sh?=timeout=30s" service
karpenter not found.

To get CDK to at least deploy my EKS Blueprints based stack so I could debug it further, I simplified the add-on config to the following. With this config the deployment succeeded, and I could investigate how it was broken:

new blueprints.addons.KarpenterAddOn({
    version: "1.0.2", //https://github.com/aws/karpenter-provider-aws/releases
    installCRDs: false, //temporarily needed for v1.0.2
    interruptionHandling: true,
    podIdentity: true,
    values: { //https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
        replicas: 1,
    }
})

There are 2 problems / bugs with the above:

  • 1st bug: It's creating a misconfigured CRD
    • Notice the error message mentions karpenter.kube-system.svc. That tells me it's looking for karpenter installed in the kube-system namespace, while this add-on installs karpenter in the karpenter namespace.
    • I ran kubectl get crd ec2nodeclasses.karpenter.k8s.aws -o yaml and saw the following relevant snippet of yaml, which tells me the CRD is generated incorrectly.
      spec:
        conversion:
          strategy: Webhook
          webhook:
            clientConfig:
              caBundle: LS0tLS1CRUdJTiBDRVJU...
              service:
                name: karpenter
                namespace: kube-system
                path: /conversion/karpenter.k8s.aws
                port: 8443
      
  • 2nd bug: installCRDs: false was ignored
    • I wanted to work around the problem by telling the addon not to generate the broken CRD, so I could apply my own fix, but this setting wasn't respected.
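
The namespace mismatch described in the 1st bug can be checked mechanically. Below is a minimal sketch, assuming the CRD has already been fetched and parsed into a plain object (for example from the kubectl output above). The `CrdLike` type and the `findWebhookNamespaceMismatch` helper are hypothetical names made up for illustration; they are not part of cdk-eks-blueprints or Karpenter.

```typescript
// Hypothetical minimal shape covering only the CRD fields shown in the YAML snippet.
interface CrdConversionService {
  name: string;
  namespace: string;
  path?: string;
  port?: number;
}

interface CrdLike {
  spec?: {
    conversion?: {
      strategy?: string;
      webhook?: { clientConfig?: { service?: CrdConversionService } };
    };
  };
}

// Returns a description of the mismatch, or null when the webhook target
// matches the expected namespace (or the CRD has no webhook conversion).
function findWebhookNamespaceMismatch(crd: CrdLike, expectedNamespace: string): string | null {
  const conversion = crd.spec?.conversion;
  if (conversion?.strategy !== "Webhook") return null;
  const service = conversion.webhook?.clientConfig?.service;
  if (!service || service.namespace === expectedNamespace) return null;
  return `conversion webhook targets ${service.name}.${service.namespace}.svc, ` +
    `expected namespace "${expectedNamespace}"`;
}

// Values copied from the kubectl output in this issue (caBundle omitted).
const ec2NodeClassCrd: CrdLike = {
  spec: {
    conversion: {
      strategy: "Webhook",
      webhook: {
        clientConfig: {
          service: {
            name: "karpenter",
            namespace: "kube-system",
            path: "/conversion/karpenter.k8s.aws",
            port: 8443,
          },
        },
      },
    },
  },
};

// The add-on installs Karpenter into the "karpenter" namespace, so this reports a mismatch.
console.log(findWebhookNamespaceMismatch(ec2NodeClassCrd, "karpenter"));
```

The same check could be run against the nodepools and nodeclaims CRDs, since the original error came from the karpenter.sh conversion webhook.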

Reproduction Steps

  1. install an eks blueprints based cluster
  2. install any dependency addons (like pod identity agent): new blueprints.addons.EksPodIdentityAgentAddOn()
  3. install karpenter addon with a config like this:
new blueprints.addons.KarpenterAddOn({
    version: "1.0.2", //https://github.com/aws/karpenter-provider-aws/releases
    installCRDs: false, //temporarily needed for v1.0.2
    interruptionHandling: true,
    podIdentity: true,
    values: { //https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
        replicas: 1,
    }
})

Possible Solution

This is a complicated issue, and may need to be fixed upstream. I'd recommend fixing it in phases / stages.

It'd be great if a fix for installCRDs: false could be prioritized. I think that part is an EKS Blueprints specific bug and in scope of an issue that makes sense to fix in this repo (unless it is the upstream helm chart that's installing the CRD?).
If that part were prioritized, manual workarounds would be easier to implement.
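
One shape a manual workaround could take is a JSON Patch (RFC 6902) that repoints the CRD's conversion webhook at the namespace where Karpenter actually runs. The sketch below only builds the patch and the matching `kubectl patch crd ... --type=json` command string; it is untested against a real cluster, and while the add-on keeps reinstalling the CRD the patch may simply be overwritten on the next deploy. `buildWebhookNamespacePatch` and `buildKubectlPatchCommand` are hypothetical helper names.

```typescript
// A single RFC 6902 "replace" operation; only the fields this sketch needs.
interface JsonPatchOp {
  op: "replace";
  path: string;
  value: string;
}

// Builds a patch that points the conversion webhook service at targetNamespace.
// The path mirrors the spec.conversion.webhook.clientConfig.service.namespace
// field from the CRD YAML shown in this issue.
function buildWebhookNamespacePatch(targetNamespace: string): JsonPatchOp[] {
  return [
    {
      op: "replace",
      path: "/spec/conversion/webhook/clientConfig/service/namespace",
      value: targetNamespace,
    },
  ];
}

// Renders the kubectl invocation that would apply the patch to one CRD.
function buildKubectlPatchCommand(crdName: string, targetNamespace: string): string {
  const patch = JSON.stringify(buildWebhookNamespacePatch(targetNamespace));
  return `kubectl patch crd ${crdName} --type=json -p '${patch}'`;
}

console.log(buildKubectlPatchCommand("ec2nodeclasses.karpenter.k8s.aws", "karpenter"));
```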

Additional Information/Context

Here's the upstream repos if it helps:

  • https://github.com/aws-quickstart/cdk-eks-blueprints/blob/main/lib/addons/karpenter/index.ts
  • https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
  • Maybe part of the issue is this upstream reference, which seems to incorrectly mention v1beta1 CRDs in the v1 version of the helm chart, where it should mention CRDs with v1 references. https://github.com/aws/karpenter-provider-aws/blob/v1.0.2/charts/karpenter/Chart.yaml
  • This upstream issue might be related as well
    https://github.com/aws/karpenter-provider-aws/issues/6982

CDK CLI Version

2.133.0 (build dcc1e75)

EKS Blueprints Version

1.15.1

Node.js Version

v20.17.0

Environment details (OS name and version, etc.)

Mac OS Sonoma 14.6.1

Other information

No response

neoakris avatar Sep 20 '24 20:09 neoakris

Observation: karpenter addon with version 1.1.0 (which refers to helm chart https://github.com/aws/karpenter-provider-aws/releases) installs against kube 1.31, but only if no values are specified for ec2NodeClassSpec & nodePoolSpec.

(If you specify values for ec2NodeClassSpec & nodePoolSpec, then the karpenter addon with version 1.1.0 of the helm chart will fail. I'm guessing that's because the objects get generated with the older API version.)
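
If that guess is right, the fix would involve choosing the manifest apiVersion from the chart version. The following is a minimal sketch of such a selection, not the add-on's actual implementation. It assumes charts at 1.0.0 and later use the v1 APIs (karpenter.sh/v1 and karpenter.k8s.aws/v1) and treats every earlier chart as v1beta1, which ignores the older alpha APIs. `parseVersion` and `apiVersionsFor` are hypothetical helper names.

```typescript
interface KarpenterApiVersions {
  nodePool: string;
  ec2NodeClass: string;
}

// Parses "1.1.0" or "v1.1.0" into [major, minor, patch]; throws on anything else.
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Unrecognized version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Assumption: 1.x charts expect v1 objects; earlier charts are treated as v1beta1.
function apiVersionsFor(chartVersion: string): KarpenterApiVersions {
  const [major] = parseVersion(chartVersion);
  return major >= 1
    ? { nodePool: "karpenter.sh/v1", ec2NodeClass: "karpenter.k8s.aws/v1" }
    : { nodePool: "karpenter.sh/v1beta1", ec2NodeClass: "karpenter.k8s.aws/v1beta1" };
}

console.log(apiVersionsFor("1.1.0"));
```

With a selection like this, a NodePool generated for chart 1.1.0 would carry the v1 apiVersion and would not need to pass through the conversion webhook.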

neoakris avatar Dec 03 '24 03:12 neoakris