karpenter addon (v1.0.2): generates misconfigured CRDs, and `installCRDs: false` is ignored — doesn't work.
### Describe the bug
After seeing the v1.0.2 release of Karpenter's upstream Helm chart (https://github.com/aws/karpenter-provider-aws/releases), I tried the following configuration, which failed and exposed two major bugs:
```ts
new blueprints.addons.EksPodIdentityAgentAddOn(), // my karpenter config depends on this; I also deployed this
new blueprints.addons.KarpenterAddOn({
  version: "1.0.2", // https://github.com/aws/karpenter-provider-aws/releases
  installCRDs: false, // temporarily needed for v1.0.2
  ec2NodeClassSpec: {
    amiFamily: "Bottlerocket",
    subnetSelectorTerms: [{ tags: { "Name": `${config.id}/${config.id}-vpc/PrivateSubnet*` } }],
    securityGroupSelectorTerms: [{ tags: { "aws:eks:cluster-name": `${config.id}` } }],
    detailedMonitoring: false,
    tags: config.tags,
  },
  nodePoolSpec: {
    requirements: [
      { key: 'topology.kubernetes.io/zone', operator: 'In',
        values: [
          `${config.vpc.availabilityZones[0]}`,
          `${config.vpc.availabilityZones[1]}`,
          `${config.vpc.availabilityZones[2]}`] },
      { key: 'kubernetes.io/arch', operator: 'In', values: ['amd64', 'arm64'] },
      { key: 'karpenter.sh/capacity-type', operator: 'In', values: ['spot'] }, // spot for lower envs
    ],
    disruption: { // WhenUnderutilized: more aggressive cost savings, slightly worse stability
      consolidationPolicy: "WhenUnderutilized",
      // consolidateAfter: "30s", // <-- not compatible with WhenUnderutilized
      expireAfter: "20m",
      budgets: [{ nodes: "10%" }],
    },
  },
  interruptionHandling: true,
  podIdentity: true,
  values: { // https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
    replicas: 1,
  },
})
```
### Expected Behavior
- Karpenter to work.
- Better AWS support for products AWS founded. Both Karpenter and EKS Blueprints for CDK were created by AWS rather than by random members of the open source community, and Karpenter has been at 1.0.0 for a while now, so it's surprising that this is still an issue.
### Current Behavior
What I originally tried (shown above) resulted in the following error:
```
Error from server: error when creating "/tmp/manifest.yaml":
conversion webhook for karpenter.sh/v1beta1, Kind=NodePool failed:
Post "https://karpenter.kube-system.svc:8443/conversion/karpenter.sh?timeout=30s": service
"karpenter" not found
```
To get CDK to at least let me deploy my EKS Blueprints based stack so I could debug further, I simplified the config to the following. I was then able to deploy it and investigate how it was broken:
```ts
new blueprints.addons.KarpenterAddOn({
  version: "1.0.2", // https://github.com/aws/karpenter-provider-aws/releases
  installCRDs: false, // temporarily needed for v1.0.2
  interruptionHandling: true,
  podIdentity: true,
  values: { // https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
    replicas: 1,
  },
})
```
There are two problems/bugs with the above:
- 1st bug: the add-on creates a misconfigured CRD.
  - Notice the error message mentions `karpenter.kube-system.svc`. That tells me the conversion webhook is looking for Karpenter installed in the `kube-system` namespace, while this add-on installs Karpenter in the `karpenter` namespace.
  - I ran `kubectl get crd ec2nodeclasses.karpenter.k8s.aws -o yaml` and saw the following relevant snippet of YAML, which tells me the CRD is generated incorrectly:

    ```yaml
    spec:
      conversion:
        strategy: Webhook
        webhook:
          clientConfig:
            caBundle: LS0tLS1CRUdJTiBDRVJU...
            service:
              name: karpenter
              namespace: kube-system
              path: /conversion/karpenter.k8s.aws
              port: 8443
    ```
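Assuming the only defect in the generated CRDs is the `service.namespace` field of the conversion webhook, a possible (untested) manual workaround would be to patch each Karpenter CRD so the webhook points at the namespace where the add-on actually installs Karpenter:

```shell
# Untested sketch: repoint each Karpenter CRD's conversion webhook at the
# namespace where the karpenter service actually lives ("karpenter" here,
# instead of the generated "kube-system").
for crd in ec2nodeclasses.karpenter.k8s.aws nodepools.karpenter.sh nodeclaims.karpenter.sh; do
  kubectl patch crd "$crd" --type merge \
    -p '{"spec":{"conversion":{"webhook":{"clientConfig":{"service":{"namespace":"karpenter"}}}}}}'
done
```

This only helps if the add-on doesn't immediately revert the patch on the next deploy, which is why a working `installCRDs: false` matters.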
- 2nd bug: `installCRDs: false` was ignored.
  - I wanted to work around the problem by telling the add-on not to generate the broken CRDs, so I could implement a manual fix, but this setting wasn't respected.
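If `installCRDs: false` were respected, the CRDs could be managed out of band with correct webhook settings. A rough sketch, based on the upstream v1 migration docs — the `webhook.*` values are assumptions about the `karpenter-crd` chart and should be verified against the chart version you use:

```shell
# Untested sketch: install the Karpenter CRDs separately via the upstream
# karpenter-crd chart, telling the conversion webhook where the karpenter
# service is actually installed.
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd \
  --version 1.0.2 \
  --namespace karpenter --create-namespace \
  --set webhook.enabled=true \
  --set webhook.serviceName=karpenter \
  --set webhook.serviceNamespace=karpenter \
  --set webhook.port=8443
```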
### Reproduction Steps
- Install an EKS Blueprints based cluster.
- Install any dependency add-ons (like the pod identity agent):

  ```ts
  new blueprints.addons.EksPodIdentityAgentAddOn()
  ```

- Install the Karpenter add-on with a config like this:

  ```ts
  new blueprints.addons.KarpenterAddOn({
    version: "1.0.2", // https://github.com/aws/karpenter-provider-aws/releases
    installCRDs: false, // temporarily needed for v1.0.2
    interruptionHandling: true,
    podIdentity: true,
    values: { // https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
      replicas: 1,
    },
  })
  ```
### Possible Solution
This is a complicated issue and may need to be fixed upstream, so I'd recommend fixing it in phases/stages.
It would be great if a fix for `installCRDs: false` could be prioritized. That part looks like an EKS Blueprints specific bug, and in scope of an issue that makes sense to fix in this repo (unless it's the upstream Helm chart that installs the CRDs?).
If that part were prioritized, then manual workarounds would be easier to implement.
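For reference, Helm itself can skip installing a chart's bundled CRDs, so a working `installCRDs: false` could plausibly map to the equivalent of Helm's `--skip-crds` flag. A hedged sketch of the equivalent manual install (OCI chart URL and version taken from the upstream docs/releases; verify against your setup):

```shell
# Untested sketch: install the karpenter chart without its bundled CRDs,
# which is the behavior installCRDs: false presumably intends.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.2 \
  --namespace karpenter --create-namespace \
  --skip-crds
```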
### Additional Information/Context
Here are the upstream repos, if it helps:
- https://github.com/aws-quickstart/cdk-eks-blueprints/blob/main/lib/addons/karpenter/index.ts
- https://github.com/aws/karpenter-provider-aws/tree/main/charts/karpenter#values
- Maybe part of the issue is this upstream reference, which seems to incorrectly mention v1beta1 CRDs in the v1 version of the Helm chart, where it should reference v1 CRDs: https://github.com/aws/karpenter-provider-aws/blob/v1.0.2/charts/karpenter/Chart.yaml
- This upstream issue might be related as well: https://github.com/aws/karpenter-provider-aws/issues/6982
### CDK CLI Version
2.133.0 (build dcc1e75)
### EKS Blueprints Version
1.15.1
### Node.js Version
v20.17.0
### Environment details (OS name and version, etc.)
Mac OS Sonoma 14.6.1
### Other information
No response
Observation: the Karpenter add-on with version 1.1.0 (referring to the upstream Helm chart, https://github.com/aws/karpenter-provider-aws/releases) installs against Kubernetes 1.31, but only if no values are specified for `ec2NodeClassSpec` and `nodePoolSpec`.
(If you do specify values for `ec2NodeClassSpec` and `nodePoolSpec`, the Karpenter add-on with chart version 1.1.0 fails, I'm guessing because the objects get generated with the older API version.)