Add an example of different budgets for different disruption reasons
Issue #, if available:
Description of changes:
- Adding an example of different disruption budgets for different disruption reasons.
I came across this after upgrading to v1, and it's quite useful: we needed to keep disruption quite strict to limit the blast radius of situations like AMI updates/EKS upgrades, but we noticed that the strict budget also throttled consolidation activity.
With per-reason budgets we can now consolidate more efficiently while keeping the strict policy for updates.
Let me know if it makes sense; if it's not that useful, feel free to close it.
Thank you
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Thank you for submitting this contribution! It's great to hear how these blueprints / Karpenter v1 can help solve real-world challenges.
For users to fully benefit from your example, could you share how this was tested and how we could replicate it? Although we are adding this as an example to the README.md, it would be great for users' confidence to know how to validate the configuration. Thanks again.
I have recently applied this new NodePool configuration in production (after finishing the v1 upgrade).
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: multiple-consolidations
spec:
  disruption:
    budgets:
    - nodes: "1"
      reasons:
      - Drifted
    - duration: 14m0s
      nodes: "0"
      reasons:
      - Drifted
      schedule: '*/15 * * * *'
    - nodes: "3"
      reasons:
      - Empty
      - Underutilized
    - duration: 9m0s
      nodes: "0"
      reasons:
      - Empty
      - Underutilized
      schedule: '*/10 * * * *'
    consolidateAfter: 5m0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.k8s.aws/instance-memory
        operator: Gt
        values:
        - "16000"
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - r
        - m
        - c
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "4"
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
        - arm64
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-west-1a
        - eu-west-1b
        - eu-west-1c
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
  weight: 1
```
(Some details redacted, such as taints and selectors.)
These are the metrics showing that it behaves as needed; the metric used is `karpenter_nodepools_allowed_disruptions`.
The green line is the Drifted reason, which allows 1 node every 15 minutes (it starts acting at, for example, minutes 44 and 59).
Red and blue represent Underutilized and Empty, which allow 3 nodes every 10 minutes (starting at, for example, minutes 39, 49, and 59).
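For readers who want to reproduce this check, a sketch of a Prometheus query that should surface these per-reason limits (the `nodepool` and `reason` label names are assumptions based on typical Karpenter metric labels; verify against the `/metrics` output of your Karpenter version):

```promql
# Allowed disruptions per disruption reason for this NodePool.
# Label names are assumed; inspect the metrics endpoint to confirm.
karpenter_nodepools_allowed_disruptions{nodepool="multiple-consolidations"}
```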
Let me know if you need me to share these in the doc somehow, or anything else.
Thank you
@InsomniaCoder first of all, THANK YOU so much for not just letting us know these blueprints have been useful for you, but also for making contributions as well, you rock! I have a few recommendations about this:
- Can you please make this part of the "Multiple Budgets" section, and move the "Multiple Budgets" section after the "Reasons" section? That way we keep a consistent order, going deeper each time.
- As Jake suggested (and you already answered), it would be really helpful if you can incorporate what you described here, especially to show the results others will see by having this configuration in place.
- Can you please break down each budget? You're already doing it partially, but it was a bit hard for me to follow along. Maybe you can explain the four scenarios, then show the NodePool config, and then the results.
- Can you also either add a note or directly make it explicit that the budget config means "in a given time frame, at most X nodes can be disrupting at a given moment".
- Let's see how long this blueprint ends up being; it might be worth having a dedicated, tested blueprint for this (following Jake's recommendation).
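To illustrate that reading of the budget semantics, a minimal fragment built from the Drift budgets in the config above (the comments are annotations, not part of the original manifest; when multiple budgets are active at once, the most restrictive one applies):

```yaml
budgets:
# Always active: at most 1 node may be disrupting at any moment for Drift.
- nodes: "1"
  reasons:
  - Drifted
# Active for 14 minutes starting at every 15th minute: 0 nodes for Drift.
# The net effect is roughly a 1-minute window per 15 minutes in which
# a single drifted node can be disrupted.
- duration: 14m0s
  nodes: "0"
  reasons:
  - Drifted
  schedule: '*/15 * * * *'
```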
We think that with this contribution the blueprint will end up being even more awesome :)
@InsomniaCoder hello Tanat, I was just wondering if this is still on your radar?
Hi @chrismld I need to apologize, this has been out of my context for some time. I will go ahead and close it and will look into this later when I regain the context.
Thank you so much!
thank you! and no problem, looking forward to hearing back from you soon :)