Capacity Type Distribution
Some application deployments would like the benefits of using Spot capacity, but would also like some stability guarantee for the application. I propose a capacity-type percentage distribution configured on the k8s Deployment resource. Since capacity-type is likely to be implemented at the cloud-provider level, this too would need to live at the cloud-provider layer.
For example:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 10
  template:
    metadata:
      labels:
        app: inflate
        node.k8s.aws/capacity-type-distribution/spot-percentage: 90
    spec:
      containers:
        - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          name: inflate
          resources:
            requests:
              cpu: "100m"
```
The above Deployment spec would result in the deployment controller creating Pods for the 10 replicas. Karpenter would register a mutating admission webhook which would check whether the pod's owning Deployment has this label, then look at the Deployment's existing pods to decide which capacity-type node selector to apply, so that 9 of the 10 replicas (90%) are steered to spot and the remaining replica to on-demand. After the admission webhook, a pod resource would look like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: inflate
    pod-template-hash: 8567cd588
  name: inflate-8567cd588-bjqzf
  namespace: default
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet
      name: inflate-8567cd588
spec:
  containers:
    - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
      name: inflate
      resources:
        requests:
          cpu: "100m"
  schedulerName: default-scheduler
  nodeSelector:
    node.k8s.aws/capacity-type: spot
```
^^ duplicated 8 more times (9 spot pods in total, matching the 90% target), and then the remaining replica:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: inflate
    pod-template-hash: 4567dc765
  name: inflate-4567dc765-asdf
  namespace: default
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet
      name: inflate-4567dc765
spec:
  containers:
    - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
      name: inflate
      resources:
        requests:
          cpu: "100m"
  schedulerName: default-scheduler
  nodeSelector:
    node.k8s.aws/capacity-type: on-demand
```
My first thought is that this webhook should be decoupled from Karpenter's core controller. Maybe something plugged into the AWS cloud provider once we break it apart?
> My first thought is that this webhook should be decoupled from Karpenter's core controller. Maybe something plugged into the AWS cloud provider once we break it apart?
Yeah, I think that makes the most sense.
We have one application that has been testing out ASGs with mixed purchasing options (on-demand and spot), using the approach described here: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-mixed-instances-groups.html
In the future it would be great to find a mechanism to migrate this ASG to karpenter.
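For context on what such an ASG encodes, here is a minimal CloudFormation sketch of a mixed-instances group with a 10/90 on-demand/spot split. Resource names, AMI, subnet, and instance types are placeholders, not the poster's actual configuration:

```yaml
# Minimal sketch of an ASG with mixed purchase options (illustrative values only).
Resources:
  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-12345678                         # placeholder AMI
  MixedASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "0"
      MaxSize: "10"
      VPCZoneIdentifier:
        - subnet-12345678                             # placeholder subnet
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandBaseCapacity: 0
          OnDemandPercentageAboveBaseCapacity: 10     # ~10% on-demand, ~90% spot
          SpotAllocationStrategy: capacity-optimized
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref LaunchTemplate
            Version: !GetAtt LaunchTemplate.LatestVersionNumber
          Overrides:
            - InstanceType: m5.large
            - InstanceType: m5a.large
            - InstanceType: m4.large
```

The ASG handles the percentage split at the node-group level, which is exactly the knob this issue is asking for at the workload level.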
Two (maybe random) questions:
1. Should node.k8s.aws/capacity-type-distribution/spot-percentage be an annotation instead of a label?
2. Now that we'd have 2 separate deployments, how would that work with HPA? How would it know which deployment to scale and keep the total application deployment balanced?
+1
We may also think about how to leverage the existing topology spread constraints instead of new annotations.
We've discussed expanding the topologySpreadConstraints concept to include percent-based spread. I think this is a perfect fit.
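To illustrate that idea, a percent-based spread on the pod template might look something like the sketch below. This is purely hypothetical; no such field exists in the topologySpreadConstraints API today:

```yaml
# Hypothetical sketch only: "percentages" is NOT a real topologySpreadConstraints
# field; it just illustrates the proposed extension discussed above.
spec:
  topologySpreadConstraints:
    - topologyKey: node.k8s.aws/capacity-type
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: inflate
      percentages:          # hypothetical field expressing the desired split
        spot: 90
        on-demand: 10
```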
+1
Any updates, please?
+1
Hey folks, just a reminder to 👍 the original issue, rather than +1 in the comments, since it's easier for us to sort issues by most upvoted.
Any update, please?
I've documented another method for achieving something similar at https://karpenter.sh/preview/tasks/scheduling/#on-demandspot-ratio-split that may work for some.
👍
+1
👍
@tzneal This is the correct link for the on-demand/spot ratio split: https://karpenter.sh/preview/concepts/scheduling/#on-demandspot-ratio-split
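For anyone landing here, the approach on that docs page works roughly as follows (a minimal sketch: the capacity-spread label name and the 4:1 value split mirror the docs, the v1alpha5 Provisioner API is assumed, and the inflate Deployment is just the example from this issue): two Provisioners apply disjoint values of a custom capacity-spread node label, and the workload spreads evenly across that label, giving roughly an 80/20 spot/on-demand split.

```yaml
# Spot provisioner owns 4 of the 5 capacity-spread values (~80% of pods).
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: capacity-spread
      operator: In
      values: ["2", "3", "4", "5"]
---
# On-demand provisioner owns the remaining value (~20% of pods).
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: on-demand
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
    - key: capacity-spread
      operator: In
      values: ["1"]
---
# The workload then spreads evenly across capacity-spread values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 10
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: capacity-spread
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: inflate
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: "100m"
```

The ratio is controlled by how many capacity-spread values each Provisioner owns, so the 90/10 split proposed in this issue would need nine spot values and one on-demand value.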
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten