containers-roadmap [EKS] [request]: Ability to configure pod-eviction-timeout

Tell us about your request

I would like to be able to change configuration values for control plane components like the kube-controller-manager. This enables greater customisation of the cluster to specific, bespoke needs. It would also go a long way towards making the cluster more resilient and self-healing.

Which service(s) is this request for?

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

At present, we have a cluster managed by EKS. The default pod-eviction-timeout is five minutes, meaning that we can derail an instance and the control plane won't reschedule its pods for five minutes. Five-minute outages for things like our payment systems are simply unacceptable - the cost impact would be severe. At present, to the best of my knowledge, the control plane is not configurable at all.

What we would like to be able to do is provide configuration parameters via the AWS API or within a Kubernetes resource like a ConfigMap. Either would mean that, when we bring up new EKS clusters, we can automate the configuration of values like pod-eviction-timeout.

Are you currently working around this issue?

No, to the best of my knowledge, it isn't something that EKS presently supports.

Feb 10 '19 17:02 ChrisCooney

Thanks for submitting this Chris. At present, the 5 minute timeout is the default for Kubernetes. We’re evaluating adding additional configuration parameters onto the control plane and have added this to our list of parameters to research exposing for customization on a per-cluster basis.

Feb 15 '19 15:02 tabern

Hi @tabern , thanks for the response. Yes, I'm aware of the Kubernetes default. A large portion of those running K8s in production have actively tweaked these values and I worry this would be a barrier to EKS supporting some of our more critical applications.

Glad to hear this is being evaluated and look forward to seeing where it goes.

Feb 15 '19 18:02 ChrisCooney

@ChrisCooney sounds good. We're going to look into this. I've updated the title of your request to specifically address this ask so we can track it.

Feb 15 '19 23:02 tabern

To add another use case: We also wish to be able to adjust pod-eviction-timeout, specifically to facilitate the use of Spot Instances. In the case that an instance is terminated without the running Pods being properly evicted, we want a short timeout before those Pods are rescheduled elsewhere.

Thanks!

Feb 20 '19 12:02 BrianChristie

Ideally we should be also able to tune:

--node-monitor-period
--node-monitor-grace-period

Feb 20 '19 22:02 dawidmalina

I would also very much like to have control over HPA scaling delays since there's no other way to do it:

--horizontal-pod-autoscaler-downscale-delay
--horizontal-pod-autoscaler-upscale-delay

Apr 05 '19 20:04 geerlingguy

@BrianChristie BTW, if you like, you can monitor for Spot node termination and evict the pods cleanly before the instance is terminated.
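
A rough sketch of that monitoring idea, assuming the EC2 instance metadata path `spot/instance-action`, which returns a small JSON document once an interruption is scheduled and a 404 before that. The helper names and the node name are made up for illustration, and the actual HTTP fetch and command execution are left out so the snippet stays self-contained.

```python
import json
from datetime import datetime, timezone

# Metadata path that reports a pending Spot interruption (404 until one is scheduled).
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(body, now):
    """Parse an instance-action document; return seconds left, or None if no action is set."""
    notice = json.loads(body)
    if notice.get("action") is None:
        return None
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return (when.replace(tzinfo=timezone.utc) - now).total_seconds()


def drain_command(node_name):
    """Build the kubectl command that evicts pods from the node (hypothetical helper)."""
    return ["kubectl", "drain", node_name, "--ignore-daemonsets"]


# Sample document in the shape the metadata service returns for a termination.
sample = '{"action": "terminate", "time": "2019-04-06T04:02:00Z"}'
now = datetime(2019, 4, 6, 4, 0, 0, tzinfo=timezone.utc)

print(seconds_until_interruption(sample, now))
print(drain_command("ip-10-0-0-1.ec2.internal"))  # hypothetical node name
```

A small daemon on each Spot node could poll the URL every few seconds and run the drain command as soon as the helper returns a value, which leaves the remaining notice period for pods to shut down.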

Apr 06 '19 04:04 whereisaaron

Also --horizontal-pod-autoscaler-cpu-initialization-period and --horizontal-pod-autoscaler-downscale-stabilization. If one of our HPAs is failing miserably, a second one only scales on CPU utilization, but as they are limited and can only go up to almost twice the "wished" target, we can only scale up by a factor of 2 on each run (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details). That means with 16 pods running we only grow to 32, then it takes 5 minutes before it scales to 64, and another 5 minutes to reach 128. If the other HPA, which is failing at that time, had 800 pods running and drops to 300, it takes ages to cover the missing 500 pods.
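
To put numbers on that, here is a back-of-the-envelope sketch. It assumes each HPA run can at most double the replica count, that the first run fires immediately, and that every later run waits 5 minutes, as described above. The function names are illustrative only.

```python
def doublings_needed(current, target):
    """Count scale-up runs needed when each run can at most double the replicas."""
    if current <= 0:
        raise ValueError("current replica count must be positive")
    steps = 0
    while current < target:
        current *= 2
        steps += 1
    return steps


def minutes_to_reach(current, target, wait_minutes=5):
    """Waiting time, assuming the first run is immediate and later runs wait `wait_minutes`."""
    steps = doublings_needed(current, target)
    return max(steps - 1, 0) * wait_minutes


print(doublings_needed(16, 128))  # 3 runs: 16 -> 32 -> 64 -> 128
print(minutes_to_reach(16, 128))  # 10 minutes of waiting between runs
print(minutes_to_reach(16, 516))  # 25 minutes to add the missing 500 pods to 16
```

Under these assumptions, covering the 500 missing pods from a starting point of 16 takes six runs and 25 minutes of waiting, which is why a shorter delay matters here.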

Apr 17 '19 15:04 savar

Are there plans to allow passing in any number of parameters from something like https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/ (specifically --terminated-pod-gc-threshold), or is the plan to only allow customizing certain parameters?

Aug 14 '19 15:08 echoboomer

Could also use the ability to modify

--horizontal-pod-autoscaler-use-rest-clients

Since I'm having problems with HPA and metrics-server and can't view or configure it

Aug 28 '19 11:08 eladitzhakian

Looks like more and more people adopting k8s on EKS are in urgent need of these customizations. Specifically the ones already mentioned:
--horizontal-pod-autoscaler-downscale-delay --horizontal-pod-autoscaler-upscale-delay and --pod-eviction-timeout

Without them we are unable to meet our worker node patching requirements (draining helps a little, but not enough to comply).

Sep 05 '19 05:09 mebuzz

Actually, 5 minutes is sometimes too long to wait before pods on failed nodes are deleted. The --pod-eviction-timeout duration should be configurable on EKS too.

Sep 09 '19 15:09 ghost

I really need to set this one: --horizontal-pod-autoscaler-upscale-delay

Sep 18 '19 06:09 chillybug

Any updates? We're also looking for the ability to configure these values.

Nov 12 '19 17:11 gillbee

As an interim workaround, instead of using --pod-eviction-timeout, can you use Taint Based Evictions to set this on a per-pod basis? This is supported in EKS clusters running 1.13+.

There's an example in this issue: https://github.com/kubernetes/kubernetes/issues/74651
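
For reference, a minimal sketch of what that per-pod setting could look like. It builds the manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The pod name, image, and the 30-second value are placeholders, not recommendations.

```python
import json


def eviction_tolerations(seconds):
    """Tolerations that keep a pod on a not-ready or unreachable node for `seconds` only."""
    return [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": seconds,
        }
        for key in ("node.kubernetes.io/not-ready", "node.kubernetes.io/unreachable")
    ]


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "payments-example"},  # hypothetical name
    "spec": {
        "containers": [{"name": "app", "image": "nginx"}],  # placeholder image
        "tolerations": eviction_tolerations(30),
    },
}

print(json.dumps(pod, indent=2))
```

Kubernetes normally adds these two tolerations with a 300-second value to pods that do not set them, which is where the five-minute behaviour comes from when taint based evictions are enabled. Setting them explicitly overrides that default for the pod in question.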

Nov 26 '19 13:11 PaulMaddox

Not sure if this works for everybody or everything but I recently noticed this in the AWS EKS node AMI:

https://github.com/awslabs/amazon-eks-ami/blob/master/files/kubelet.service#L14

Notice the use of $KUBELET_ARGS $KUBELET_EXTRA_ARGS here - we were able to pass in my original requirement of --terminated-pod-gc-threshold this way, but I'm not entirely certain that a) AWS honors things placed here or b) these work with master-node abstraction.

Nov 27 '19 00:11 echoboomer

> Not sure if this works for everybody or everything but I recently noticed this in the AWS EKS node AMI:
>
> https://github.com/awslabs/amazon-eks-ami/blob/master/files/kubelet.service#L14
>
> Notice the use of $KUBELET_ARGS $KUBELET_EXTRA_ARGS here - we were able to pass in my original requirement of --terminated-pod-gc-threshold this way, but I'm not entirely certain that a) AWS honors things placed here or b) these work with master-node abstraction.

Yeah, this means you can configure the Kubelet on the node. Alas, it doesn't allow us to configure the kubernetes control plane.

Nov 27 '19 11:11 ChrisCooney

Can you allow us to modify the flags below for the kube-controller-manager, so that we can manage the cool-down delay rather than being stuck with the default 5 minutes: --horizontal-pod-autoscaler-downscale-delay --horizontal-pod-autoscaler-upscale-delay

Jan 03 '20 10:01 shivarajai

you could use this instead, https://blog.postmates.com/configurable-horizontal-pod-autoscaler-81f48779abfc

Mar 17 '20 20:03 jicowan

Add:

--terminated-pod-gc-threshold

Apr 13 '20 09:04 starchx

Jumping in to request that --horizontal-pod-autoscaler-initial-readiness-delay also be added. We are running an HPA in our EKS clusters and are unable to fully configure it how we would like.

I'm not sure why kube chose to have all of these HPA-related configs go on the controller manager instead of being configured on the HPA resource itself, but that's another story.

May 14 '20 20:05 calebwoofenden

Note that 1.18 adds support for configurable scaling behavior

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-configurable-scaling-behavior

So this will be possible once EKS supports 1.18

May 27 '20 16:05 mikestef9

Even with 1.18 it doesn't seem to work:

error validating data: ValidationError(HorizontalPodAutoscaler.spec): unknown field "behavior" in io.k8s.api.autoscaling.v2beta1.HorizontalPodAutoscalerSpec;

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T18:49:28Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-eks-7c9bda", GitCommit:"7c9bda52c425d0d56d7b93f1377a826b4132c05c", GitTreeState:"clean", BuildDate:"2020-08-28T23:04:33Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Nov 23 '20 08:11 danijelk

@danijelk try v2beta2 for it.
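
A minimal sketch of an HPA that uses the `behavior` field under `autoscaling/v2beta2`, written as a Python dict and printed as JSON so it can be piped to `kubectl apply -f -`. The names, replica bounds, CPU target, and window values are placeholders chosen for illustration.

```python
import json

hpa = {
    "apiVersion": "autoscaling/v2beta2",  # v2beta1 has no "behavior" field
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "example-hpa"},  # hypothetical name
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "example-app",  # hypothetical target
        },
        "minReplicas": 2,
        "maxReplicas": 20,
        "metrics": [
            {
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": 60},
                },
            }
        ],
        "behavior": {
            # Per-HPA replacement for the controller-wide downscale stabilization flag.
            "scaleDown": {"stabilizationWindowSeconds": 60},
            # Allow the replica count to grow by up to 100% every 15 seconds.
            "scaleUp": {
                "stabilizationWindowSeconds": 0,
                "policies": [{"type": "Percent", "value": 100, "periodSeconds": 15}],
            },
        },
    },
}

print(json.dumps(hpa, indent=2))
```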

Nov 23 '20 08:11 toricls

@toricls Ah, didn't see I was on beta1, k8s accepted it now thanks.

Nov 23 '20 09:11 danijelk

Is there a way to set the --terminated-pod-gc-threshold on the Kube-controller-manager with EKS? A solution was suggested earlier about specifying the parameters in the AMI. Is that a recommended way to do it for now? Although, that would mean having a custom AMI that needs to be updated every time there is a new AMI version for EKS.

Dec 30 '20 07:12 aniruddhch

Closing this as setting these flags is supported in K8s v1.18 and higher.

Mar 23 '21 20:03 tabern

@tabern, I understand that hpa.v2beta2 has the ability to add behavior configuration, and this resolves part of the requests. However, I'm just curious how we can set pod-eviction-timeout on K8s v1.18 and later without modifying the kube-controller-manager?

Apr 03 '21 06:04 jerry123je

We need the horizontal-pod-autoscaler-initial-readiness-delay flag to be configurable in EKS, but that has not been possible so far. Any info on how to configure it for EKS?

Apr 20 '21 06:04 EdwinPhilip

Not sure why this ticket is closed and marked "Shipped". How do we set "pod-eviction-timeout"?

May 03 '21 23:05 lmgnid
