[EKS] [Feature]: Allow Kube Scheduler Customization

Open Kausheel opened this issue 4 years ago • 31 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request What do you want us to build?

It would be great if EKS allowed users to configure the Kube Scheduler parameters. This is a Control Plane component, so users don't have access to this by default. Exposing the Kube Scheduler configuration either via AWS APIs or via the KubeSchedulerConfiguration resource type would be a significant advantage for EKS users.

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Use cases for this might include switching from equal Pod distribution to a binpacking approach, which optimizes cost effectiveness. There are many other Scheduler parameters which users might want to tweak themselves.

Are you currently working around this issue? Implementing custom Kube Schedulers. This is not ideal, since it requires operational overhead in maintaining and updating the custom Kube Scheduler. It may also require using tools like OPA to insert custom schedulerName fields into the target Pods, which is yet another burden on the user.
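
A minimal sketch of what that opt-in looks like on the workload side, assuming a hypothetical second scheduler registered under the name `bin-packing-scheduler` (the Pod name and image are placeholders, not taken from this issue):

```yaml
# Hypothetical Pod that opts in to a custom scheduler.
# "bin-packing-scheduler" is an assumed name; it has to match the
# schedulerName of a profile in the custom scheduler's configuration.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  schedulerName: bin-packing-scheduler
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: 100m
          memory: 64Mi
```

Pods that omit `spec.schedulerName` are handled by `default-scheduler`, which is why every target Pod needs the field set, either by hand or by an admission policy.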

Thanks!

Kausheel avatar Aug 06 '21 07:08 Kausheel

Another use case is to define cluster-level default constraints for PodTopologySpread in the scheduler, as per the docs: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#cluster-level-default-constraints

AWS should make this the default behaviour in EKS clusters.

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List

ashishapy avatar Oct 21 '21 18:10 ashishapy

I would love to use this for enabling bin packing like explained here: https://kubernetes.io/docs/concepts/scheduling-eviction/resource-bin-packing/

stijndehaes avatar Jan 05 '23 13:01 stijndehaes

Upvote.

Trying to use EKS and achieve bin packing is hard without changing Scheduler Behavior to favor MostAllocated.

sherifabdlnaby avatar Feb 24 '23 19:02 sherifabdlnaby

Note that this feature is supported to some extent in Azure. In GKE, the Scheduler Scoring Strategy: MostAllocated use case is covered by the autoscaling profile (this is an assumption on my part; GKE does not explicitly document what this setting does under the hood). Adding this ability would help EKS users gain parity in that sense.

logyball avatar Feb 28 '23 18:02 logyball

I would be fine with having a setting like GKE has, this would solve my use case. It probably does not solve every use case out there, but I can understand if the AWS EKS team feels reluctant to allow changing the whole configuration.

stijndehaes avatar Mar 06 '23 11:03 stijndehaes

Imagine if this feature were opened up to all EKS users: it would save them a lot of time. Assume it takes one person one week to work around this with a custom kube-scheduler. If 1000 users need it, that costs 7000 days, which is about 19 years of one person's life.

boblee0717 avatar Mar 21 '23 08:03 boblee0717

With Kubernetes v1.24 the DefaultPodTopologySpread feature graduated to GA https://github.com/kubernetes/kubernetes/pull/108278. Without this we have no way to use (or configure) it on EKS clusters.

alex-berger avatar Jul 10 '23 08:07 alex-berger

Same here. We need this feature to enable resource bin packing for cost saving https://kubernetes.io/docs/concepts/scheduling-eviction/resource-bin-packing/

AnhQKatalon avatar Jul 21 '23 05:07 AnhQKatalon

@AnhQKatalon, run the scheduler yourself with the needed settings + patch pods to use that scheduler, with Kyverno for example :) Could be done in a couple of hours.
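
A rough sketch of the Kyverno half of this workaround, assuming Kyverno is installed and the self-run scheduler is registered as `bin-packing-scheduler` (a hypothetical name). The policy and rule names are placeholders too, and this has not been tested against a cluster:

```yaml
# Hypothetical Kyverno policy: route newly created Pods to a custom scheduler.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: set-custom-scheduler
spec:
  rules:
    - name: set-scheduler-name
      match:
        any:
          - resources:
              kinds:
                - Pod
              # Assumes a Kyverno version that supports "operations" here.
              # schedulerName cannot be changed on an existing Pod, so the
              # rule is limited to CREATE.
              operations:
                - CREATE
      mutate:
        patchStrategicMerge:
          spec:
            # Assumed scheduler name; must match the custom scheduler's profile.
            schedulerName: bin-packing-scheduler
```

In practice the match would probably need to exclude system namespaces so that cluster add-ons keep using the default scheduler.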

Art3mK avatar Jul 21 '23 05:07 Art3mK

@AnhQKatalon, run the scheduler yourself with the needed settings + patch pods to use that scheduler, with Kyverno for example :) Could be done in a couple of hours.

Yeah, I am doing the workaround this way. Appreciate your help. But it would be great if EKS supported changing the scheduler configuration officially.

AnhQKatalon avatar Jul 21 '23 06:07 AnhQKatalon

As others mentioned, this is required to set default pod topology constraints on the cluster, as per: https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#cluster-level-default-constraints. There would be other use cases, I am sure of it.

There are workarounds, of course, but this seems like a core thing to do in order to make the life of EKS users easier. I think this is a MUST.
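
One such workaround is repeating the constraint on every workload instead of defining it once as a scheduler default. A minimal sketch, assuming a hypothetical Deployment named `web` (the name, labels, and image are placeholders):

```yaml
# Hypothetical Deployment carrying its own zone spread constraint,
# because cluster-level defaults cannot be configured on the managed scheduler.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          # Unlike a cluster-level default, an explicit constraint
          # needs its own label selector.
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: registry.k8s.io/pause:3.9
```

Every team has to remember to add this block, which is the maintenance burden a cluster-level default would remove.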

babinos87 avatar Jul 26 '23 11:07 babinos87

This would be very helpful for the same reasons mentioned by others above:

  • set default topology spreads for all pods in one central place
  • tweak bin-packing by changing NodeResourcesFit

The suggestion of rolling your own Scheduler is not appealing because EKS might have bolted on their own tweaks/modifications to get the scheduler to work right in AWS and then we'd lose all of that. And then there's maintaining it. I get that modifying the EKS-blessed set of configuration can lead to instability - but if I want to modify just a few settings I should be allowed to do that with the understanding it could break scheduling on my cluster. Upstream k8s allows it and it's useful.

fernandesnikhil avatar Sep 16 '23 16:09 fernandesnikhil

If it is not possible to add customization to kube-scheduler, can we think about a feature like GKE's, where node groups have an option to scale with a MostAllocated-like strategy, similar to GKE's optimize-utilization autoscaling profile?

subhranil05 avatar Sep 18 '23 06:09 subhranil05

@subhranil05 This is not an alternative solution. Scaling Node Groups can only achieve bin-packing during the event of scaling up. Kube Scheduler customization is necessary for in-place, proactive bin-packing.

sherifabdlnaby avatar Sep 18 '23 17:09 sherifabdlnaby

Can somebody take a look and consider adding this issue to the kanban board? The demand still seems valid in 2023, as the issue has been active for more than 2 years. Of course we can self-manage an additional kube-scheduler, but it's counterintuitive to subscribe to an AWS-managed EKS control plane and then self-manage control plane components (an additional kube-scheduler).

CC @tabern @mikestef9

m00lecule avatar Oct 03 '23 14:10 m00lecule

This would be very useful for my EKS clusters. I want to be able to set sensible defaults without having to run my own scheduler.

paulchambers avatar Oct 13 '23 12:10 paulchambers

I would love to see this as well, to support bin packing at scheduling.

cskinfill avatar Oct 17 '23 12:10 cskinfill

Do it for the environment folks!

sherifabdlnaby avatar Oct 18 '23 16:10 sherifabdlnaby

I want to use bin packing with Karpenter for job workloads, so that Karpenter can scale down empty nodes after a scale-up. Instead of being spread across all the nearly empty nodes, the pods should be packed onto a few full nodes, so that Karpenter can remove a node once the last job running on it has completed.

Legion2 avatar Dec 04 '23 18:12 Legion2

Assuming AWS may not prioritize this for a while at the current rate, I think an example deployment of a custom scheduler with MostAllocated enabled for binpacking would benefit everyone here (as suggested in https://github.com/aws/containers-roadmap/issues/1468#issuecomment-1645021158), despite the burden it puts on 1) cluster admins to maintain control plane infra in step with EKS versions, and 2) Pod creators to ensure the custom scheduler is used. Kyverno, Gatekeeper, or custom webhooks could potentially help with the latter.

https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/

Is a starting point, but if anyone has manifest samples that have been tested for the binpack configuration everyone wants, that'd be appreciated. If I get to this at some point, I will share.

In some clusters, I've seen something like this provided:

apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /var/lib/kube-scheduler/kubeconfig
profiles:
- schedulerName: default-scheduler
  pluginConfig:
    - args:
        scoringStrategy:
          type: MostAllocated
      name: NodeResourcesFit
  plugins:
    score:
      disabled:
      - name: "NodeResourcesBalancedAllocation"
      enabled:
      - name: "NodeResourcesFit"
        weight: 5
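
For newer clusters, a hedged variant of the same idea written against `kubescheduler.config.k8s.io/v1` and meant for a secondary scheduler rather than the EKS-managed one. The `v1beta2` API appears to have been removed from recent Kubernetes releases, so check what your kube-scheduler image supports. This is a sketch, not a tested manifest: the profile name `bin-packing-scheduler` is hypothetical, the resource weights are illustrative, and it assumes a single in-cluster replica (so leader election is off):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  # Assumes one replica of the secondary scheduler.
  leaderElect: false
profiles:
  # Hypothetical name; Pods opt in via spec.schedulerName.
  - schedulerName: bin-packing-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            # Favor nodes that are already heavily requested (bin packing).
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

Using a distinct `schedulerName` keeps the EKS default scheduler untouched, at the cost of having to route Pods to the new scheduler explicitly.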

onelapahead avatar Dec 13 '23 03:12 onelapahead

We ran into this same issue and had to set up a custom scheduler to implement bin-packing. It's the same kube-scheduler image with a MostAllocated scoring policy as suggested above. Blog has more details about how we dealt with overprovisioning and system workloads and rollout to all pods. This section has the specific scheduler config.

We were able to achieve this in GCP by using the optimize-utilization setting in GKE, but for Azure AKS, we still have to use this secondary scheduler with custom scoring policy.

vinay92-ch avatar Feb 12 '24 22:02 vinay92-ch

How is this API not supported yet? Is there any plan to support this soon? It's part of the standard Kubernetes service, but there's no way to use it on EKS? This really doesn't make EKS very usable in our case. All of the major packages assume that the standard APIs are available.

MattLJoslin avatar Mar 13 '24 23:03 MattLJoslin

Same as @MattLJoslin said... we really need it as well

eliran-zada-zesty avatar Apr 17 '24 12:04 eliran-zada-zesty

I think being able to run the scheduler in MostAllocated mode would make the Karpenter use case even more compelling.

stevehipwell avatar May 02 '24 13:05 stevehipwell

https://www.cncf.io/blog/2024/06/03/tackling-gpu-underutilization-in-kubernetes-runtimes/

stevehipwell avatar Jun 18 '24 15:06 stevehipwell