[EKS] [request]: use resource capacity, not EC2 instance families

Open FernandoMiguel opened this issue 1 year ago • 0 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Creation of a new EC2 instance family for EKS that would not be bound to traditional families with pre-set sizes.

Which service(s) is this request for?

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Fargate already lets practitioners choose VMs based on their Pods' resource needs. But for EKS, the underlying instances are still based on traditional EC2 instance families. With the extra abstraction of tools like Karpenter, practitioners and developer teams only care about their Pods' resources, and let Karpenter bin-pack and choose the cheapest option available between Spot and On-Demand.

So if AWS were to provide fully dynamic instances sized to fit exactly the resources that kube-scheduler is trying to place for new Pods, we customers would pay only for the memory/CPU requested, just like on Fargate. Since Karpenter does consolidation for us, if Pods were decommissioned, those instances would be replaced with new ones sized to fit the remaining Pods.

Say, for example, you schedule 4 replicas, each requesting 8 GiB of RAM. Karpenter might pick an r6a.2xlarge node (4x8 GiB plus some DaemonSets and OS overhead). But why would practitioners even care about those instance families? All we care about is that we need X amount of CPU and 40ish GiB of RAM.
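To put rough numbers on that example, here is a minimal Python sketch. The memory sizes follow the r6a family's 8 GiB-per-vCPU ratio; the 8 GiB overhead figure and the `smallest_fit` helper are my own assumptions for illustration, and this is not how Karpenter actually selects nodes (it also ignores the gap between instance memory and allocatable memory).

```python
# Illustrative only: r6a memory sizes in GiB (8 GiB per vCPU).
R6A_SIZES_GIB = {
    "r6a.large": 16,
    "r6a.xlarge": 32,
    "r6a.2xlarge": 64,
    "r6a.4xlarge": 128,
}


def smallest_fit(required_gib, catalog=R6A_SIZES_GIB):
    """Return (name, memory_gib) of the smallest size with enough memory.

    Hypothetical helper for this example, not a Karpenter API.
    """
    candidates = [(mem, name) for name, mem in catalog.items() if mem >= required_gib]
    if not candidates:
        raise ValueError("no instance size is large enough")
    mem, name = min(candidates)
    return name, mem


replicas = 4
request_gib = 8
overhead_gib = 8  # assumed DaemonSet + OS overhead, chosen to match "40ish GiB"

required = replicas * request_gib + overhead_gib
name, mem = smallest_fit(required)
unused = mem - required

print(f"required: {required} GiB, picked: {name} ({mem} GiB), unused: {unused} GiB")
```

Under these assumptions the next size down (r6a.xlarge, 32 GiB) is too small, so the node jumps to 64 GiB and a sizeable chunk of paid-for memory sits idle. A capacity-based instance would size to the required figure instead.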

This new generation of VMs could be powered by Firecracker, like Fargate is, or by traditional AL2/Bottlerocket, while still providing root-level access for those who need it.

I do understand that this more dynamic VM sizing would make it harder for AWS to utilize their hosts efficiently, harder to bin-pack VMs, etc. But at AWS's scale, the large numbers involved should eventually offset the issue.

Are you currently working around this issue? How are you currently solving this problem?

Additional context

Anything else we should know?

Attachments

If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

FernandoMiguel · Sep 19 '22 11:09