Choose smaller spots instead of big on-demand when possible
Version
Karpenter Version: v0.23.0
Kubernetes Version: v1.24.0
Expected Behavior
When several pods are scheduled together onto one large spot instance and there is no spot capacity for that size, Karpenter should split the request across smaller spot instances instead of falling back to on-demand.
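The requested ordering can be sketched in Go as follows. This is only an illustration of the expected behavior, not Karpenter's actual scheduling code: the `Offering` and `Plan` types and the `choose` function are hypothetical names invented for this example, and the "pods per node" figure is assumed to be known in advance.

```go
package main

import "fmt"

// Offering is a hypothetical, simplified stand-in for an instance offering.
// It is not Karpenter's real type.
type Offering struct {
	InstanceType string
	CapacityType string // "spot" or "on-demand"
	PodsPerNode  int    // how many of the pending pods fit on one node (assumed known)
	Available    bool   // false after e.g. InsufficientInstanceCapacity
}

// Plan is the launch decision: which offering to use and how many nodes.
type Plan struct {
	Offering Offering
	Nodes    int
}

// choose exhausts every available spot offering before considering on-demand.
// Within a capacity type it picks the offering that fits the most pods per node.
func choose(pods int, offerings []Offering) (Plan, bool) {
	for _, capacityType := range []string{"spot", "on-demand"} {
		best, found := Offering{}, false
		for _, o := range offerings {
			if o.CapacityType != capacityType || !o.Available || o.PodsPerNode < 1 {
				continue
			}
			if !found || o.PodsPerNode > best.PodsPerNode {
				best, found = o, true
			}
		}
		if found {
			nodes := (pods + best.PodsPerNode - 1) / best.PodsPerNode // ceiling division
			return Plan{Offering: best, Nodes: nodes}, true
		}
	}
	return Plan{}, false
}

func main() {
	offerings := []Offering{
		{"m6i.24xlarge", "spot", 2, false}, // no spot capacity for the big size
		{"m6i.12xlarge", "spot", 1, true},
		{"m6i.24xlarge", "on-demand", 2, true},
	}
	plan, ok := choose(2, offerings)
	fmt.Println(plan.Offering.InstanceType, plan.Offering.CapacityType, plan.Nodes, ok)
}
```

With the sample offerings above, the sketch selects two m6i.12xlarge spot nodes rather than one m6i.24xlarge on-demand node.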
Actual Behavior
During a large scale-up, Karpenter received multiple requests to deploy 2 pods per node, each pair fitting on a 24xlarge.
There was an InsufficientInstanceCapacity error for 24xlarge, and instead of splitting each request across two 12xlarge spot instances, it chose a 24xlarge on-demand instance.
Steps to Reproduce the Problem
At large scale, deploy pods through a provisioner so that each pair of pods needs a 24xlarge or larger node, and exclude instance types larger than 24xlarge from the provisioner.
Resource Specs and Logs
2023-02-27T17:56:16.918Z INFO controller.provisioner launching node with 2 pods requesting {"cpu":"91705m","memory":"178356Mi","pods":"13"} from types m6idn.24xlarge, m6i.24xlarge, m6in.24xlarge, m6a.24xlarge, m6id.24xlarge and 5 other(s) {"commit": "5a7faa0-dirty", "provisioner": "ds-infra-network-online-spot"}
2023-02-27T17:56:16.918Z INFO controller.provisioner launching node with 2 pods requesting {"cpu":"91705m","memory":"178356Mi","pods":"13"} from types m6idn.24xlarge, m6i.24xlarge, m6in.24xlarge, m6a.24xlarge, m6id.24xlarge and 5 other(s) {"commit": "5a7faa0-dirty", "provisioner": "ds-infra-network-online-spot"}
2023-02-27T17:56:21.248Z DEBUG controller.provisioner.cloudprovider removing offering from offerings {"commit": "5a7faa0-dirty", "provisioner": "ds-infra-network-online-spot", "unavailable-reason": "InsufficientInstanceCapacity", "instance-type": "m6a.24xlarge", "zone": "us-east-1e", "capacity-type": "on-demand", "unavailable-offerings-ttl": "3m0s"}
2023-02-27T17:56:21.248Z DEBUG controller.provisioner.cloudprovider removing offering from offerings {"commit": "5a7faa0-dirty", "provisioner": "ds-infra-network-online-spot", "unavailable-reason": "InsufficientInstanceCapacity", "instance-type": "m6a.24xlarge", "zone": "us-east-1e", "capacity-type": "on-demand", "unavailable-offerings-ttl": "3m0s"}
2023-02-27T17:56:21.522Z INFO controller.provisioner.cloudprovider launched new instance {"commit": "5a7faa0-dirty", "provisioner": "ds-infra-network-online-spot", "id": "i-063a1b84f7a1a9173", "hostname": "ip-10-207-10-112.ec2.internal", "instance-type": "m6i.24xlarge", "zone": "us-east-1b", "capacity-type": "on-demand"}
2023-02-27T17:56:21.559Z INFO controller.provisioner.cloudprovider launched new instance {"commit": "5a7faa0-dirty", "provisioner": "ds-infra-network-online-spot", "id": "i-0d2cc2d44e98f6872", "hostname": "ip-10-207-11-85.ec2.internal", "instance-type": "m6i.24xlarge", "zone": "us-east-1b", "capacity-type": "on-demand"}
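As a rough sanity check on the proposed split, the combined request from the log can be halved and compared against a 12xlarge. This is a sketch under stated assumptions: the two pods are assumed to have equal requests (the log only shows the total), the capacity used is the nominal m6i.12xlarge size (48 vCPU, 192 GiB) rather than the node's allocatable resources, and any daemonset overhead included in the logged total would apply to each node separately. The `Resources` type and the `fits` and `halve` helpers are hypothetical names for this example.

```go
package main

import "fmt"

// Resources holds CPU in millicores and memory in MiB.
type Resources struct {
	CPUMilli  int64
	MemoryMiB int64
}

// fits reports whether a request fits within a capacity.
func fits(req, capacity Resources) bool {
	return req.CPUMilli <= capacity.CPUMilli && req.MemoryMiB <= capacity.MemoryMiB
}

// halve splits a combined request evenly in two, rounding up.
func halve(r Resources) Resources {
	return Resources{
		CPUMilli:  (r.CPUMilli + 1) / 2,
		MemoryMiB: (r.MemoryMiB + 1) / 2,
	}
}

func main() {
	// Combined request for 2 pods, taken from the log line above.
	combined := Resources{CPUMilli: 91705, MemoryMiB: 178356}

	// Assumption: nominal m6i.12xlarge size, not the allocatable amount,
	// which is lower once kubelet and system reservations are subtracted.
	nominal12xl := Resources{CPUMilli: 48000, MemoryMiB: 192 * 1024}

	perNode := halve(combined)
	fmt.Println(perNode.CPUMilli, perNode.MemoryMiB, fits(perNode, nominal12xl))
}
```

Under these assumptions each half (about 45.9 vCPU and 87 GiB) is below the nominal 12xlarge size, though the CPU margin is small enough that the real allocatable values would need to be checked.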
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment