GKE chooses to scale up more expensive spot N2D pool instead of a cheaper spot T2D pool
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
I don't know, it's GKE-managed
What k8s version are you using (kubectl version)?:
$ kubectl version
Client Version: v1.28.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5-gke.1217000
What environment is this in?:
GKE
What did you expect to happen?:
I had two node pools, both using spot instances (n2d-standard-32 and t2d-standard-32), and workloads that can run on either of them.
When a scale-up was needed, I expected the t2d pool to be scaled up because those instances are cheaper.
What happened instead?:
The more expensive n2d pool was scaled up.
How to reproduce it (as minimally and precisely as possible):
I haven't tried to reproduce it, I've just seen it in production.
Anything else we need to know?:
I took a quick look at how the autoscaler compares prices, and it seems bound to make this mistake often because the prices are hard-coded, rarely updated, and not region-specific.
Because spot prices in GCP can now change as often as monthly and are highly region-specific, I don't think this is a good approach.
In my case, for us-west1, the code treats t2d spot as 140% more expensive than n2d spot, while in reality it is about 17% cheaper.
See this Google Sheet for the details.
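To illustrate the failure mode, here is a minimal sketch of a cheapest-pool choice made against two different price tables. It is not the autoscaler's actual code, and the real price-based logic is more involved. The numbers are relative values I derived from the percentages above (normalized so that n2d spot = 1.0), not real GCP prices, and the names `HARDCODED_GLOBAL`, `US_WEST1_ACTUAL`, and `cheapest_pool` are made up for this example.

```python
# Relative spot prices, normalized so that n2d-standard-32 = 1.0.
# These are illustrative ratios taken from the percentages in this issue,
# not actual GCP prices.

# What the hard-coded, non-regional table implies: t2d is 140% more expensive.
HARDCODED_GLOBAL = {"n2d-standard-32": 1.00, "t2d-standard-32": 2.40}

# What I observe in us-west1: t2d is about 17% cheaper.
US_WEST1_ACTUAL = {"n2d-standard-32": 1.00, "t2d-standard-32": 0.83}


def cheapest_pool(prices):
    """Return the machine type with the lowest price in the given table."""
    return min(prices, key=prices.get)


# Pool picked when comparing against the stale global table.
chosen = cheapest_pool(HARDCODED_GLOBAL)

# Pool that would be picked with region-specific prices.
expected = cheapest_pool(US_WEST1_ACTUAL)

print("chosen with hard-coded prices:", chosen)
print("cheapest in us-west1:", expected)
```

With the same selection rule, only the price table changes the outcome, which is why I think the table itself is the problem rather than the comparison logic.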