GKE chooses to scale up more expensive spot N2D pool instead of a cheaper spot T2D pool

Open gdubicki opened this issue 1 year ago • 3 comments

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

I don't know, it's GKE-managed

What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: v1.28.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5-gke.1217000

What environment is this in?:

GKE

What did you expect to happen?:

I had 2 node pools, both using spot instances (n2d-standard-32 and t2d-standard-32), and workloads that can run on either of them.

When a scale-up was needed, I expected the t2d pool to be scaled up, as these instances are cheaper.

What happened instead?:

The more expensive n2d pool was scaled up.

How to reproduce it (as minimally and precisely as possible):

I haven't tried to reproduce it; I've only seen it in production.

Anything else we need to know?:

I took a quick look at how the autoscaler compares prices, and it seems bound to make this mistake often because it relies on hard-coded, rarely updated, non-region-specific prices.

Because GCP spot prices can now change as often as monthly and are highly region-specific, I don't think this is a good approach.

In my case, for us-west1, according to the code the t2d spot is 140% more expensive than the n2d spot, while in reality it is about 17% cheaper...

See this Google Sheet for the details.
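
To illustrate the failure mode, here is a minimal sketch of how a single global price table can invert the ranking for one region. This is not the autoscaler's actual code. The names (`HARDCODED_SPOT`, `REGIONAL_SPOT`, `relative_diff_pct`, `cheapest`) are hypothetical, and the prices are placeholders chosen only to reproduce the +140% and -17% ratios mentioned above; they are neither real GCP prices nor the values hard-coded in cluster-autoscaler.

```python
# Placeholder prices in arbitrary units per vCPU-hour. They are NOT real GCP
# prices and NOT the values hard-coded in cluster-autoscaler; they only
# reproduce the ratios described in this issue.
HARDCODED_SPOT = {"n2d": 0.0100, "t2d": 0.0240}               # one global table
REGIONAL_SPOT = {"us-west1": {"n2d": 0.0100, "t2d": 0.0083}}  # per-region table


def relative_diff_pct(price, baseline):
    """How much more (+) or less (-) expensive `price` is than `baseline`, in percent."""
    return (price - baseline) / baseline * 100.0


def cheapest(prices):
    """Return the machine family with the lowest price in the given table."""
    return min(prices, key=prices.get)


region = REGIONAL_SPOT["us-west1"]
table_pct = relative_diff_pct(HARDCODED_SPOT["t2d"], HARDCODED_SPOT["n2d"])
actual_pct = relative_diff_pct(region["t2d"], region["n2d"])

print(f"global table: t2d vs n2d {table_pct:+.0f}%, picks {cheapest(HARDCODED_SPOT)}")
print(f"us-west1:     t2d vs n2d {actual_pct:+.0f}%, picks {cheapest(region)}")
```

With a single global table the comparison favors n2d, while the per-region numbers favor t2d, which matches what I saw in production.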

gdubicki, Feb 17 '24 11:02