k8s.io Change instance type for GKE build clusters

Prow build clusters currently use instances of the n1 machine family. We could potentially migrate to one of the following:

- e2 family: https://cloud.google.com/compute/docs/general-purpose-machines#e2_machine_types
  - Cons: no support for local SSD.
- n2 family: https://cloud.google.com/compute/docs/general-purpose-machines#n2d_machines
- n2d family: https://cloud.google.com/compute/docs/general-purpose-machines#n2d_machines
- t2d family: https://cloud.google.com/tau-vm
  - Cons: no support for local SSD.

This is purely a financial suggestion. Changing the machine type family could reduce the cost of our resource consumption.

/milestone v1.24

Jul 30 '21 22:07 ameukam

I very much want to try out local SSD before we decide whether the e2 family makes sense.

My quick glance is N2's run us more than N1's, I'd be curious to understand what benefit we think we'd be getting with N2D's

Aug 06 '21 15:08 spiffxp

> I very much want to try out local SSD before we decide whether the e2 family makes sense.
>
> My quick glance is N2's run us more than N1's, I'd be curious to understand what benefit we think we'd be getting with N2D's

N2D instances (especially the n2d-highmem-8) have more memory than the n1 instances. This could help increase per-node density and reduce cluster size. But this is pure speculation; I still need to run the calculations. I also want to explore all possible options to improve capacity planning.
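To make the density argument concrete, here is a rough sketch of the kind of calculation meant above. It only looks at memory. The comparison against n1-highmem-8, the pod count, the per-pod memory request, and the reserved memory are all assumptions for illustration, not measurements from the build clusters.

```python
import math

# Memory per node in GB, taken from the public GCP machine type tables.
# Which n1 type the build clusters actually run is an assumption here.
NODE_MEMORY_GB = {
    "n1-highmem-8": 52,
    "n2d-highmem-8": 64,
}


def nodes_needed(total_pods, pod_memory_gb, node_memory_gb, reserved_gb=4.0):
    """Return how many nodes are needed to fit total_pods by memory alone.

    reserved_gb is a hypothetical allowance for the kubelet and system
    daemons; the real allocatable memory on GKE differs per machine type.
    """
    allocatable = node_memory_gb - reserved_gb
    pods_per_node = math.floor(allocatable / pod_memory_gb)
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on this node type")
    return math.ceil(total_pods / pods_per_node)


if __name__ == "__main__":
    # Hypothetical workload: 500 concurrent pods requesting 6 GB each.
    for machine, memory_gb in NODE_MEMORY_GB.items():
        count = nodes_needed(total_pods=500, pod_memory_gb=6, node_memory_gb=memory_gb)
        print(f"{machine}: {count} nodes")
```

With these made-up inputs the cluster shrinks from 63 nodes to 50. Both types have 8 vCPUs, so the gain only holds if jobs are memory-bound rather than CPU-bound, and the per-node price difference still has to be factored in.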

Aug 06 '21 15:08 ameukam

/priority important-longterm

Sep 02 '21 19:09 spiffxp

An update from https://github.com/kubernetes/k8s.io/issues/1187. We are now running instances with local SSDs for ephemeral storage. It's not clear whether they've substantively improved our build performance, but since e2's don't support local SSD, I would likely not consider e2's unless there's a really compelling reason to.

Oct 01 '21 18:10 spiffxp

After a quick review of the GCP compute pricing, changing the instance type would have the opposite effect of what I expected. I still believe the pricing will evolve over the next few years and allow us to make the change.

/remove-milestone /priority backlog /lifecycle frozen

Nov 03 '21 08:11 ameukam

/milestone clear

Nov 03 '21 08:11 ameukam

/milestone v1.26 /remove-lifecycle frozen /remove-priority backlog

Aug 26 '22 15:08 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Nov 24 '22 16:11 k8s-triage-robot

/remove-lifecycle stale /milestone clear /lifecycle frozen

Nov 24 '22 19:11 ameukam

We will use the N2 family once we are done with the migration of prow.k8s.io.

Mar 03 '24 14:03 ameukam
