Change instance type for GKE build clusters

Open • ameukam opened this issue 3 years ago • 10 comments

Prow build clusters currently use instances of the n1 machine family. We could potentially migrate to one of the following families:

  • e2 family: https://cloud.google.com/compute/docs/general-purpose-machines#e2_machine_types
    • Con: no local SSD support.
  • n2 family: https://cloud.google.com/compute/docs/general-purpose-machines#n2d_machines
  • n2d family: https://cloud.google.com/compute/docs/general-purpose-machines#n2d_machines
  • t2d family: https://cloud.google.com/tau-vm
    • Con: no local SSD support.

This is a purely financial suggestion: changing the machine family could reduce the cost of the resources we consume.

/milestone v1.24

ameukam • Jul 30 '21

I very much want to try out local SSD before we decide whether the e2 family makes sense.

At a quick glance, N2s cost us more than N1s. I'd be curious to understand what benefit we think we'd get from N2Ds.

spiffxp • Aug 06 '21

N2D instances (especially the n2d-highmem-8) have more memory than the n1 instances. This could increase per-node density and reduce cluster size, but this is pure speculation; I still need to do the calculations. I also want to explore all possible options to improve capacity planning.

ameukam • Aug 06 '21

/priority important-longterm

spiffxp • Sep 02 '21

An update from https://github.com/kubernetes/k8s.io/issues/1187: we are now running instances with local SSDs for ephemeral storage. It's not clear whether they have substantively improved our build performance, but since e2s don't support local SSD, I would likely not consider e2s unless there's a really compelling reason to.

spiffxp • Oct 01 '21

After a quick review of GCP Compute Engine pricing, changing the instance type would have the opposite of the effect I expected: it would not produce the cost savings this issue was aiming for. I still believe pricing will evolve over the next few years and eventually allow us to make the change.

/remove-milestone
/priority backlog
/lifecycle frozen

ameukam • Nov 03 '21

/milestone clear

ameukam • Nov 03 '21

/milestone v1.26
/remove-lifecycle frozen
/remove-priority backlog

ameukam • Aug 26 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot • Nov 24 '22

/remove-lifecycle stale
/milestone clear
/lifecycle frozen

ameukam • Nov 24 '22

We will use the N2 family once we are done with the migration of prow.k8s.io.

ameukam • Mar 03 '24