k8s.io

N2 Quota changes for Scale Projects

Open upodroid opened this issue 1 year ago • 4 comments

The Kubernetes project uses E2 instances on GCP by default, unless we are testing something that requires specific instance types (GPU tests, scale perf testing, arm64).

k/k change: https://github.com/kubernetes/kubernetes/pull/118626

With E2, the VMs issued by Google can run on either modern AMD EPYC or ancient Intel Skylake hosts. Scale job control plane nodes, however, need to run on high-performance instances consistently, so they will use N2 machine types with Ice Lake CPUs.

N2 quotas are not yet set properly, so this issue will track quota failures from k8s-infra-e2e-scale-project-XX and fix them as they are reported.

  • ~k8s-infra-e2e-scale-03 https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability/1745631298405273600~
  • ~k8s-infra-e2e-scale-04 https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability/1745631298405273600~
  • ~k8s-infra-e2e-scale-01 https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability/1745623497192771584~
  • ~k8s-infra-e2e-scale-05~
  • ~k8s-infra-e2e-scale-02~

Quotas for E2 CPUs will be bumped to 1000 in us-east1. Please ensure that jobs are running in this location.

/sig testing
/sig scalability
/priority critical-urgent

upodroid avatar Jan 12 '24 11:01 upodroid

Related failure in PR: https://github.com/kubernetes/perf-tests/pull/2494
Example run: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/perf-tests/2494/pull-perf-tests-clusterloader2/1744323040390418432
Project: k8s-presubmit-scale-36

 - Quota 'N2_CPUS' exceeded.  Limit: 24.0 in region us-east1.
	metric name = compute.googleapis.com/n2_cpus
	limit name = N2-CPUS-per-project-region
	limit = 24.0
	dimensions = region: us-east1
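
For anyone collecting these failures from job logs, here is a minimal sketch of pulling the quota name, limit, and region out of a message shaped like the one above. The function name `parse_quota_error` and the regular expression are illustrative assumptions based only on this one sample; other quota errors may be formatted differently.

```python
import re

# Sample copied from the failure above (tabs written as \t escapes).
QUOTA_ERROR = (
    " - Quota 'N2_CPUS' exceeded.  Limit: 24.0 in region us-east1.\n"
    "\tmetric name = compute.googleapis.com/n2_cpus\n"
    "\tlimit name = N2-CPUS-per-project-region\n"
    "\tlimit = 24.0\n"
    "\tdimensions = region: us-east1\n"
)

# Hypothetical pattern, derived from the single sample message above.
HEADLINE = re.compile(
    r"Quota '(?P<quota>[^']+)' exceeded\.\s+"
    r"Limit: (?P<limit>[\d.]+) in region (?P<region>[\w-]+)\."
)


def parse_quota_error(text):
    """Return quota name, limit, region, and detail lines, or None if no match."""
    headline = HEADLINE.search(text)
    if headline is None:
        return None
    details = {}
    for line in text.splitlines():
        # Detail lines look like "<key> = <value>".
        key, sep, value = line.partition(" = ")
        if sep:
            details[key.strip()] = value.strip()
    return {
        "quota": headline.group("quota"),
        "limit": float(headline.group("limit")),
        "region": headline.group("region"),
        "details": details,
    }


if __name__ == "__main__":
    print(parse_quota_error(QUOTA_ERROR))
```

Grouping the parsed results by `quota` and `region` would show which projects still need an N2 bump.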

marseel avatar Jan 12 '24 11:01 marseel

Projects starting with k8s-* are part of the google.com org, which we don't manage. Please migrate those projects to the community infrastructure.

upodroid avatar Jan 12 '24 11:01 upodroid

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
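
The rules above amount to simple date arithmetic. The sketch below is not the bot's implementation; it only assumes there is no further activity after the given date, and `lifecycle_schedule` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Inactivity windows taken from the rules listed above.
STALE_AFTER = timedelta(days=90)
ROTTEN_AFTER = timedelta(days=30)
CLOSE_AFTER = timedelta(days=30)


def lifecycle_schedule(last_activity):
    """Project the stale, rotten, and close dates, assuming no new activity."""
    stale = last_activity + STALE_AFTER
    rotten = stale + ROTTEN_AFTER
    closed = rotten + CLOSE_AFTER
    return {"stale": stale, "rotten": rotten, "closed": closed}


if __name__ == "__main__":
    # Last human comment in this thread: Jan 12 '24.
    for label, day in lifecycle_schedule(date(2024, 1, 12)).items():
        print(f"{label}: {day.isoformat()}")
```

With Jan 12 '24 as the last activity, the projected stale and rotten dates fall on Apr 11 '24 and May 11 '24, which line up with the timestamps of the bot comments in this thread.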

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 11 '24 14:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar May 11 '24 14:05 k8s-triage-robot