
Migrate kettle to k8s-infra

Open spiffxp opened this issue 4 years ago • 30 comments

Part of migrating away from gcp-project k8s-gubernator: https://github.com/kubernetes/k8s.io/issues/1308

  • origin gcp-project: k8s-gubernator
  • origin cluster-name: g8r
  • apps:
    • [ ] kettle
  • repo: https://github.com/kubernetes/test-infra/tree/master/kettle

My suggestions for target:

  • project: kubernetes-public
  • cluster: aaa
  • namespace: kettle

/wg k8s-infra
/area cluster-infra
/sig testing

spiffxp avatar Apr 23 '20 01:04 spiffxp

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jul 22 '20 01:07 fejta-bot

/remove-lifecycle stale

Kettle is still running there.

spiffxp avatar Jul 30 '20 23:07 spiffxp

Migrating kettle most likely looks something like:

  • migrate the bigquery database kettle writes to (https://github.com/kubernetes/k8s.io/issues/1307)
  • make sure the permissions kettle needs against google.com resources work for non-google.com accounts (e.g. pubsub permissions in kubernetes-jenkins)
  • migrate kettle

spiffxp avatar Jul 30 '20 23:07 spiffxp

FYI @MushuEE, given that you've been modifying kettle lately, if you happen to see things that could help inform a plan for this, drop 'em here.

spiffxp avatar Jul 30 '20 23:07 spiffxp

When you say

migrate the bigquery database kettle writes to

is that to a new project? What are the target project and target cluster?

MushuEE avatar Jul 31 '20 01:07 MushuEE

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Oct 29 '20 02:10 fejta-bot

/remove-lifecycle stale

spiffxp avatar Nov 03 '20 19:11 spiffxp

/assign @MushuEE @spiffxp to investigate possible approaches

spiffxp avatar Jan 21 '21 19:01 spiffxp

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Apr 22 '21 19:04 fejta-bot

any updates @MushuEE?

BenTheElder avatar Apr 22 '21 19:04 BenTheElder

/remove-lifecycle stale

BenTheElder avatar Apr 22 '21 19:04 BenTheElder

/milestone clear

ameukam avatar Apr 22 '21 19:04 ameukam

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 21 '21 20:07 fejta-bot

/remove-lifecycle stale
/milestone v1.23

ameukam avatar Jul 21 '21 21:07 ameukam

/remove-priority important-longterm
/priority important-soon

spiffxp avatar Sep 02 '21 19:09 spiffxp

/assign
/milestone v1.24

ameukam avatar Dec 06 '21 17:12 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
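
As a rough illustration of the three rules above, the thresholds add up to 90, 120, and 150 days of uninterrupted inactivity. The sketch below is not the bot's actual implementation; the function name `lifecycle_state` and the assumption that the bot acts exactly on each threshold are hypothetical.

```python
from datetime import date, timedelta

# Thresholds taken from the triage rules quoted above.
STALE_AFTER = timedelta(days=90)   # inactivity before lifecycle/stale
ROTTEN_AFTER = timedelta(days=30)  # further inactivity before lifecycle/rotten
CLOSE_AFTER = timedelta(days=30)   # further inactivity before the issue is closed


def lifecycle_state(last_activity: date, today: date) -> str:
    """Return the state an untouched issue would be in (hypothetical helper).

    Assumes no activity at all after `last_activity` and that each label
    is applied on the exact day its threshold is reached.
    """
    idle = today - last_activity
    if idle < STALE_AFTER:
        return "active"
    if idle < STALE_AFTER + ROTTEN_AFTER:
        return "stale"
    if idle < STALE_AFTER + ROTTEN_AFTER + CLOSE_AFTER:
        return "rotten"
    return "closed"


# The last human comment before this bot message is dated Dec 06 '21,
# and the bot message itself is dated Mar 06 '22, which is 90 days later.
print(lifecycle_state(date(2021, 12, 6), date(2022, 3, 6)))  # prints "stale"
```

Any comment or label change resets the clock, so the dates only line up this neatly when the issue is left completely alone.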

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 06 '22 17:03 k8s-triage-robot

/remove-lifecycle stale

ameukam avatar Mar 06 '22 18:03 ameukam

/milestone clear
/lifecycle frozen
/priority backlog

ameukam avatar May 12 '22 02:05 ameukam

I don't have much access to the k8s-gubernator project currently, so it's a bit difficult to do much to help here. I'm going to ask around about access.

The deployment details are more or less in the repo, at least: https://github.com/kubernetes/test-infra/blob/master/kettle/

BenTheElder avatar Apr 01 '24 22:04 BenTheElder

@ixdy still works at Google and still had access, despite having long since stopped working in cloud ... @liggitt and I now have owner access to the k8s-gubernator project for continuity until we can migrate it. Thanks Jeff!

This still needs to happen before the prow default cluster shutdown in August, and sooner is better.

BenTheElder avatar Apr 01 '24 22:04 BenTheElder

So we still have one "g8r" cluster on 1.26.11-gke.1055000 with 3 node pools, "pool-1" (e2-highmem-16, 1 node), "pool-highmem" (n1-highmem-8, 2 nodes), "pool-large" (n1-standard-8, 0 nodes).

It is running "kettle" and "kettle-staging" deployments with one pod each.

Each of those has a PD-SSD, 3001 and 201 GB respectively.

There are some bigquery datasets in this project; build/all is 1.67 TB.

BenTheElder avatar Apr 01 '24 22:04 BenTheElder

Given that we initially ingest this data from the prow GCS logs, I think we should probably look at cold-starting a new instance running in AAA, just overriding the cluster/project and deploying with the existing tooling.

There's a lot left to be desired around auto deployment etc., however.

BenTheElder avatar Apr 01 '24 22:04 BenTheElder

I think @dims has this working. One remaining item: when we're confident this is done, let Googlers know and we'll see about turning down the old instance / GCP project ... (FYI @michelle192837 @cjwagner)

BenTheElder avatar Apr 18 '24 22:04 BenTheElder

@BenTheElder I want to watch it for a week before we call it done!

dims avatar Apr 19 '24 00:04 dims

Exciting stuff! :D Thanks y'all!

michelle192837 avatar Apr 24 '24 22:04 michelle192837

[I scaled the old cluster down to zero this week, we'll check back next week]

BenTheElder avatar Apr 30 '24 19:04 BenTheElder

thanks @BenTheElder

dims avatar Apr 30 '24 20:04 dims

https://storage.googleapis.com/k8s-triage/index.html is being updated.

And the flakes JSON looks good as well:

❯ gsutil ls -l gs://k8s-metrics


Updates are available for some Google Cloud CLI components.  To install them,
please run:
  $ gcloud components update

       114  2024-05-02T00:05:31Z  gs://k8s-metrics/build-stats-latest.json
     10040  2024-05-02T00:04:51Z  gs://k8s-metrics/failures-latest.json
    103224  2024-05-02T00:04:20Z  gs://k8s-metrics/flakes-daily-latest.json
    204024  2024-05-02T00:05:48Z  gs://k8s-metrics/flakes-latest.json
         5  2024-05-02T00:04:09Z  gs://k8s-metrics/job-flakes-latest.json
    376585  2024-05-02T00:05:08Z  gs://k8s-metrics/job-health-latest.json
         3  2024-05-02T00:05:20Z  gs://k8s-metrics/pr-consistency-latest.json
     83496  2024-05-02T00:04:36Z  gs://k8s-metrics/presubmit-health-latest.json
         3  2024-05-02T00:06:01Z  gs://k8s-metrics/weekly-consistency-latest.json
                                 gs://k8s-metrics/build-stats/
                                 gs://k8s-metrics/failures/
                                 gs://k8s-metrics/flakes-daily/
                                 gs://k8s-metrics/flakes/
                                 gs://k8s-metrics/istio-job-flakes/
                                 gs://k8s-metrics/job-flakes/
                                 gs://k8s-metrics/job-health/
                                 gs://k8s-metrics/pr-consistency/
                                 gs://k8s-metrics/presubmit-health/
                                 gs://k8s-metrics/weekly-consistency/
TOTAL: 9 objects, 777494 bytes (759.27 KiB)
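
A freshness check like the one done by eye above could be scripted. The following is only a sketch: it parses a pasted copy of the `gsutil ls -l` object lines rather than calling GCS, and the helper names (`parse_listing`, `stale_objects`) and the 24-hour threshold are assumptions made for illustration, not part of kettle or its tooling.

```python
from datetime import datetime, timedelta, timezone

# Object lines copied from the listing above, plus one prefix line and the
# TOTAL line to show that both are skipped by the parser.
LISTING = """\
       114  2024-05-02T00:05:31Z  gs://k8s-metrics/build-stats-latest.json
     10040  2024-05-02T00:04:51Z  gs://k8s-metrics/failures-latest.json
    103224  2024-05-02T00:04:20Z  gs://k8s-metrics/flakes-daily-latest.json
    204024  2024-05-02T00:05:48Z  gs://k8s-metrics/flakes-latest.json
         5  2024-05-02T00:04:09Z  gs://k8s-metrics/job-flakes-latest.json
    376585  2024-05-02T00:05:08Z  gs://k8s-metrics/job-health-latest.json
         3  2024-05-02T00:05:20Z  gs://k8s-metrics/pr-consistency-latest.json
     83496  2024-05-02T00:04:36Z  gs://k8s-metrics/presubmit-health-latest.json
         3  2024-05-02T00:06:01Z  gs://k8s-metrics/weekly-consistency-latest.json
                                 gs://k8s-metrics/build-stats/
TOTAL: 9 objects, 777494 bytes (759.27 KiB)
"""


def parse_listing(text):
    """Return (size_bytes, updated_utc, url) for each object line.

    Prefix ("directory") lines and the TOTAL summary do not have the
    three-column size/timestamp/url shape, so they are skipped.
    """
    objects = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        size, stamp, url = parts
        updated = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        objects.append((int(size), updated.replace(tzinfo=timezone.utc), url))
    return objects


def stale_objects(objects, now, max_age=timedelta(hours=24)):
    """Return URLs whose last update is older than max_age (assumed threshold)."""
    return [url for _, updated, url in objects if now - updated > max_age]


objects = parse_listing(LISTING)
# Roughly when the comment was posted; the exact time is an assumption.
checked_at = datetime(2024, 5, 2, 11, 0, tzinfo=timezone.utc)
print(len(objects), sum(size for size, _, _ in objects))  # prints "9 777494"
print(stale_objects(objects, checked_at))                 # prints "[]"
```

The parsed sizes sum to the 777494 bytes reported on the TOTAL line, which is a quick sanity check that no object line was dropped.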

We can turn down the old cluster early next week @BenTheElder

dims avatar May 02 '24 11:05 dims

SGTM. At some point I'd like to turn down the bigquery datasets and anything else lingering in that project as well.

BenTheElder avatar May 02 '24 19:05 BenTheElder