test-infra
migrate away from `test-infra-trusted` build cluster
There are a few jobs running on the test-infra-trusted we should either migrate to k8s-infra-prow-build-trusted or remove:
- [ ] post-test-infra-push-git
- [ ] post-test-infra-push-git-custom-k8s-auth
- [ ] post-test-infra-upload-testgrid-config
- [ ] ci-test-infra-update-slack-oncall
/assign @michelle192837
/sig testing
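For tracking what is left, a minimal sketch of grouping jobs by build cluster is below. It assumes the job config has already been parsed into a dict (for example with a YAML loader), that `periodics` is a flat list, and that `presubmits`/`postsubmits` map `org/repo` to job lists. The `example_config` data and the `some-migrated-job` name are hypothetical stand-ins, and the `"default"` fallback for jobs without an explicit `cluster` is an assumption.

```python
from collections import defaultdict

# Assumption: jobs with no explicit `cluster` run on a cluster named "default".
DEFAULT_CLUSTER = "default"


def iter_jobs(config):
    """Yield every job dict from a parsed Prow job config."""
    # Periodics are a flat list of jobs.
    yield from config.get("periodics") or []
    # Presubmits and postsubmits map "org/repo" to a list of jobs.
    for kind in ("presubmits", "postsubmits"):
        for jobs in (config.get(kind) or {}).values():
            yield from jobs or []


def jobs_by_cluster(config):
    """Group job names by the build cluster they are scheduled on."""
    grouped = defaultdict(list)
    for job in iter_jobs(config):
        grouped[job.get("cluster", DEFAULT_CLUSTER)].append(job["name"])
    return {cluster: sorted(names) for cluster, names in grouped.items()}


# Hypothetical, hand-written stand-in for a parsed job file.
example_config = {
    "periodics": [
        {"name": "ci-test-infra-update-slack-oncall", "cluster": "test-infra-trusted"},
    ],
    "postsubmits": {
        "kubernetes/test-infra": [
            {"name": "post-test-infra-push-git", "cluster": "test-infra-trusted"},
            {"name": "post-test-infra-upload-testgrid-config", "cluster": "test-infra-trusted"},
            {"name": "some-migrated-job", "cluster": "k8s-infra-prow-build-trusted"},
        ],
    },
}

remaining = jobs_by_cluster(example_config).get("test-infra-trusted", [])
print("\n".join(remaining))
```

Running this over each file under config/jobs and merging the results would give one list of jobs still on the old cluster. It is a rough aid, not a replacement for the existing report tooling.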
ci-test-infra-update-slack-oncall
no point migrating this, we'll just shut it down when prow is migrated and people can post in #testing-ops in slack instead.
we should actually probably proactively stop advertising @test-infra-oncall to the broader project.
post-test-infra-upload-testgrid-config
.... uhhhh this one I'm not sure, because we have to be able to publish to testgrid's config bucket .... migrating testgrid is another fun topic
The image publishing jobs we should be able to move over.
re: ci-test-infra-update-slack-oncall: Ah, that's easier then.
re: post-test-infra-upload-testgrid-config: I think this should be doable. I have not gone through the full details, but imo thanks to config merger merging configs for TestGrid from multiple locations, we can stand up a new config upload job in community-owned infra, verify the uploaded config in the new location is the same as the old, and swap the config location used in the TestGrid instance overall.
On the K8s infra side we're going to need a bucket for this to start then, cc @upodroid @ameukam for thoughts.
post-test-infra-push-git post-test-infra-push-git-custom-k8s-auth
Not sure how these didn't wind up getting migrated yet ... looks like this is part of k8s-testimages https://github.com/kubernetes/k8s.io/issues/1523
I don't see evidence that we're actually using these images in Kubernetes and we should probably just delete them.
Prow has built-in known-hosts handling in clonerefs these days, I don't think we need these anymore.
Sorry for the delay, I'm looking into this and some of the other unmigrated jobs today.
in #32808 the list should be clearer now, a lot of these are related to running prow so that's fine, but some are pushing images and that's concerning, we should either eliminate or migrate them.
here's one https://github.com/kubernetes/test-infra/pull/32812
| File Path | Job | Link |
|---|---|---|
| config/jobs/kubernetes/test-infra/test-infra-periodics.yaml | job-migration-todo-report | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-autobump-prow-for-auto-deploy | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-autobump-prow | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-update-slack-oncall | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-branchprotector | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-label-sync | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-gencred-refresh-kubeconfig | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-rotate-legacy-default-build-sa-json-key | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-alpine | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-gcloud-terraform | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git-custom-k8s-auth | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-deploy-prow | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-reconcile-hmacs | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-misc-images | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-kettle | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-bazel | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-gcb-docker-gcloud | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-test-gubernator | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-gencred | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-gencred-refresh-kubeconfig | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-upload-oncall | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-upload-testgrid-config | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-upload-boskos-config | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-cip-prow | Search Results |
SIG Contribex:
| File Path | Job | Link |
|---|---|---|
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-community-tempelis-apply | Search Results |
Not trusted cluster, but the other non-migrated jobs with test-infra in the name (there could be more) ...
| File Path | Job | Link |
|---|---|---|
| config/jobs/kubernetes/test-infra/janitors.yaml | maintenance-pull-janitor | Search Results |
| config/jobs/kubernetes/test-infra/janitors.yaml | maintenance-ci-aws-janitor | Search Results |
| config/jobs/kubernetes/test-infra/janitors.yaml | maintenance-ci-janitor | Search Results |
Janitor jobs: won't be migrated, will be turned down.
post-test-infra-upload-oncall, ci-test-infra-update-slack-oncall: no need, this will be obsolete.
job-migration-todo-report: will be obsolete, also this isn't working correctly and we're just manually checking in the tool output, I'll clean this one up.
ci-test-infra-rotate-legacy-default-build-sa-json-key: will be obsolete
post-test-infra-upload-boskos-config: will be obsolete, we have a different boskos config in github.com/kubernetes/k8s.io for community boskos resources
post-test-infra-cip-prow: I deleted this in #32812
post-test-infra-push.* are concerning.
post-test-infra-upload-testgrid-config will need migrating
I'm guessing reconcile hmacs needs to be considered as part of control plane migration, along with definitely branchprotector.
https://github.com/kubernetes/test-infra/pull/32814 will remove the job-migration-todo-report job.
ci-test-infra-label-sync should be able to migrate to k8s-infra-prow-build-trusted without waiting for the rest of prow, but we might not have the right secrets available yet.
On the K8s infra side we're going to need a bucket for this to start then, cc @upodroid @ameukam for thoughts.
post-test-infra-push-git post-test-infra-push-git-custom-k8s-auth
Not sure how these didn't wind up getting migrated yet ... looks like this is part of k8s-testimages kubernetes/k8s.io#1523
I don't see evidence that we're actually using these images in Kubernetes and we should probably just delete them.
Prow has built-in known-hosts handling in clonerefs these days, I don't think we need these anymore.
These are used as the base images for building Prow images (https://cs.k8s.io/?q=gcr.io%2Fk8s-prow%2Fgit&i=nope&files=&excludeFiles=&repos=). I think we can replace the git image with alpine, but git-custom-k8s-auth might need to stay?
| Job | Link | Uses |
|---|---|---|
| post-test-infra-push-alpine | Search Results | Search Results |
| post-test-infra-push-gcloud-terraform | Search Results | Search Results |
| post-test-infra-push-git | Search Results | Search Results |
| post-test-infra-push-git-custom-k8s-auth | Search Results | Search Results |
| post-test-infra-push-misc-images | Search Results | Search Results |
| post-test-infra-push-kettle | Search Results | Search Results |
| post-test-infra-push-bazel | Search Results | Search Results |
| post-test-infra-push-gcb-docker-gcloud | Search Results | Search Results |
| post-test-infra-push-test-gubernator | Search Results | Search Results |
| post-test-infra-push-gencred | Search Results | Search Results |
Several of these push images that aren't used and should be turned down (post-test-infra-push-test-gubernator, post-test-infra-push-bazel, post-test-infra-push-gcloud-terraform, post-test-infra-push-gencred).
- Note that post-test-infra-push-gencred hasn't succeeded, and it pushed to k8s-testimages, which is not what jobs are using; jobs use the image pushed to k8s-prow by post-test-infra-push-misc-images.
Discussed offline: for post-test-infra-push-git and post-test-infra-push-git-custom-k8s-auth, since we'll need to migrate the latter anyways, we can migrate the former at the same time, then see if we can replace the git image base with alpine instead.
then see if we can replace the git image base with alpine instead.
we should probably use something else, we generally prefer to use e.g. debian/distroless for kubernetes base images, for licensing reasons (alpine/busybox) and alignment on patching etc.
I'm working on tempelis https://kubernetes.slack.com/archives/C4M06S5HS/p1719431441099159 https://github.com/kubernetes/test-infra/pull/32928
Sorry for the late response. I can confirm that git-custom-k8s-auth is used by prow to authenticate to non-GKE clusters (currently it's only EKS)
https://github.com/kubernetes-sigs/prow/blob/main/.ko.yaml
+1 for building a unified base image for prow that has git, the kubectl auth plugins for our cloud vendors
We can migrate that job to the community cluster and update the .ko.yaml references
We can do something similar to the distroless-iptables image in k/release.
tempelis will be done after #32946
https://github.com/kubernetes/test-infra/pull/32948 will do label sync
| File Path | Job | Link | Uses |
|---|---|---|---|
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-alpine | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-gcb-docker-gcloud | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git-custom-k8s-auth | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-kettle | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-misc-images | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-upload-testgrid-config | Search Results | |
With the linked PRs, we should have a canary job for all these jobs. Once these are submitted and we have new images for all of them, I'll switch the relevant uses to use the k8s-staging-test-infra images instead, and turn down the old image pushing jobs.
(The TestGrid config switch is a bit more involved but not much more. I just need to swap what config is referenced in the mergelists after verifying the new is the same as the old, and that config merger has permissions to read from the new bucket. I'll look into that now.)
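Before turning down the old push jobs, it may help to list every place that still references an image in the old locations. The sketch below scans text for such references. The registry prefixes in `OLD_PREFIXES` are assumptions based on the project names mentioned in this thread, and the sample text and its tags are hypothetical.

```python
import re

# Assumption: registry prefixes of the old image locations.
OLD_PREFIXES = ("gcr.io/k8s-testimages/", "gcr.io/k8s-prow/")

# Match an old prefix, then the image path, then an optional ":tag".
OLD_IMAGE = re.compile(
    "(?:" + "|".join(re.escape(p) for p in OLD_PREFIXES) + r")[\w./\-]+(?::[\w.\-]+)?"
)


def find_old_image_refs(text):
    """Return (line_number, image_reference) for each old-location image."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in OLD_IMAGE.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


# Hypothetical config snippet; the tags are made up for illustration.
sample = """\
defaultBaseImage: gcr.io/k8s-prow/git:v20240101-abcdef
image: gcr.io/k8s-staging-test-infra/alpine:latest
image: gcr.io/k8s-testimages/alpine:v20240101-abcdef
"""

old_refs = find_old_image_refs(sample)
for line_no, ref in old_refs:
    print(f"line {line_no}: {ref}")
```

This only reports references and does not rewrite them, since tags in the new location will not necessarily match the old ones.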
After today's SIG meeting I eliminated the oncall update jobs (slack, GCS) #33083 #33084
We should probably pre-emptively migrate ci-test-infra-branchprotector to the new trusted cluster.
migrating branch protector looks straightforward, will send a PR in a little bit.
https://github.com/kubernetes/test-infra/pull/33098 takes care of the branch protector.
That leaves:
- assorted image pushing (most of which are in https://github.com/kubernetes/test-infra/issues/32432#issuecomment-2240023962, but not e.g. the prow image push)
- testgrid config upload
- assorted boskos / janitor jobs we don't need to migrate (but we should make sure when we move prow that we do clean up the boskos pools one more time afterwards, that's a fun new migration wrinkle I just thought of @michelle192837 @dims @upodroid @ameukam ... Googlers will probably have to run that for us but we can prepare the commands/scripts ...)
- prow auto-deploy (we'll replace this? discussed in the meeting yesterday)
- prow kubeconfig / credential rotation jobs (shouldn't need to migrate these?)
So when we move prow we'll also have a small list of jobs to disable and we should probably prepare that.
These are the main remaining jobs aside from the following out of scope here:
- vsphere using jobs
- azure using jobs (in progress...)
- scale test presubmit and related janitor jobs (TODO, low prio)
- something else we may have missed writing it off as one of the above (e.g. recently discovered https://github.com/kubernetes/test-infra/pull/33091)
So we should definitely focus on these while Azure folks work on migrating those.
I've also noticed that we'll have to be careful updating the prow deployment specs for the new cluster, because e.g. we gave the secrets clearer names and a different path for the github token.
IMHO, we can remove post-test-infra-upload-boskos-config. we no longer need to increase the boskos pool and potentially need to shutdown the GCP projects part of it.
Fixing TestGrid upload job today and cleaning up some of the image jobs/references.
IMHO, we can remove post-test-infra-upload-boskos-config. we no longer need to increase the boskos pool and potentially need to shutdown the GCP projects part of it.
agreed, filed https://github.com/kubernetes/test-infra/pull/33121
TestGrid upload progress:
- Job works now! https://testgrid.k8s.io/sig-testing-maintenance#testgrid-config-update-canary
- Verified the contents are the same:
# See https://github.com/GoogleCloudPlatform/testgrid/tree/main/config/print#config-printer for the print utility.
~/go/bin/print gs://k8s-testgrid/configs/k8s/config > k8s-testgrid-config.textproto
~/go/bin/print gs://k8s-testgrid-config/k8s/config > k8s-infra-testgrid-config.textproto
diff k8s-testgrid-config.textproto k8s-infra-testgrid-config.textproto
# This produces no diffs
(And these do have contents):
wc -l k8s-testgrid-config.textproto
519759 k8s-testgrid-config.textproto
wc -l k8s-infra-testgrid-config.textproto
519759 k8s-infra-testgrid-config.textproto
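The same check (both dumps non-empty, and identical) could be scripted roughly as below. This is only a sketch: it assumes the two configs were already dumped to local text files as in the commands above, and the tiny file contents in the demo are hypothetical stand-ins for the real dumps.

```python
import difflib
import tempfile
from pathlib import Path


def compare_dumps(old_path, new_path):
    """Return (same, line_count_of_old, diff_lines) for two text dumps."""
    old_lines = Path(old_path).read_text().splitlines()
    new_lines = Path(new_path).read_text().splitlines()
    # An empty dump would make "no diff" meaningless, so refuse to pass it.
    if not old_lines or not new_lines:
        raise ValueError("one of the dumps is empty; nothing was verified")
    # Cheap equality check first; only build a diff when they differ.
    if old_lines == new_lines:
        return True, len(old_lines), []
    diff = list(
        difflib.unified_diff(
            old_lines, new_lines, str(old_path), str(new_path), lineterm=""
        )
    )
    return False, len(old_lines), diff


# Demo with tiny hypothetical stand-ins for the real dumps.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp, "k8s-testgrid-config.textproto")
    new = Path(tmp, "k8s-infra-testgrid-config.textproto")
    old.write_text('dashboards {\n  name: "example"\n}\n')
    new.write_text('dashboards {\n  name: "example"\n}\n')
    same, line_count, diff = compare_dumps(old, new)
    print(same, line_count)
```

On dumps of the size shown above, building a diff with `difflib` could be slow when the files differ; plain `diff` is likely the better tool in that case.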
Now following the config merger instructions at https://github.com/kubernetes/test-infra/blob/master/testgrid/merging.md#config-merger. I'll have a few PRs out for those.
Remaining from my list above:
| File Path | Job | Link | Uses |
|---|---|---|---|
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-alpine | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-misc-images | Search Results | Search Results |
- post-test-infra-push-alpine just needs minor cleanup, then it can be deleted.
- post-test-infra-push-git can probably be deleted; the remaining use of it is as the base for certain Prow images. I can't switch them over immediately (integration tests fail when switching from the January image to a recent July image), but I believe switching to an image from the old location will have the same problem.
- post-test-infra-push-misc-images needs a fix (I think the most recent PR will fix it, but it needs a retrigger to verify that's the case), then the images need to be switched to the new location before the old job is turned down.
(And last bit of cleanup, move all the new image push jobs to the image-pushes dashboard and remove '-canary' from the job name).