Update kubernetes version of the CI

Open vdemeester opened this issue 7 months ago • 11 comments

Changes

We are lagging a bit behind in terms of the Kubernetes versions we test against. This updates the CI to run on more "recent" versions of k8s.

@tektoncd/core-maintainers I am not sure which versions we should use here, but… right now we test 1.28.x and 1.29.x, while the most recent version of Kubernetes is 1.33.x. We should definitely keep the version we specify in MIN_KUBERNETES_VERSION in our tests, but I wonder if we should run all versions from it to the latest (which would be 6 k8s versions, making 18 e2e runs).

/kind misc

Signed-off-by: Vincent Demeester [email protected]

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • [ ] Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • [ ] Has Tests included if any functionality added or changed
  • [ ] pre-commit Passed
  • [x] Follows the commit message standard
  • [x] Meets the Tekton contributor standards (including functionality, content, code)
  • [x] Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [ ] Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • [ ] Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

vdemeester avatar May 12 '25 09:05 vdemeester

/hold

vdemeester avatar May 12 '25 09:05 vdemeester

In terms of how many k8s versions we need to test, I guess it depends on what kind of impact it has in terms of running CI (do we get that many VMs available right away?) and stability: if there are flaky tests, getting all tests to pass would become difficult. If neither is an issue, we could test all supported versions.

afrittoli avatar May 12 '25 10:05 afrittoli

The oldest test is broken:

ERROR: timeout waiting for pods to come up
tekton-events-controller-8655577f85-g8kzd      0/1   CrashLoopBackOff   5 (2m8s ago)   5m8s
tekton-pipelines-controller-745d695898-9gt2z   0/1   CrashLoopBackOff   5 (107s ago)   5m8s
tekton-pipelines-webhook-97956b68b-nwlk4       0/1   CrashLoopBackOff   5 (2m9s ago)   5m8s
ERROR: Tekton Pipeline did not come up

afrittoli avatar May 12 '25 10:05 afrittoli

Why are we not testing against the new 1.33? Do we have a latest-minus-1 policy for release and testing?

waveywaves avatar May 13 '25 06:05 waveywaves

I restarted them a couple of times (~7 or 8?). They all start quickly. It shows a bit more flakiness (because there are 15 runs now).

vdemeester avatar May 13 '25 12:05 vdemeester

Why are we not testing against the new 1.33? Do we have a latest-minus-1 policy for release and testing?

The images weren't working (I need to check the version of kind that gets installed). But ideally we would 😇

vdemeester avatar May 13 '25 14:05 vdemeester

So quick recap @afrittoli @waveywaves

  • We could enable 1.33.x as well
  • It doesn't slow down the CI as they all start almost at the same time
  • It's a bit more flaky but not that much (it's "normal" that running 3/4 times more jobs will run into more flakes, which could be good)

I don't think we want to run all versions with stable/beta/alpha though. What we could do is
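The "more runs, more flakes" point can be made concrete with a rough sketch. It assumes every e2e job flakes independently with the same probability; the 2% rate and the baseline of 5 runs are made-up values for illustration, and only the 15 runs come from this thread.

```python
def chance_of_any_flake(per_run_flake_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent jobs flakes."""
    return 1.0 - (1.0 - per_run_flake_rate) ** runs


# Hypothetical 2% flake rate per job; 5 runs is an assumed baseline,
# 15 runs is the number mentioned in the thread.
for runs in (5, 15):
    print(f"{runs} runs: {chance_of_any_flake(0.02, runs):.1%} chance of a red PR")
```

Under these assumptions the chance of at least one red job grows with the number of runs, but more slowly than linearly.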

  • oldest + stable,beta,alpha
  • all between oldest and n-1 with just stable
  • n-1 + stable, beta, alpha
  • latest + stable,beta,alpha
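A minimal sketch of how that selection could be computed, assuming the version list is sorted from oldest to latest. The function name `build_matrix` and the tuple layout are hypothetical and not taken from the actual workflow.

```python
ALL_GATES = ["stable", "beta", "alpha"]


def build_matrix(versions: list[str]) -> list[tuple[str, str]]:
    """Return (k8s_version, feature_gate_set) pairs for the scheme above.

    `versions` must be sorted oldest to latest. The oldest, n-1 and latest
    versions get all three gate sets; everything in between gets stable only.
    """
    if not versions:
        return []
    full = {versions[0], versions[-1]}
    if len(versions) >= 2:
        full.add(versions[-2])  # n-1
    matrix = []
    for version in versions:
        gates = ALL_GATES if version in full else ["stable"]
        matrix.extend((version, gate) for gate in gates)
    return matrix


versions = ["1.28", "1.29", "1.30", "1.31", "1.32", "1.33"]
for version, gate in build_matrix(versions):
    print(version, gate)
```

For the six versions from 1.28 to 1.33, this yields 12 jobs instead of the 18 needed to run every version with every gate set.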

It might be a bit tricky for the required checks as these will be dynamic. We should probably write a small tool that updates the plumbing prow required configuration based on the workflow configuration (it is definitely doable, should be relatively easy).
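The core of such a tool could be a simple set difference between the checks that are currently required and the checks the workflow matrix would produce. This is only a sketch: the check-name format in `check_name` is invented, and reading the workflow file and writing the plumbing configuration are left out.

```python
def check_name(version: str, gates: str) -> str:
    # Hypothetical naming scheme; the real names come from the workflow jobs.
    return f"e2e-tests / k8s-{version} ({gates})"


def required_checks_diff(
    current: set[str], matrix: list[tuple[str, str]]
) -> tuple[list[str], list[str]]:
    """Return (to_add, to_remove) so that required checks match the matrix."""
    wanted = {check_name(version, gates) for version, gates in matrix}
    return sorted(wanted - current), sorted(current - wanted)


# Example with made-up data: 1.28 is dropped and 1.33 is added.
current = {check_name("1.28", "stable"), check_name("1.29", "stable")}
matrix = [("1.29", "stable"), ("1.33", "stable")]
to_add, to_remove = required_checks_diff(current, matrix)
print("add:", to_add)
print("remove:", to_remove)
```

Computing both lists matters here because a required check that no longer exists stays pending forever and blocks the merge.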

The other thing I would like to look into is to be able to start some on demand, like I want 1.29 with alpha, it's not in the default run but I want to be able to do /run … and it runs. I need to look into this for GH workflows.

vdemeester avatar May 14 '25 08:05 vdemeester

/retest

waveywaves avatar May 20 '25 13:05 waveywaves

The k8s-plus-one checks cannot be re-run; they do not exist. This is related to the "required checks" question I had:

It might be a bit tricky for the required checks as these will be dynamic. We should probably write a small tool that updates the plumbing prow required configuration based on the workflow configuration (it is definitely doable, should be relatively easy).

vdemeester avatar May 21 '25 15:05 vdemeester

  • We could enable 1.33.x as well

Then 1.28.x could be dropped.

I don't think we want to run all versions with stable/beta/alpha though. What we could do is...

Agree, looks good.

The other thing I would like to look into is to be able to start some on demand, like I want 1.29 with alpha, it's not in the default run but I want to be able to do /run … and it runs. I need to look into this for GH workflows.

Is there demand for it?

If it is not affecting CI stability anyway then maybe it's easier to just add them to the matrix.

twoGiants avatar May 23 '25 11:05 twoGiants

/retest

vdemeester avatar Jul 04 '25 09:07 vdemeester

The three required pending checks need the plumbing PR to be merged so that they are no longer required and thus removed.

vdemeester avatar Jul 04 '25 13:07 vdemeester

/hold cancel

vdemeester avatar Jul 07 '25 08:07 vdemeester

@vdemeester I think you need to re-push this to drop the old status checks

afrittoli avatar Jul 16 '25 10:07 afrittoli

Yes 😅 but the CI needs to be fixed first: https://github.com/tektoncd/pipeline/pull/8885 (I think I am almost done, if not done 🤞🏼)

vdemeester avatar Jul 16 '25 11:07 vdemeester

rebased 👼🏼

vdemeester avatar Jul 17 '25 14:07 vdemeester

Uhm, the plumbing PR is merged and the resources deployed to the cluster, but the repo configuration has not been updated yet. I wonder if the prow configuration is for tide only or if it is supposed to change the branch protection rules in the repo config in GitHub as well.

afrittoli avatar Jul 18 '25 13:07 afrittoli

Uhm, the plumbing PR is merged and the resources deployed to the cluster, but the repo configuration has not been updated yet. I wonder if the prow configuration is for tide only or if it is supposed to change the branch protection rules in the repo config in GitHub as well.

It shouldn't. I think all the current required checks were added by prow itself. Maybe there is something weird going on…

vdemeester avatar Jul 18 '25 13:07 vdemeester

@afrittoli: cat image

In response to this:

Thank you! /meow /approve /lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar Jul 18 '25 13:07 tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

tekton-robot avatar Jul 18 '25 13:07 tekton-robot