kubernetes-ingress-controller icon indicating copy to clipboard operation
kubernetes-ingress-controller copied to clipboard

Configure test workflows to run the right tests

Open mflendrich opened this issue 3 years ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Problem Statement

Following Travis's report in #2751, I've done some additional research - here's what appears to be our current CI configuration for tests:

Workflow Unit Tests /test/integration, stable Gateway, dbless & postgres /test/integration, nightly Gateway, dbless /test/integration, stable Gateway, dbless, feature gates on /test/e2e /test/conformance
PR branch pre-merge :heavy_check_mark: :heavy_check_mark: (kind) :heavy_minus_sign: disabled, dispatch only (kind) :heavy_check_mark: (kind) :heavy_multiplication_x: :heavy_check_mark: (kind)
main every night :heavy_multiplication_x: :heavy_check_mark: (gcp 1.20 1.21 1.22 1.23) :heavy_multiplication_x: :heavy_multiplication_x: :heavy_check_mark: (kind) :heavy_multiplication_x:
main as release blocker :heavy_multiplication_x: :heavy_check_mark: (gcp 1.20 1.21 1.22 1.23 1.24) :heavy_multiplication_x: :heavy_multiplication_x: :heavy_multiplication_x: :heavy_multiplication_x:

with the following interpretation assumed:

  • /test/integration means invocation either by make test.integration[.*] or by /hack/e2e/run-tests.sh
  • /test/e2e means invocation by make test.e2e

This doesn't feel effective:

  • PR pre-merge workflow is suitable for quick-turnaround tests that have low risk of being affected by broken bleeding-edge dependencies
  • nightly workflow is suitable for heavy tests and integrations between the bleeding-edge component under test (KIC) and other other bleeding-edge versions of other components (here: Gateway)
  • release workflow seems like the place where all our tests (fast and slow) should be run against the latest(:eight_spoked_asterisk:) stable (:eight_spoked_asterisk: :eight_spoked_asterisk:) version of components (k8s, gateway)
    • :eight_spoked_asterisk: - running against versions older than latest provides little feedback that isn't provided by nightly runs, and drastically increases the risk of test flakes delaying a release
    • :eight_spoked_asterisk: :eight_spoked_asterisk: - there's no apparent added value of running release tests against unreleased versions, because the compatibility matrix isn't going to recommend them anyway, and incompatibilities with future versions will usually be known ahead from nightly tests.
  • note the (probably inexplicable) disconnect between nightly and release workflows in k8s versions being tested on.

Proposed Solution

Workflow Unit Tests /test/integration, stable Gateway, dbless & postgres /test/integration, nightly Gateway, dbless /test/integration, stable Gateway, dbless, feature gates on /test/e2e /test/conformance
PR branch pre-merge :heavy_check_mark: :heavy_check_mark: (kind) :question: TBD :heavy_check_mark: (kind) :heavy_multiplication_x: :heavy_check_mark: (kind)
main every night :heavy_check_mark: :heavy_check_mark: (gcp current + previous k8s) :question: TBD :heavy_check_mark: (gcp current k8s) :heavy_check_mark: (kind) :heavy_check_mark: (kind)
main as release blocker :heavy_check_mark: :heavy_check_mark: (gcp current k8s) :heavy_multiplication_x: :heavy_check_mark: (kind) :heavy_check_mark: (kind) :heavy_check_mark: (kind)

The reason for no GCP usage in release tests is the assumption that kind is faster than GCP.

Additional information

Consider doing #2788 before this.

This issue extends #2751 and proposes a view encompassing all our testing workflows at once.

Acceptance Criteria

  • [ ] PR gate pipeline updated as described in Proposed Solution in a single well-named workflow file, e.g. pr_test.yaml
  • [ ] main nightly pipeline updated as described in Proposed Solution in a single well-named workflow file, e.g. nightly.yaml (today there's a nightly.yaml and an e2e.yaml causing confusion)
  • [ ] release pipeline updated as described in Proposed Solution.

mflendrich avatar Aug 08 '22 15:08 mflendrich

Thanks for preparing this summary 🦸 !

Just a minor comment from my PoV:

  • we'd need to define the "previous k8s" term. I guess this doesn't change after this issue is resolved (all points addressed) but I believe that in order to align expectations we should put this somewhere in writing that we aim to support current + N last versions of k8s.

pmalek avatar Aug 08 '22 16:08 pmalek

Good point. My hunch is that this should be one of

  • versions of k8s available on popular k8s cloud providers
  • versions of k8s officially supported as per our Compatibility Matrix

mflendrich avatar Aug 08 '22 16:08 mflendrich

  • versions of k8s available on popular k8s cloud providers

Had that problem in the past. This is a very noble thing to aim for but then there has to be a mechanism in place to track those. Different cloud providers have different schedules for sunsetting old released and introducing support for new ones. They also have a different N of "most recent supported versions".

So yes, I agree it would be really nice to have that but on a "best effort" basis, i.e. we'll aim to support whatever our cloud providers of choice support but we can't guarantee the schedule.

pmalek avatar Aug 08 '22 16:08 pmalek

  • The actual release job doesn't need to run integration tests. They'll have run already since we merge a PR with the updated changelog and deploy manifests immediately before.
  • Nightly gateway is dispatch-only (you label PRs that introduce unreleased features and it runs on them along with the other integration tests) as of last week. I changed it after an informal chat poll where nobody expressed a preference for nightly cron runs.
  • Keeping the original intent of the E2E tests would have them run on GKE. I don't think this would require much beyond updating the Makefile to actually enable them, since both pre-release and nightly cron jobs use images available on Dockerhub (the dispatch E2E would need new KTF functionality to push temporary images to GCR). I don't think there's anything we're obviously missing out on by only running the integration tests on GKE, however.

rainest avatar Aug 08 '22 16:08 rainest

The actual release job doesn't need to run integration tests. They'll have run already since we merge a PR with the updated changelog and deploy manifests immediately before.

This would require an explicit "check that this test has passed for the latest run" (and it was not force-merged or anything). I'm not opposed to that.

Nightly gateway is dispatch-only

Okay. Updating the table to reflect that.

Keeping the original intent of the E2E tests would have them run on GKE.

For E2E tests, both kind and GKE have their benefits, I personally don't have a strong preference here.

mflendrich avatar Aug 08 '22 17:08 mflendrich

Stale

programmer04 avatar Sep 25 '23 15:09 programmer04