test-infra icon indicating copy to clipboard operation
test-infra copied to clipboard

pull-kubernetes-e2e-gce: enable KUBE_WATCHLIST_INCONSISTENCY_DETECTOR

Open p0lyn0mial opened this issue 1 year ago • 9 comments

When this environmental variable is set then the reflector performs a data consistency check.

The consistency check is meant to be enforced only in the CI, not in production. The check ensures that data retrieved by the watch-list api call is exactly the same as data received by the standard list api call.

As of today the inconsistency detector runs with pull-kubernetes-e2e-gce-cos-alpha-features where it is used only by a single test.

As we are preparing to promote the WatchList feature to beta and enable it for KCM, this job will be used to check for any inconsistencies.

p0lyn0mial avatar Jan 09 '24 12:01 p0lyn0mial

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: p0lyn0mial Once this PR has been reviewed and has the lgtm label, please assign cheftako for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jan 09 '24 12:01 k8s-ci-robot

/assign @deads2k @wojtek-t

p0lyn0mial avatar Jan 09 '24 12:01 p0lyn0mial

/test pull-test-infra-verify-lint

p0lyn0mial avatar Jan 09 '24 13:01 p0lyn0mial

xref: https://github.com/kubernetes/kubernetes/pull/122659

p0lyn0mial avatar Jan 09 '24 13:01 p0lyn0mial

@p0lyn0mial: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-test-infra-verify-lint fb95c271f31addab016b3e6fd389f99b4f62a89b link true /test pull-test-infra-verify-lint

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Jan 09 '24 13:01 k8s-ci-robot

do you have PR open demonstrating how this will be used? I think that will help reviewers understand the purpose better and how the failures will be manifested to developers if/when it fails.

Playing this forward, I think when set and the client-go featuregate is enabled this would cause the .List to run both a streaming list and a regular list and compare the results?

deads2k avatar Jan 09 '24 14:01 deads2k

do you have PR open demonstrating how this will be used? I think that will help reviewers understand the purpose better and how the failures will be manifested to developers if/when it fails.

As of today when this env var is enabled and the WatchList feature is enabled for the server and the ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set for a client then a reflector/informer (on client) will run both a streaming list and a regular list and compare the results. If the results are different then the reflector/informer will panic and crash the entire binary (and the entire ci job). This is similar to ENABLE_CACHE_MUTATION_DETECTOR.

Note today the entire path is already executed in https://github.com/kubernetes/kubernetes/blob/f2cfbf44b1fb482671aedbfff820ae2af256a389/test/e2e/apimachinery/watchlist.go#L39 but only for e2e-gce-cos-alpha-features job.

Note2 merging this pr won't have any effect until the WatchList feature is enabled by default (both for the server and the client). My intention is to have this PR merged so that we can start collecting data when the feature is enabled.

Note3 we should add a similar detection mechanism for the .List method once we have watchlist support for the .List method.

p0lyn0mial avatar Jan 09 '24 16:01 p0lyn0mial

Can you open a k/k PR with the change you plan to merge to use this env var?

deads2k avatar Jan 09 '24 17:01 deads2k

Can you open a k/k PR with the change you plan to merge to use this env var?

It will be more than one PR, the first one is at https://github.com/kubernetes/kubernetes/pull/122659 Do you want to hold back this PR until we know that we want to promote the feature to beta?

p0lyn0mial avatar Jan 09 '24 20:01 p0lyn0mial