CNTRLPLANE-941: (monitor): ensure KAS doesn't excessively log unhandled informer errors

Open everettraven opened this issue 3 months ago • 17 comments

Excessive logging of unhandled informer errors is a signal that something may not be working correctly within the kube-apiserver.

We spotted this as an issue when tearing down the OpenShift OAuth stack during the rollout of an External OIDC enabled cluster. See https://issues.redhat.com/browse/OCPBUGS-45460 for more details.

This led to the discovery that creation of RBAC resources could be blocked: a KAS admission plugin relied on an informer that could no longer work, because the API it watches is tied to the OpenShift OAuth API server, which we now disable.

This monitor test is meant to serve 2 purposes:

  • Identify future occurrences of these unhandled errors, which seem harmless at first glance but can have a much deeper impact.
  • Show that https://github.com/openshift/kubernetes/pull/2157 resolves the excessive logging of these unhandled errors.

everettraven avatar Sep 29 '25 19:09 everettraven

@everettraven: This pull request references CNTRLPLANE-941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Sep 29 '25 19:09 openshift-ci-robot

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] avatar Sep 29 '25 19:09 openshift-ci[bot]

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: everettraven

Once this PR has been reviewed and has the lgtm label, please assign neisw for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Sep 29 '25 19:09 openshift-ci[bot]

/payload-job periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure-techpreview

everettraven avatar Sep 29 '25 19:09 everettraven

@everettraven: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/444a31e0-9d6b-11f0-8baa-cd0f70e1eac4-0

openshift-ci[bot] avatar Sep 29 '25 19:09 openshift-ci[bot]

/payload-job periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure-techpreview

everettraven avatar Sep 30 '25 15:09 everettraven

@everettraven: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/57465640-9e13-11f0-8d1f-d595b4f3ff1c-0

openshift-ci[bot] avatar Sep 30 '25 15:09 openshift-ci[bot]

/payload-job periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure-techpreview

everettraven avatar Oct 01 '25 17:10 everettraven

@everettraven: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ad2116b0-9eed-11f0-86f9-2a5ad673167f-0

openshift-ci[bot] avatar Oct 01 '25 17:10 openshift-ci[bot]

/payload-job periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure-techpreview

everettraven avatar Oct 02 '25 17:10 everettraven

@everettraven: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

openshift-ci[bot] avatar Oct 02 '25 17:10 openshift-ci[bot]

/payload-job periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure

everettraven avatar Oct 02 '25 17:10 everettraven

@everettraven: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c12ec490-9fb5-11f0-8fdc-28fa7b05405e-0

openshift-ci[bot] avatar Oct 02 '25 17:10 openshift-ci[bot]

/payload-job periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure

everettraven avatar Oct 02 '25 20:10 everettraven

@everettraven: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-authentication-operator-release-4.21-periodics-e2e-gcp-external-oidc-configure

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2ead5970-9fcd-11f0-92d6-247dbefae6ac-0

openshift-ci[bot] avatar Oct 02 '25 20:10 openshift-ci[bot]

Risk analysis has seen new tests most likely introduced by this PR. Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 472116d4573ce977542dfd978c97d7a0c8fdbf48

  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] monitor test kas-log-analyzer cleanup" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] monitor test kas-log-analyzer collection" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] monitor test kas-log-analyzer interval construction" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] monitor test kas-log-analyzer preparation" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] monitor test kas-log-analyzer setup" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] monitor test kas-log-analyzer test evaluation" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] monitor test kas-log-analyzer writing to storage" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:kas-log-analyzer][Jira:"kube-apiserver"] should not excessively log informer reflector unhandled errors" [Total: 2, Pass: 2, Fail: 0, Flake: 0]

openshift-trt[bot] avatar Nov 04 '25 20:11 openshift-trt[bot]

@everettraven: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-csi | 472116d4573ce977542dfd978c97d7a0c8fdbf48 | link | true | /test e2e-aws-csi |
| ci/prow/e2e-gcp-csi | 472116d4573ce977542dfd978c97d7a0c8fdbf48 | link | true | /test e2e-gcp-csi |
| ci/prow/go-verify-deps | 472116d4573ce977542dfd978c97d7a0c8fdbf48 | link | true | /test go-verify-deps |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 472116d4573ce977542dfd978c97d7a0c8fdbf48 | link | true | /test e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Nov 18 '25 13:11 openshift-ci[bot]