OCPBUGS-62929: Check router RBAC before external cert ops

Open bentito opened this issue 2 months ago • 17 comments

Added a wait step so the router service account’s RBAC settles before we create or update routes that use external certificates. The new helper impersonates the router SA and polls for get/list/watch access on the referenced secret, which eliminates the Forbidden errors that were flaking CI when the admission webhook fired during RBAC propagation.

bentito avatar Oct 17 '25 17:10 bentito

@bentito: This pull request references Jira Issue OCPBUGS-62929, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Added a wait step so the router service account’s RBAC settles before we create or update routes that use external certificates. The new helper impersonates the router SA and polls for get/list/watch access on the referenced secret, which eliminates the Forbidden errors that were flaking CI when the admission webhook fired during RBAC propagation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Oct 17 '25 17:10 openshift-ci-robot

/jira refresh

bentito avatar Oct 17 '25 17:10 bentito

@bentito: This pull request references Jira Issue OCPBUGS-62929, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @lihongan

In response to this:

/jira refresh

openshift-ci-robot avatar Oct 17 '25 17:10 openshift-ci-robot

Risk analysis has seen new tests most likely introduced by this PR. Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: e3f4c0f9cebe669bc734b3114cbd3bb41927f5b3

Job Name | New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | Medium - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] reflector doesn't support receiving resources as Tables [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by dynamic client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by informers when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by metadatainformer when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.

New tests seen in this PR at sha: e3f4c0f9cebe669bc734b3114cbd3bb41927f5b3

  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] reflector doesn't support receiving resources as Tables [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by dynamic client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by informers when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by metadatainformer when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]

openshift-trt[bot] avatar Oct 17 '25 22:10 openshift-trt[bot]

/retest

bentito avatar Oct 18 '25 22:10 bentito

/retest

bentito avatar Oct 19 '25 02:10 bentito

Do we need a similar wait for the tests that delete RBAC or secrets? I don't know whether those tests have been flaky, but it seems to me that we might have race conditions in those tests too.

I don't think so. Here's why:

  • Secret deletions already flow through checkRouteStatus, which polls until the router reports ExternalCertificateValidationFailed, so we’re effectively waiting for the controller to observe the change (test/extended/router/external_certificate.go:239).
  • RBAC deletions exercised in the “routes are not reachable” path also use that same status poll, so propagation is covered there (test/extended/router/external_certificate.go:293).
  • The update scenarios that expect an API call to be rejected rely on the apiserver RBAC authorizer evaluating permissions synchronously at request time. Once the role binding is deleted, the admission stack should block the request immediately.

So I don't think an extra wait is needed at the moment. NB: I didn't hunt for related flakes, though.

bentito avatar Oct 21 '25 18:10 bentito

/retest

bentito avatar Oct 22 '25 15:10 bentito

/retest

bentito avatar Oct 22 '25 16:10 bentito

@Miciah: In the previous cycle there were 5 failing e2e jobs, none of them caused by the flake in question. Currently we have 1 failing e2e job, and it is not because of this flake either. Can you take another review pass?

bentito avatar Oct 23 '25 18:10 bentito

/assign @Miciah

bentito avatar Oct 23 '25 18:10 bentito

/assign @rfredette

candita avatar Oct 30 '25 20:10 candita

/retest

bentito avatar Nov 18 '25 15:11 bentito

@bentito: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6 | bf0e17cfb1503ef4adc30cb8adb420cae6a51fc0 | link | true | /test e2e-metal-ipi-ovn-ipv6
ci/prow/okd-scos-e2e-aws-ovn | bf0e17cfb1503ef4adc30cb8adb420cae6a51fc0 | link | false | /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Nov 18 '25 18:11 openshift-ci[bot]

Job Failure Risk Analysis for sha: bf0e17cfb1503ef4adc30cb8adb420cae6a51fc0

Job Name | Failure Risk
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 | IncompleteTests
Tests for this run (22) are below the historical average (2160): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

openshift-trt[bot] avatar Nov 18 '25 19:11 openshift-trt[bot]

Thanks!

/lgtm

Miciah avatar Nov 19 '25 00:11 Miciah

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bentito, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Nov 19 '25 00:11 openshift-ci[bot]

/jira refresh

The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity.

openshift-bot avatar Dec 12 '25 08:12 openshift-bot

@openshift-bot: This pull request references Jira Issue OCPBUGS-62929, which is invalid:

  • expected the bug to target either version "4.22." or "openshift-4.22.", but it targets "4.21.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity.

openshift-ci-robot avatar Dec 12 '25 08:12 openshift-ci-robot