origin
pkg/synthetictests/operators: Fatal unless Available=False in allow-list
Filling in known bugs from here (~WIP: not actually full yet~). This allows us to fail new regressions, and gradually ratchet tighter as we close out existing issues.
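A minimal sketch of the allow-list idea, assuming a simplified transition type and a placeholder entry. This is not the code in this PR; `transition`, `exception`, `allowList`, and `classify` are hypothetical names, and real entries would reference tracked bugs.

```go
package main

import (
	"fmt"
	"regexp"
)

// transition is a hypothetical, simplified view of one ClusterOperator
// condition change observed during a test run.
type transition struct {
	Operator, Condition, Status, Reason string
}

// exception is a hypothetical allow-list entry: matchers plus the
// justification printed next to an accepted transition.
type exception struct {
	Operator *regexp.Regexp
	Reason   *regexp.Regexp
	Why      string
}

// allowList holds a placeholder entry for illustration only.
var allowList = []exception{
	{
		Operator: regexp.MustCompile(`^example-operator$`),
		Reason:   regexp.MustCompile(`^ExampleReason.*$`),
		Why:      "placeholder bug reference",
	},
}

// classify returns the reason a transition is acceptable, or "" when it
// matches nothing and should therefore fail the test-case.
func classify(t transition) string {
	if t.Condition == "Available" && t.Status == "True" {
		return "Available=True is the happy case"
	}
	for _, e := range allowList {
		if e.Operator.MatchString(t.Operator) && e.Reason.MatchString(t.Reason) {
			return e.Why
		}
	}
	return ""
}

func main() {
	seen := []transition{
		{"authentication", "Available", "False", "APIServerDeployment_NoDeployment"},
		{"authentication", "Available", "True", "AsExpected"},
		{"example-operator", "Available", "False", "ExampleReason_Foo"},
	}
	for _, t := range seen {
		if why := classify(t); why != "" {
			fmt.Printf("acceptable: %s %s=%s (exception: %s)\n", t.Operator, t.Condition, t.Status, why)
		} else {
			fmt.Printf("fatal: %s %s=%s reason/%s\n", t.Operator, t.Condition, t.Status, t.Reason)
		}
	}
}
```

Ratcheting tighter then amounts to deleting entries from the list as the corresponding bugs close, so any recurrence becomes fatal again.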
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: wking
To complete the pull request process, please assign bparees after the PR has been reviewed.
You can assign the PR to them by writing /assign @bparees in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
WIP: not actually full yet
I've filled in the list of regexps in 78c5e91d56, so dropping the WIP.
@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-gcp-ovn-rt-upgrade | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-gcp-ovn-rt-upgrade |
| ci/prow/e2e-agnostic-cmd | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-agnostic-cmd |
| ci/prow/e2e-aws-single-node-upgrade | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-aws-single-node-upgrade |
| ci/prow/e2e-aws-ovn-fips | 054f30d2224d39fa81946f8aa9602283c588c61e | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-aws-ovn-serial | 054f30d2224d39fa81946f8aa9602283c588c61e | link | true | /test e2e-aws-ovn-serial |
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-gcp-ovn-rt-upgrade | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-gcp-ovn-rt-upgrade |
| ci/prow/e2e-agnostic-cmd | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-agnostic-cmd |
| ci/prow/e2e-aws-single-node-upgrade | 054f30d2224d39fa81946f8aa9602283c588c61e | link | false | /test e2e-aws-single-node-upgrade |
| ci/prow/e2e-aws-ovn-fips | 054f30d2224d39fa81946f8aa9602283c588c61e | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-aws-ovn-serial | 054f30d2224d39fa81946f8aa9602283c588c61e | link | true | /test e2e-aws-ovn-serial |
| ci/prow/e2e-gcp-ovn-image-ecosystem | 054f30d2224d39fa81946f8aa9602283c588c61e | link | true | /test e2e-gcp-ovn-image-ecosystem |
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closed this PR.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
/remove-lifecycle stale
/reopen
@wking: Reopened this PR.
In response to this:
/remove-lifecycle stale
/reopen
/remove-lifecycle rotten
@wking: This pull request references OTA-362 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.
In response to this:
Filling in known bugs from here (~WIP: not actually full yet~). This allows us to fail new regressions, and gradually ratchet tighter as we close out existing issues.
/cc
Job Failure Risk Analysis for sha: 23301f18ea880511c05c6e94112d9756eabb953c
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-openstack-ovn | IncompleteTests Tests for this run (15) are below the historical average (1897): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-metal-ipi-sdn | IncompleteTests Tests for this run (14) are below the historical average (1558): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6 | IncompleteTests Tests for this run (14) are below the historical average (1514): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade | IncompleteTests Tests for this run (17) are below the historical average (733): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade | IncompleteTests Tests for this run (17) are below the historical average (740): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-gcp-ovn | IncompleteTests Tests for this run (16) are below the historical average (1863): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-gcp-csi | IncompleteTests Tests for this run (16) are below the historical average (701): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade | IncompleteTests Tests for this run (18) are below the historical average (706): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | IncompleteTests Tests for this run (17) are below the historical average (2057): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial | IncompleteTests Tests for this run (15) are below the historical average (708): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node | IncompleteTests Tests for this run (15) are below the historical average (1793): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-ovn-serial | IncompleteTests Tests for this run (16) are below the historical average (778): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-ovn-fips | IncompleteTests Tests for this run (16) are below the historical average (2006): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2 | IncompleteTests Tests for this run (15) are below the historical average (1970): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-aws-csi | IncompleteTests Tests for this run (16) are below the historical average (744): IncompleteTests |
| pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd | IncompleteTests Tests for this run (16) are below the historical average (669): IncompleteTests |
Job Failure Risk Analysis for sha: c07c721963505f6b1f9fd532ef0e938e16c74f0d
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-openstack-ovn | IncompleteTests Tests for this run (18) are below the historical average (1843): IncompleteTests |
Job Failure Risk Analysis for sha: 8e7818ace7e2fd009fbe9796ecebd19f71420a82
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-openstack-ovn | IncompleteTests Tests for this run (22) are below the historical average (1630): IncompleteTests |
Job Failure Risk Analysis for sha: b9240a4b627b4338fc4a27558e809d3f376b6d96
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-openstack-ovn | IncompleteTests Tests for this run (15) are below the historical average (1528): IncompleteTests |
Job Failure Risk Analysis for sha: 79212ee2f7c2351b4a9bcb34b9bf75178632c60d
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd | IncompleteTests Tests for this run (16) are below the historical average (551): IncompleteTests |
AWS OVN update failed an Available test-case:
: [bz-apiserver-auth] clusteroperator/authentication should not change condition/Available 2h25m38s
{ 2 unexpected clusteroperator state transitions during e2e test run. These did not match any known exceptions, so they cause this test-case to fail:
Nov 27 17:45:23.986 E clusteroperator/authentication condition/Available reason/APIServerDeployment_NoDeployment status/False APIServerDeploymentAvailable: deployment/openshift-oauth-apiserver: could not be retrieved
Nov 27 17:45:23.986 - 3s E clusteroperator/authentication condition/Available reason/APIServerDeployment_NoDeployment status/False APIServerDeploymentAvailable: deployment/openshift-oauth-apiserver: could not be retrieved
1 unwelcome but acceptable clusteroperator state transitions during e2e test run. These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:
Nov 27 17:45:27.420 W clusteroperator/authentication condition/Available reason/AsExpected status/True All is well (exception: Available=True is the happy case)
}
And we don't have an exception for `APIServerDeployment_NoDeployment`, so that looks like it's working. The run also flaked a Degraded test-case with `We are not worried about Degraded=True blips for update tests yet`, so that's also working:
: [bz-DNS] clusteroperator/dns should not change condition/Degraded
Run #0: Failed 2h25m38s
{ 0 unexpected clusteroperator state transitions during e2e test run, as desired.
6 unwelcome but acceptable clusteroperator state transitions during e2e test run. These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:
Nov 27 17:48:34.907 E clusteroperator/dns condition/Degraded reason/DNSDegraded status/True DNS default is degraded (exception: We are not worried about Degraded=True blips for update tests yet.)
Nov 27 17:48:34.907 - 42s E clusteroperator/dns condition/Degraded reason/DNSDegraded status/True DNS default is degraded (exception: We are not worried about Degraded=True blips for update tests yet.)
Nov 27 17:49:17.518 W clusteroperator/dns condition/Degraded reason/DNSNotDegraded status/False (exception: Degraded=False is the happy case)
Nov 27 18:43:16.191 E clusteroperator/dns condition/Degraded reason/DNSDegraded status/True DNS default is degraded (exception: We are not worried about Degraded=True blips for update tests yet.)
Nov 27 18:43:16.191 - 9s E clusteroperator/dns condition/Degraded reason/DNSDegraded status/True DNS default is degraded (exception: We are not worried about Degraded=True blips for update tests yet.)
Nov 27 18:43:26.190 W clusteroperator/dns condition/Degraded reason/DNSNotDegraded status/False (exception: Degraded=False is the happy case)
}
AWS OVN serial flaked a test-case with `We are not worried about Available=False or Degraded=True blips for stable-system tests yet`, so that looks good too:
: [bz-apiserver-auth] clusteroperator/authentication should not change condition/Available
Run #0: Failed 1h27m18s
{ 0 unexpected clusteroperator state transitions during e2e test run, as desired.
6 unwelcome but acceptable clusteroperator state transitions during e2e test run. These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:
Nov 27 17:02:30.828 E clusteroperator/authentication condition/Available reason/APIServices_PreconditionNotReady status/False APIServicesAvailable: PreconditionNotReady (exception: We are not worried about Available=False or Degraded=True blips for stable-system tests yet.)
Nov 27 17:02:30.828 - 105s E clusteroperator/authentication condition/Available reason/APIServices_PreconditionNotReady status/False APIServicesAvailable: PreconditionNotReady (exception: We are not worried about Available=False or Degraded=True blips for stable-system tests yet.)
...
/payload 4.15 nightly blocking
This all looks good to me so far. Per Slack, I'd propose we just make sure we're not likely to take out the nightly payload, then merge and monitor for what's failing it; you can keep an eye out on your tool of choice for where it's failing.
If it helps, there is a small framework in Sippy capable of extracting metadata out of test output. If you wanted to parse your output from this test and wind up with a JSON blob in the Sippy DB, you could then query/group/aggregate with SQL. We've used it in the past to find top offenders programmatically. https://github.com/openshift/sippy/blob/master/pkg/dataloader/prowloader/testoutputmetadata.go
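To make the "parse the output into a JSON blob" idea concrete, here is a rough sketch that turns one transition line from the test output above into JSON. It does not use Sippy's actual extraction API; the `record` type, `lineRE`, and `parseLine` are hypothetical, and the regular expression only assumes the line layout seen in the output quoted in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// record is a hypothetical shape for one parsed transition line.
type record struct {
	Time      string `json:"time"`
	Duration  string `json:"duration,omitempty"`
	Level     string `json:"level"`
	Operator  string `json:"operator"`
	Condition string `json:"condition"`
	Reason    string `json:"reason"`
	Status    string `json:"status"`
	Message   string `json:"message,omitempty"`
}

// lineRE matches lines such as:
//   Nov 27 17:45:23.986 - 3s E clusteroperator/NAME condition/C reason/R status/S message
// The " - 3s" duration and the trailing message are optional.
var lineRE = regexp.MustCompile(
	`^(\w{3} \d{1,2} \d{2}:\d{2}:\d{2}\.\d{3})(?: - (\S+))? ([EWI]) ` +
		`clusteroperator/(\S+) condition/(\S+) reason/(\S+) status/(\S+)(?: (.*))?$`)

// parseLine returns the parsed record and true, or false when the line
// does not look like a clusteroperator transition.
func parseLine(line string) (record, bool) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return record{}, false
	}
	return record{
		Time: m[1], Duration: m[2], Level: m[3], Operator: m[4],
		Condition: m[5], Reason: m[6], Status: m[7], Message: m[8],
	}, true
}

func main() {
	line := "Nov 27 17:45:23.986 - 3s E clusteroperator/authentication " +
		"condition/Available reason/APIServerDeployment_NoDeployment status/False " +
		"APIServerDeploymentAvailable: deployment/openshift-oauth-apiserver: could not be retrieved"
	if r, ok := parseLine(line); ok {
		blob, err := json.Marshal(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(blob))
	}
}
```

Once records like these are stored, grouping by `operator` and `reason` would be one way to rank the top offenders.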
@dgoodwin: trigger 8 job(s) of type blocking for the nightly release of OCP 4.15
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade
- periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn-conformance
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-serial
- periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-ovn-ipv6
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-sdn-bm
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2493d350-8dec-11ee-95db-8de7120989e4-0
I've pushed 79212ee2f7 -> 1821445c79 with the following pivots:
- Including a JUnit for each operator/condition pair in the happy-case "no surprising blips", to avoid `we require at least 6 attempts to have a chance at success` failures in aggregate runs.
- Expanded `reason` matching for `authentication` and `monitoring`, to cover the `reason`s seen recently in 4.15 update CI.
- Fixed `reason` for `operator-lifecycle-manager-packageserver` (`ClusterServiceVersion` to `ClusterServiceVersionNotSucceeded`).
- New OCPBUGS-24041 matching some `console` blips.
Once presubmits give positive signs, I'll launch a new round of blocker payload jobs.
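The first pivot above can be pictured roughly as follows. This is a sketch under assumptions, not the PR's implementation: `testCase`, `failure`, and `casesFor` are hypothetical, the test name omits the `[bz-...]` prefix, and it assumes a flake is reported as a failing entry plus a passing entry with the same name. The point is that every operator/condition pair emits at least one JUnit entry, even when nothing surprising happened, so aggregate runs always have attempts to count.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// failure carries the failure text of a JUnit test case.
type failure struct {
	Text string `xml:",chardata"`
}

// testCase is a minimal, hypothetical JUnit test case.
type testCase struct {
	XMLName xml.Name `xml:"testcase"`
	Name    string   `xml:"name,attr"`
	Failure *failure `xml:"failure,omitempty"`
}

// casesFor always returns at least one entry for the operator/condition pair:
//   - unexpected transitions: one failure (the test-case fails)
//   - only excepted transitions: a failure plus a pass (assumed to read as a flake)
//   - no transitions: a single pass (the happy case)
func casesFor(operator, condition string, unexpected, excepted []string) []testCase {
	name := fmt.Sprintf("clusteroperator/%s should not change condition/%s", operator, condition)
	pass := testCase{Name: name}
	switch {
	case len(unexpected) > 0:
		return []testCase{{Name: name, Failure: &failure{Text: strings.Join(unexpected, "\n")}}}
	case len(excepted) > 0:
		fail := testCase{Name: name, Failure: &failure{Text: strings.Join(excepted, "\n")}}
		return []testCase{fail, pass}
	default:
		return []testCase{pass}
	}
}

func main() {
	var all []testCase
	all = append(all, casesFor("dns", "Degraded", nil,
		[]string{"reason/DNSDegraded status/True (exception: We are not worried about Degraded=True blips for update tests yet.)"})...)
	all = append(all, casesFor("console", "Available", nil, nil)...)

	out, err := xml.MarshalIndent(all, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```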
/payload 4.15 nightly blocking
@wking: trigger 8 job(s) of type blocking for the nightly release of OCP 4.15
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade
- periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn-conformance
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-serial
- periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-ovn-ipv6
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-sdn-bm
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/40875bb0-8e4e-11ee-959f-9c9eb36a842c-0
/payload 4.15 nightly blocking
@wking: trigger 8 job(s) of type blocking for the nightly release of OCP 4.15
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade
- periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn-conformance
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-serial
- periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-ovn-ipv6
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-sdn-bm
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/afc00530-8e7c-11ee-9ec5-48e1a7780da5-0
Job Failure Risk Analysis for sha: ba523b5b481f02ee0bad1f60842838f0831cb9fb
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial | High [sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel] This test has passed 100.00% of 49 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial'] in the last 14 days. |
/payload 4.15 nightly blocking
@wking: trigger 8 job(s) of type blocking for the nightly release of OCP 4.15
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade
- periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade
- periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn-conformance
- periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-serial
- periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-ovn-ipv6
- periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-sdn-bm
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6833e8f0-8f42-11ee-9337-ff867cada0ed-0
/lgtm
/hold
Cancel whenever you're ready!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dgoodwin, wking
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~pkg/OWNERS~~ [dgoodwin]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

