OCPBUGS-38859: add a test (that flakes) to detect faulty load balancer
Job Failure Risk Analysis for sha: 8afd3c8e370fb321e5f320d78569d51f6e9e3b56
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout | Low<br>operator conditions kube-apiserver<br>This test has passed 68.75% of 16 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days.<br>---<br>[sig-sippy] tests should finish with healthy operators<br>This test has passed 68.75% of 16 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days. |
Job Failure Risk Analysis for sha: f5ea35e2f6aaeed00ae535a6f64f2effd468805c
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout | Low<br>[sig-sippy] tests should finish with healthy operators<br>This test has passed 70.59% of 17 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days.<br>---<br>operator conditions kube-apiserver<br>This test has passed 70.59% of 17 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days. |
@tkashem: This pull request references Jira Issue OCPBUGS-38859, which is valid.
3 validation(s) were run on this bug
- bug is open, matching expected state (open)
- bug target version (4.18.0) matches configured target version for branch (4.18.0)
- bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Requesting review from QA contact: /cc @wangke19
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/label acknowledge-critical-fixes-only
(It does not fail yet; it only flakes, so we can measure and fix the underlying issue. Once the fixes are made, we can change it to a test that fails.)
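A sketch of what that flake-first switch could look like, assuming a simplified stand-in for junit results; `testResult`, `failOnDetection`, and `resultsFor` are illustrative names, not the actual openshift/origin monitor-test API:

```go
package main

import "fmt"

// testResult is a simplified stand-in for one junit test case result.
type testResult struct {
	Name   string
	Failed bool
	Output string
}

// failOnDetection is the switch described above: while false, a detection
// is reported as a flake (a failed run followed by a passed run with the
// same name); once the load-balancer fixes land, flipping it to true turns
// the same detection into a hard failure.
const failOnDetection = false

// resultsFor converts a detection into junit results. Emitting a
// same-named fail+pass pair is what junit consumers commonly classify as
// a flake, so the job stays green while the signal is still measurable.
func resultsFor(name, diagnostics string) []testResult {
	failure := testResult{Name: name, Failed: true, Output: diagnostics}
	if failOnDetection {
		return []testResult{failure}
	}
	return []testResult{failure, {Name: name, Failed: false}}
}

func main() {
	results := resultsFor(
		"[sig-apimachinery] new and reused connections to kube-apiserver should be handled gracefully during the graceful termination process",
		"client observed connection error during kube-apiserver rollout, type: internal-lb",
	)
	for i, r := range results {
		fmt.Printf("Run #%d: Failed=%v\n%s\n", i, r.Failed, r.Output)
	}
}
```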
/lgtm
I would just check that you can find passes and fails in the rehearsals once they're in, but it looks good now.
/hold (until we see some passes in rehearsals)
/retest
Job Failure Risk Analysis for sha: 8e36b1b0f1b685a9ccfd5b1dfeca6089d1725d0c
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout | Low<br>[sig-sippy] tests should finish with healthy operators<br>This test has passed 70.59% of 17 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days.<br>---<br>operator conditions kube-apiserver<br>This test has passed 70.59% of 17 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days. |
/test e2e-metal-ipi-ovn-kube-apiserver-rollout
Passed: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29034/pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout/1828905534703538176
There is one client error interval, but it does not overlap with any kube-apiserver shutdown interval.
junit test output under "Tests Passed":
: [sig-apimachinery] new and reused connections to kube-apiserver should be handled gracefully during the graceful termination process
Skipped: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29034/pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2/1828905495277080576
There are no kube-apiserver shutdown intervals, and the test log says:
I0828 23:15:25.753028 309 monitortest.go:70] monitor[faulty-load-balancer]: found 0 interesting intervals, kube-apiserver shutdown interval count: 0
junit test output:
: [sig-apimachinery] new and reused connections to kube-apiserver should be handled gracefully during the graceful termination process
Reason: No kube-apiserver shutdown interval found
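Taken together with the flake case that follows, these outcomes suggest a simple decision rule: skip when no shutdown intervals were recorded, flag any client error interval that overlaps a graceful-shutdown interval, and pass otherwise. A minimal Go sketch of that rule, assuming illustrative `interval`, `overlaps`, and `evaluate` names rather than the actual monitor-test code:

```go
package main

import (
	"fmt"
	"time"
)

// interval is a simplified monitor interval with a start and end time.
type interval struct {
	Desc string
	From time.Time
	To   time.Time
}

// overlaps reports whether two intervals intersect in time.
func overlaps(a, b interval) bool {
	return a.From.Before(b.To) && b.From.Before(a.To)
}

// evaluate mirrors the outcomes above: with no shutdown intervals the test
// is skipped; a client error overlapping a graceful-shutdown interval is a
// detection (currently surfaced as a flake); otherwise the test passes.
func evaluate(clientErrs, shutdowns []interval) (verdict, reason string) {
	if len(shutdowns) == 0 {
		return "skipped", "No kube-apiserver shutdown interval found"
	}
	detections := 0
	for _, ce := range clientErrs {
		for _, sd := range shutdowns {
			if overlaps(ce, sd) {
				detections++
				break
			}
		}
	}
	if detections > 0 {
		return "flake", fmt.Sprintf("client observed connection error(s) during kube-apiserver rollout: %d", detections)
	}
	return "passed", ""
}

func main() {
	at := func(s string) time.Time { t, _ := time.Parse(time.RFC3339, s); return t }
	// Times taken from the first interval pair in the flake example below.
	shutdowns := []interval{{
		Desc: "GracefulAPIServerShutdown",
		From: at("2024-08-28T23:34:42Z"), To: at("2024-08-28T23:36:53Z"), // 131s window
	}}
	clientErrs := []interval{{
		Desc: "APIUnreachableFromClientMetrics",
		From: at("2024-08-28T23:35:51Z"), To: at("2024-08-28T23:36:51Z"), // 60s window
	}}
	fmt.Println(evaluate(clientErrs, shutdowns)) // flake: the client error falls inside the shutdown
}
```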
Flake: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29034/pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout/1828905505343410176
monitor test log:
I0829 00:35:50.831077 296 monitortest.go:70] monitor[faulty-load-balancer]: found 29 interesting intervals, kube-apiserver shutdown interval count: 14
junit output:
: [sig-apimachinery] new and reused connections to kube-apiserver should be handled gracefully during the graceful termination process
Run #0: Failed 0s
{
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 28 23:34:42.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-21-158.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 28 23:35:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 28 23:38:39.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-123-235.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 28 23:39:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 28 23:42:34.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-124-51.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 28 23:43:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 28 23:47:15.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-21-158.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 28 23:47:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 28 23:51:15.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-123-235.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 28 23:51:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 28 23:55:07.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-124-51.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 28 23:55:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 28 23:59:51.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-21-158.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:00:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 29 00:03:42.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-123-235.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:04:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 29 00:07:40.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-124-51.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:08:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 29 00:12:33.000 - 132s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-21-158.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:13:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 29 00:16:32.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-123-235.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:17:51.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 29 00:20:27.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-124-51.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:21:21.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 29 00:25:19.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-21-158.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:26:21.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
client observed connection error during kube-apiserver rollout, type: internal-lb
kube-apiserver: Aug 29 00:29:21.000 - 131s I namespace/openshift-kube-apiserver node/ pod/kube-apiserver-ip-10-0-123-235.us-east-2.compute.internal server/kube-apiserver constructed/graceful-shutdown-analyzer reason/GracefulAPIServerShutdown
client: Aug 29 00:30:21.046 - 60s E host/internal-lb reason/APIUnreachableFromClientMetrics client observed API error(s), host: api-int.ci-op-rtzldrxw-af546.aws-2.ci.openshift.org:6443, duration: 1m0s
}
Run #1: Passed
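Run #0 failing and Run #1 passing under the same test name is what makes this a flake rather than a failure. A minimal sketch of the classification rule a junit consumer typically applies here; `runResult` and `classify` are illustrative names:

```go
package main

import "fmt"

// runResult records whether one run of a test passed.
type runResult struct{ Passed bool }

// classify applies the usual junit convention: all runs passing is a pass,
// all runs failing is a failure, and a mix (as with Run #0/Run #1 above)
// is a flake, which is reported without gating the job.
func classify(runs []runResult) string {
	passes, fails := 0, 0
	for _, r := range runs {
		if r.Passed {
			passes++
		} else {
			fails++
		}
	}
	switch {
	case fails == 0:
		return "passed"
	case passes == 0:
		return "failed"
	default:
		return "flake"
	}
}

func main() {
	fmt.Println(classify([]runResult{{Passed: false}, {Passed: true}})) // flake
}
```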
/hold cancel
/retest-required
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dgoodwin, sanchezl, tkashem
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [dgoodwin]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/retest-required
Remaining retests: 0 against base HEAD 458e1eae65b16ca8d4f12f387ec2ae34a9ba7591 and 2 for PR HEAD cbc62c55589992d00643455f45eebf2fe798bd8f in total
/retest-required
Remaining retests: 0 against base HEAD a993c78e79f552ce8b6f5ff4c6f66ae9fbf8a0d4 and 2 for PR HEAD cbc62c55589992d00643455f45eebf2fe798bd8f in total
/retest-required
Remaining retests: 0 against base HEAD a993c78e79f552ce8b6f5ff4c6f66ae9fbf8a0d4 and 2 for PR HEAD cbc62c55589992d00643455f45eebf2fe798bd8f in total
/retest-required
@tkashem: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-ovn-single-node | cbc62c55589992d00643455f45eebf2fe798bd8f | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-aws-ovn-ipsec-serial | cbc62c55589992d00643455f45eebf2fe798bd8f | link | false | /test e2e-aws-ovn-ipsec-serial |
| ci/prow/e2e-aws-ovn-single-node-upgrade | cbc62c55589992d00643455f45eebf2fe798bd8f | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-gcp-ovn-rt-upgrade | cbc62c55589992d00643455f45eebf2fe798bd8f | link | false | /test e2e-gcp-ovn-rt-upgrade |
| ci/prow/e2e-aws-ovn-upgrade | cbc62c55589992d00643455f45eebf2fe798bd8f | link | false | /test e2e-aws-ovn-upgrade |
| ci/prow/e2e-aws-ovn-cgroupsv2 | cbc62c55589992d00643455f45eebf2fe798bd8f | link | false | /test e2e-aws-ovn-cgroupsv2 |
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Job Failure Risk Analysis for sha: cbc62c55589992d00643455f45eebf2fe798bd8f
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial | High<br>[sig-arch] events should not repeat pathologically for ns/openshift-authentication-operator<br>This test has passed 100.00% of 34 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.<br>---<br>[bz-Monitoring] clusteroperator/monitoring should not change condition/Available<br>This test has passed 100.00% of 34 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.<br>Open Bugs: monitoring ClusterOperator should not blip Available=Unknown on client rate limiter |
/retest-required
Remaining retests: 0 against base HEAD 8d619a5336c57e4f51efb731a5406efe95f52c1c and 2 for PR HEAD cbc62c55589992d00643455f45eebf2fe798bd8f in total
/retest-required
Remaining retests: 0 against base HEAD f1ade5751b9d643a9cc1c61e40046cadcd45dd94 and 2 for PR HEAD cbc62c55589992d00643455f45eebf2fe798bd8f in total
@tkashem: Jira Issue OCPBUGS-38859: All pull requests linked via external trackers have merged:
Jira Issue OCPBUGS-38859 has been moved to the MODIFIED state.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[ART PR BUILD NOTIFIER]
Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.18.0-202408301641.p0.g1ce76da.assembly.stream.el9. All builds following this will include this PR.