origin
NO-JIRA: add watch machines monitor test for phase changes of machines
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: kannon92. Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/cc @elmiko @deads2k
/hold
I'm going to make sure the intervals are created before I merge.
I will also use these intervals to fix the unexpected node-not-ready failures, but I wanted to debug these monitor tests first.
@kannon92: This pull request explicitly references no jira issue.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
The intervals exist, so I pushed a fix that filters out these unexpected events when they fall within a machine phase change.
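For anyone following along, here is a minimal sketch of the filtering idea described in this comment. It is not the code in this PR: `Interval`, `Event`, and `FilterOutsidePhaseChanges` are hypothetical names chosen for illustration, and the real monitor types in origin are different. The assumption is simply that an event is dropped when its timestamp lies inside any machine phase-change interval.

```go
package main

import (
	"fmt"
	"time"
)

// Interval is a hypothetical stand-in for a machine phase-change interval.
type Interval struct {
	From, To time.Time
}

// Event is a hypothetical stand-in for an "unexpected node not ready" event.
type Event struct {
	Message string
	At      time.Time
}

// within reports whether t falls inside [iv.From, iv.To], bounds included.
func within(iv Interval, t time.Time) bool {
	return !t.Before(iv.From) && !t.After(iv.To)
}

// FilterOutsidePhaseChanges keeps only the events that do not fall inside
// any of the given phase-change intervals.
func FilterOutsidePhaseChanges(events []Event, phaseChanges []Interval) []Event {
	var kept []Event
	for _, ev := range events {
		covered := false
		for _, iv := range phaseChanges {
			if within(iv, ev.At) {
				covered = true
				break
			}
		}
		if !covered {
			kept = append(kept, ev)
		}
	}
	return kept
}

func main() {
	start := time.Date(2024, 9, 4, 19, 59, 0, 0, time.UTC)
	phaseChanges := []Interval{{From: start, To: start.Add(2 * time.Minute)}}
	events := []Event{
		{Message: "inside phase change", At: start.Add(30 * time.Second)},
		{Message: "outside phase change", At: start.Add(10 * time.Minute)},
	}
	for _, ev := range FilterOutsidePhaseChanges(events, phaseChanges) {
		fmt.Println(ev.Message)
	}
}
```

The nested loop is quadratic in the worst case; with the handful of machine phase changes a CI run produces, that seemed acceptable for a sketch.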
Job Failure Risk Analysis for sha: 16567582d901c5f5074476064f3fe2e5e3554834
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | Low [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers for ns/openshift-marketplace This test has passed 37.96% of 108 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week. |
Job Failure Risk Analysis for sha: cac5fac5fd081462bb0a9d36157467b975d4e74e
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | IncompleteTests |
/retest-required
/hold cancel
Job Failure Risk Analysis for sha: 68812179e3bf6add5fd721c895ba7cd955786418
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade | IncompleteTests Tests for this run (104) are below the historical average (820): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout | Low operator conditions kube-apiserver This test has passed 60.00% of 15 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days. Open Bugs operator conditions kube-apiserver failures --- [sig-sippy] tests should finish with healthy operators This test has passed 60.00% of 15 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days. |
/retest-required
/retest
Job Failure Risk Analysis for sha: e0912f9dd628ea6d108d2e14baa778f47dec4ef9
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout | High [sig-network] there should be nearly zero single second disruptions for openshift-api-http2-internal-lb-reused-connections This test has passed 100.00% of 1 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-kube-apiserver-rollout'] in the last 14 days. |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | Medium [sig-network] pods should successfully create sandboxes by adding pod to network This test has passed 91.67% of 120 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week. Open Bugs s390x: [sig-network] pods should successfully create sandboxes by adding pod to network fails with error adding pod to CNI network |
Job Failure Risk Analysis for sha: 9fa46c5d27f4589c034499f05c1852a446e1ffca
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial | High [bz-Management Console] clusteroperator/console should not change condition/Available This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days. --- [sig-arch] events should not repeat pathologically for ns/openshift-ovn-kubernetes This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days. --- [sig-arch] events should not repeat pathologically This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days. --- [sig-arch] events should not repeat pathologically for ns/openshift-authentication-operator This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days. --- Showing 4 of 5 test results |
capacity issues
/retest
@kannon92: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-ovn-single-node-upgrade | 37d75072920c45286cb91a1e89cd283465cfeba2 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-metal-ipi-ovn | 37d75072920c45286cb91a1e89cd283465cfeba2 | link | false | /test e2e-metal-ipi-ovn |
| ci/prow/e2e-aws-ovn-ipsec-serial | 37d75072920c45286cb91a1e89cd283465cfeba2 | link | false | /test e2e-aws-ovn-ipsec-serial |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 37d75072920c45286cb91a1e89cd283465cfeba2 | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
@deads2k This is working. Can we go ahead and merge?
> @deads2k This is working. Can we go ahead and merge?
the timelines need to not end up at the beginning of the epoch
> > @deads2k This is working. Can we go ahead and merge?
>
> the timelines need to not end up at the beginning of the epoch
Yes. I created https://issues.redhat.com/browse/TRT-1798 for TRT to look into.
I thought that this was a render issue. The intervals seem fine in the files.
  "message": {
    "reason": "MachinePhaseChange",
    "cause": "",
    "humanMessage": "Machine phase changed from \u003cmissing\u003e to Provisioning",
    "annotations": {
      "node": "\u003cunknown\u003e",
      "phase": "Provisioning",
      "previousPhase": "\u003cmissing\u003e",
      "reason": "MachinePhaseChange"
    }
  },
  "from": "2024-09-04T19:59:05Z",
  "to": "2024-09-04T19:59:05Z"
},
This From/To is not at the end.
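A rough way to check this against the interval files, as a sketch only: the struct below covers just the fields visible in the fragment above, and `HasEpochTimestamps` is a hypothetical helper, not something that exists in origin. It assumes that "beginning of the epoch" means a timestamp at or before the Unix epoch, which also catches Go's zero `time.Time` when a field is missing.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// interval mirrors only the fields shown in the pasted fragment.
type interval struct {
	Message struct {
		Reason       string            `json:"reason"`
		HumanMessage string            `json:"humanMessage"`
		Annotations  map[string]string `json:"annotations"`
	} `json:"message"`
	From time.Time `json:"from"`
	To   time.Time `json:"to"`
}

// HasEpochTimestamps reports whether either endpoint of the interval sits at
// or before the Unix epoch. A missing field decodes to Go's zero time, which
// is also before the epoch, so it is flagged too.
func HasEpochTimestamps(raw []byte) (bool, error) {
	var iv interval
	if err := json.Unmarshal(raw, &iv); err != nil {
		return false, err
	}
	return iv.From.Unix() <= 0 || iv.To.Unix() <= 0, nil
}

func main() {
	raw := []byte(`{
	  "message": {
	    "reason": "MachinePhaseChange",
	    "humanMessage": "Machine phase changed from \u003cmissing\u003e to Provisioning",
	    "annotations": {"phase": "Provisioning"}
	  },
	  "from": "2024-09-04T19:59:05Z",
	  "to": "2024-09-04T19:59:05Z"
	}`)
	suspicious, err := HasEpochTimestamps(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("epoch timestamps present:", suspicious)
}
```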
/close
@deads2k proposed an improvement that hopefully fixes the bound machine problem.
/close
@kannon92: Closed this PR.
In response to this:
/close
@deads2k proposed an improvement that hopefully fixes the bound machine problem.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.