OCPBUGS-1052: allow frontends to tolerate 2.5% disruption during upgrades

Open joelanford opened this issue 3 years ago • 6 comments

Recent 4.8 to 4.9 upgrade tests fail fairly consistently due to cluster frontend ingress being unavailable. Currently the upgrade test tolerates no disruption whatsoever. This PR makes the test more lenient, allowing 2.5% disruption, which is the minimum that would have resulted in passes for the string of recent failures.

Note for reviewer: this is outside my typical domain. If there's a more targeted fix we could make (e.g. do this only on AWS upgrades), or if we think it's worth root-causing and/or adding a link in the test output to help our future selves understand this decision, let me know what's appropriate.

Signed-off-by: Joe Lanford [email protected]

joelanford avatar Sep 21 '22 20:09 joelanford

@joelanford: No Bugzilla bug is referenced in the title of this pull request. To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

OCPBUGS-1052: allow frontends to tolerate 2.5% disruption during upgrades

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Sep 21 '22 20:09 openshift-ci[bot]

@joelanford: This pull request references Jira Issue OCPBUGS-1052, which is invalid:

  • expected the bug to target the "4.9.z" version, but no target version was set
  • expected Jira Issue OCPBUGS-1052 to depend on a bug targeting a version in 4.10.0, 4.10.z and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

openshift-ci-robot avatar Sep 21 '22 20:09 openshift-ci-robot

/jira refresh

joelanford avatar Sep 21 '22 21:09 joelanford

@joelanford: This pull request references Jira Issue OCPBUGS-1052, which is invalid:

  • expected Jira Issue OCPBUGS-1052 to depend on a bug targeting a version in 4.10.0, 4.10.z and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

openshift-ci-robot avatar Sep 21 '22 21:09 openshift-ci-robot

/retest-required

joelanford avatar Sep 22 '22 20:09 joelanford

/skip /retest-required

wking avatar Sep 23 '22 07:09 wking

/retest-required

joelanford avatar Sep 27 '22 15:09 joelanford

/test e2e-aws-single-node-serial

joelanford avatar Sep 27 '22 19:09 joelanford

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: joelanford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Sep 27 '22 21:09 openshift-ci[bot]

Accepting this test threshold change directly into 4.9 as these tests have been reimplemented in later releases.

sdodson avatar Sep 27 '22 21:09 sdodson

/retest-required

Prashanth684 avatar Sep 27 '22 21:09 Prashanth684

/retest-required

Remaining retests: 0 against base HEAD 6ec9a5eb8140ffa64bfb026e763dc77be80fde36 and 2 for PR HEAD 89ca4287e92566b09876a3b46b49b3881c5bd9e2 in total

openshift-ci-robot avatar Sep 28 '22 00:09 openshift-ci-robot

/retest-required

Prashanth684 avatar Sep 28 '22 14:09 Prashanth684

/retest-required

joelanford avatar Sep 28 '22 17:09 joelanford

/retest-required

Prashanth684 avatar Sep 29 '22 15:09 Prashanth684

/retest-required

Prashanth684 avatar Sep 30 '22 14:09 Prashanth684

@joelanford: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-builds | 89ca4287e92566b09876a3b46b49b3881c5bd9e2 | link | true | /test e2e-gcp-builds |

Full PR test history. Your PR dashboard.

openshift-ci[bot] avatar Sep 30 '22 17:09 openshift-ci[bot]

/override ci/prow/e2e-gcp-builds

sdodson avatar Sep 30 '22 17:09 sdodson

@sdodson: sdodson unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight.

openshift-ci[bot] avatar Sep 30 '22 17:09 openshift-ci[bot]

This test does not seem likely to be affected by this change. We will investigate and make sure a bug is filed to track it.

sdodson avatar Sep 30 '22 17:09 sdodson

@joelanford: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-1052 has been moved to the MODIFIED state.

openshift-ci-robot avatar Sep 30 '22 17:09 openshift-ci-robot