Bug 2106216: Reduce expected success rate to =>98%

Open suleymanakbas91 opened this issue 3 years ago • 14 comments

We initially chose a 99% expected success rate as a guess, because we had no historical data to compare against. However, this test has mostly been failing with success rates greater than 98% but less than 99%: https://search.ci.openshift.org/?search=success+rate+is+less+than+99%25+on+the+node&maxAge=336h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job Changing the expected success rate in the test to >= 98% will therefore reduce CI failures.
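To illustrate the effect of the threshold change, here is a minimal sketch, not the actual test code from this PR. The helper name `meetsThreshold` and the sample counts are hypothetical; the sketch only assumes that the test compares an observed success percentage against a minimum.

```go
package main

import "fmt"

// meetsThreshold reports whether successes out of total reaches at least
// minPercent percent. Hypothetical helper for illustration only; it is not
// taken from the test changed in this PR. Integer arithmetic avoids
// floating-point rounding at the boundary.
func meetsThreshold(successes, total, minPercent int) bool {
	if total <= 0 {
		return false
	}
	return successes*100 >= minPercent*total
}

func main() {
	// Illustrative numbers: 985 of 1000 lookups succeeded (98.5%),
	// which falls in the range described above.
	fmt.Println(meetsThreshold(985, 1000, 99)) // fails the 99% expectation
	fmt.Println(meetsThreshold(985, 1000, 98)) // passes the 98% expectation
}
```

A run that lands between 98% and 99% fails the old expectation and passes the new one, which is the case the linked CI search shows most often.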

suleymanakbas91 avatar Aug 02 '22 11:08 suleymanakbas91

@suleymanakbas91: This pull request references Bugzilla bug 2106216, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.12.0) matches configured target release for branch (4.12.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact: /cc @lihongan

In response to this:

Bug 2106216: Reduce expected success rate to 98%

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Aug 02 '22 11:08 openshift-ci[bot]

/retest

brandisher avatar Aug 02 '22 16:08 brandisher

/lgtm

brandisher avatar Aug 02 '22 19:08 brandisher

/retest

brandisher avatar Aug 02 '22 19:08 brandisher

/retest

suleymanakbas91 avatar Aug 03 '22 00:08 suleymanakbas91

/retest

suleymanakbas91 avatar Aug 03 '22 07:08 suleymanakbas91

@suleymanakbas91: This pull request references Bugzilla bug 2106216, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.12.0) matches configured target release for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact: /cc @lihongan

In response to this:

/bugzilla refresh

openshift-ci[bot] avatar Aug 03 '22 11:08 openshift-ci[bot]

/retest

suleymanakbas91 avatar Aug 03 '22 12:08 suleymanakbas91

/retest

suleymanakbas91 avatar Aug 04 '22 09:08 suleymanakbas91

/retest

suleymanakbas91 avatar Aug 09 '22 11:08 suleymanakbas91

/retest

suleymanakbas91 avatar Aug 10 '22 13:08 suleymanakbas91

I think this is reasonable. I do want to be careful about "cooking the books" with this, but it sounds like the argument is 99% was arbitrary, so 98% is now an established baseline. I'll buy that. /lgtm

gcs278 avatar Aug 10 '22 16:08 gcs278

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: brandisher, gcs278, suleymanakbas91

Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval by writing /assign @deads2k in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Aug 10 '22 16:08 openshift-ci[bot]

@suleymanakbas91: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-single-node-upgrade | fa7ceb685ee87d880f6a18319393ca05e978ce18 | link | false | /test e2e-aws-single-node-upgrade |
| ci/prow/e2e-aws-single-node | fa7ceb685ee87d880f6a18319393ca05e978ce18 | link | false | /test e2e-aws-single-node |

Full PR test history. Your PR dashboard.

I understand the commands that are listed here.

openshift-ci[bot] avatar Aug 10 '22 17:08 openshift-ci[bot]

/hold

It's been a month since the bug was filed about this test failing. I don't see analysis in the bug indicating what is causing the failures, or an explanation that indicates we advertise DNS as less than fully available in our clusters. I'm sympathetic to wanting jobs to pass, but I don't want to ignore product problems. How about an analysis of why we are experiencing this problem, one that explains why it is an expected part of the product, before reducing reliability standards?

deads2k avatar Aug 11 '22 12:08 deads2k

@suleymanakbas91: This pull request references Bugzilla bug 2106216. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. Warning: Failed to comment on Bugzilla bug with reason for changed state.

In response to this:

Bug 2106216: Reduce expected success rate to =>98%

openshift-ci[bot] avatar Aug 24 '22 14:08 openshift-ci[bot]