OCPBUGS-43483: improve test/apiserver/graceful_termination

Open p0lyn0mial opened this issue 1 year ago • 10 comments
This PR improves the "API LBs follow /readyz of kube-apiserver and stop sending requests before server shutdowns for external clients" test. In particular, it:

  1. Processes all available audit logs, not just the last one.
  2. Doesn't prematurely close the audit log file, so that the entire file can be processed.
  3. Checks scanner.Err() after scanning.
  4. Ensures that opened files are always closed, even if the test fails in the middle.

Note that because the audit logs were not fully processed before this PR, we might start seeing some new test failures.

p0lyn0mial avatar Aug 14 '24 06:08 p0lyn0mial

/assign @tkashem

p0lyn0mial avatar Aug 14 '24 07:08 p0lyn0mial

@p0lyn0mial: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Aug 14 '24 07:08 openshift-ci-robot

/test all

p0lyn0mial avatar Sep 09 '24 12:09 p0lyn0mial

Please move to a monitortest since monitor tests are run for all job types and are able to create intervals for consumption by other analysis tools (this cannot do so efficiently).

deads2k avatar Sep 17 '24 13:09 deads2k

Please move to a monitortest since monitor tests are run for all job types and are able to create intervals for consumption by other analysis tools (this cannot do so efficiently).

This PR fixes the existing test, and moving the code would require creating a new test. While it may not fully align with the monitor test framework, having a test in place is better than having no test at all, as it still provides some coverage and ensures the functionality is being validated.

p0lyn0mial avatar Sep 18 '24 13:09 p0lyn0mial

/retitle OCPBUGS-43483: improve test/apiserver/graceful_termination

I think we still want this; the test can be converted into a monitortest in a separate PR.

vrutkovs avatar Oct 17 '24 06:10 vrutkovs

@p0lyn0mial: This pull request references Jira Issue OCPBUGS-43483, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @wangke19

The bug has been updated to refer to the pull request using the external bug tracker.

openshift-ci-robot avatar Oct 17 '24 06:10 openshift-ci-robot

/test e2e-metal-ipi-ovn-kube-apiserver-rollout e2e-aws-ovn-kube-apiserver-rollout

vrutkovs avatar Oct 17 '24 07:10 vrutkovs

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Jan 16 '25 01:01 openshift-bot

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-merge-robot avatar Jan 16 '25 01:01 openshift-merge-robot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot avatar Mar 29 '25 00:03 openshift-bot

@p0lyn0mial: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-single-node-upgrade ea4355211bd00387714f640c709b7e15f6a982b8 link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-aws-ovn-single-node-serial ea4355211bd00387714f640c709b7e15f6a982b8 link false /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-aws-ovn-ipsec-serial ea4355211bd00387714f640c709b7e15f6a982b8 link false /test e2e-aws-ovn-ipsec-serial
ci/prow/e2e-agnostic-ovn-cmd ea4355211bd00387714f640c709b7e15f6a982b8 link false /test e2e-agnostic-ovn-cmd
ci/prow/e2e-gcp-ovn-builds ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-gcp-ovn-builds
ci/prow/e2e-aws-ovn-kube-apiserver-rollout ea4355211bd00387714f640c709b7e15f6a982b8 link false /test e2e-aws-ovn-kube-apiserver-rollout
ci/prow/e2e-aws-ovn-microshift ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-aws-ovn-microshift
ci/prow/images ea4355211bd00387714f640c709b7e15f6a982b8 link true /test images
ci/prow/e2e-metal-ipi-ovn-ipv6 ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-aws-ovn-fips ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-aws-ovn-fips
ci/prow/unit ea4355211bd00387714f640c709b7e15f6a982b8 link true /test unit
ci/prow/e2e-vsphere-ovn ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-vsphere-ovn
ci/prow/e2e-aws-ovn-serial ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-aws-ovn-serial
ci/prow/verify ea4355211bd00387714f640c709b7e15f6a982b8 link true /test verify
ci/prow/e2e-gcp-ovn-upgrade ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-gcp-ovn-upgrade
ci/prow/e2e-aws-ovn-edge-zones ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-aws-ovn-edge-zones
ci/prow/e2e-aws-ovn-microshift-serial ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-aws-ovn-microshift-serial
ci/prow/lint ea4355211bd00387714f640c709b7e15f6a982b8 link true /test lint
ci/prow/verify-deps ea4355211bd00387714f640c709b7e15f6a982b8 link true /test verify-deps
ci/prow/e2e-vsphere-ovn-upi ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-vsphere-ovn-upi
ci/prow/okd-scos-images ea4355211bd00387714f640c709b7e15f6a982b8 link true /test okd-scos-images
ci/prow/e2e-gcp-ovn ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn-serial-1of2 ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-aws-ovn-serial-1of2
ci/prow/e2e-aws-ovn-serial-2of2 ea4355211bd00387714f640c709b7e15f6a982b8 link true /test e2e-aws-ovn-serial-2of2

openshift-ci[bot] avatar May 02 '25 20:05 openshift-ci[bot]

Job Failure Risk Analysis for sha: ea4355211bd00387714f640c709b7e15f6a982b8

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (2) are below the historical average (1133): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (2) are below the historical average (1055): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

openshift-trt[bot] avatar May 02 '25 21:05 openshift-trt[bot]

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot avatar Jun 02 '25 00:06 openshift-bot

@p0lyn0mial: An error was encountered getting issue for bug OCPBUGS-43483 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. No response returned: Get "https://issues.redhat.com/rest/api/2/issue/OCPBUGS-43483": GET https://issues.redhat.com/rest/api/2/issue/OCPBUGS-43483 giving up after 5 attempt(s)

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

openshift-ci-robot avatar Jun 02 '25 00:06 openshift-ci-robot

@openshift-bot: Closed this PR.

openshift-ci[bot] avatar Jun 02 '25 00:06 openshift-ci[bot]