OCPBUGS-14057: Removes HAProxyDown critical alert exception.

Open miheer opened this issue 1 year ago • 15 comments

Removes HAProxyDown critical alert exception. Ticket: https://issues.redhat.com/browse/OCPBUGS-14057

miheer avatar Feb 06 '24 00:02 miheer

/jira-refresh

miheer avatar Feb 06 '24 00:02 miheer

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: miheer
Once this PR has been reviewed and has the lgtm label, please assign slashpai for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Feb 06 '24 00:02 openshift-ci[bot]

@miheer: This pull request references Jira Issue OCPBUGS-14057, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Removes HAProxyDown critical alert exception. Ticket: https://issues.redhat.com/browse/OCPBUGS-14057

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Feb 06 '24 00:02 openshift-ci-robot

/jira refresh

miheer avatar Feb 06 '24 00:02 miheer

@miheer: This pull request references Jira Issue OCPBUGS-14057, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

openshift-ci-robot avatar Feb 06 '24 00:02 openshift-ci-robot

@miheer: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-ovn 67f5bd80550d217a11c99023ca08f5e27933e0d3 link true /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn-fips 67f5bd80550d217a11c99023ca08f5e27933e0d3 link true /test e2e-aws-ovn-fips
ci/prow/e2e-openstack-ovn 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-serial 67f5bd80550d217a11c99023ca08f5e27933e0d3 link true /test e2e-aws-ovn-serial
ci/prow/e2e-aws-ovn-single-node-serial 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-metal-ipi-sdn 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-metal-ipi-sdn
ci/prow/e2e-aws-ovn-single-node-upgrade 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-aws-ovn-cgroupsv2 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-aws-ovn-cgroupsv2
ci/prow/e2e-agnostic-ovn-cmd 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-agnostic-ovn-cmd
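
As a minimal sketch of how the table above can be read, the following Python snippet pulls out the rerun commands for the jobs marked as required. The function name `required_reruns` and the embedded sample rows are illustrative assumptions, not part of Prow; the sketch only assumes the whitespace-separated column order shown in the table (test name, commit, details, required, rerun command).

```python
# Sample rows copied from the table above (a subset, for illustration).
ROWS = """\
ci/prow/e2e-gcp-ovn 67f5bd80550d217a11c99023ca08f5e27933e0d3 link true /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn-fips 67f5bd80550d217a11c99023ca08f5e27933e0d3 link true /test e2e-aws-ovn-fips
ci/prow/e2e-openstack-ovn 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-serial 67f5bd80550d217a11c99023ca08f5e27933e0d3 link true /test e2e-aws-ovn-serial
ci/prow/e2e-metal-ipi-sdn 67f5bd80550d217a11c99023ca08f5e27933e0d3 link false /test e2e-metal-ipi-sdn
"""


def required_reruns(table: str) -> list[str]:
    """Return the rerun command of every row whose Required column is 'true'."""
    commands = []
    for row in table.splitlines():
        fields = row.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        required = fields[3]
        command = " ".join(fields[4:])  # e.g. "/test e2e-gcp-ovn"
        if required == "true":
            commands.append(command)
    return commands


for command in required_reruns(ROWS):
    print(command)
```

Posting /retest-required, as the bot comment notes, reruns all mandatory failed tests in one step; listing the individual commands is only useful when a single job needs to be retried.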

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Feb 06 '24 03:02 openshift-ci[bot]

/hold until https://github.com/openshift/runbooks/pull/166/ is merged.

jan--f avatar Feb 06 '24 14:02 jan--f

/jira refresh

Miciah avatar Mar 18 '24 15:03 Miciah

@Miciah: This pull request references Jira Issue OCPBUGS-14057, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @ShudiLi

In response to this:

/jira refresh

openshift-ci-robot avatar Mar 18 '24 15:03 openshift-ci-robot

/assign @Miciah
/assign

candita avatar Mar 27 '24 15:03 candita

Tested with 4.16.0-0.ci.test-2024-05-15-084545-ci-ln-x0xpjtt-latest. When HAProxy was down, the HAProxyDown alert was shown in the web console.

  1. Check the cluster version, then scale the cluster-version-operator and ingress-operator deployments to 0 replicas:

     % oc get clusterversion
     NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
     version   4.16.0-0.ci.test-2024-05-15-084545-ci-ln-x0xpjtt-latest   True        False         111m    Cluster version is 4.16.0-0.ci.test-2024-05-15-084545-ci-ln-x0xpjtt-latest

     % oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator
     % oc scale --replicas 0 -n openshift-ingress-operator deployments ingress-operator

  2. Edit the router-default deployment to remove the livenessProbe and startupProbe checks.

  3. rsh to a router pod and kill the haproxy process:

     % oc -n openshift-ingress get pods
     NAME                              READY   STATUS    RESTARTS   AGE
     router-default-595f85875f-2j5jr   1/1     Running   0          73m
     router-default-595f85875f-8glrr   1/1     Running   0          73m

  4. Log in to the web console and go to Observe >> Alerting; the HAProxyDown alert is shown:

     Name: HAProxyDown
     Description: This alert fires when metrics report that HAProxy is down.
     Summary: HAProxy is down
     Runbook: https://github.com/openshift/runbooks/blob/master/alerts/HAProxyDown.md
     Labels: prometheus=openshift-monitoring/k8s severity=critical alertname=HAProxyDown pod=router-default-595f85875f-8glrr
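
The labels shown in the alert view can also be checked mechanically. The snippet below is a minimal Python sketch, assuming the labels are copied as a whitespace-separated run of key=value pairs exactly as they appear above; `parse_labels` is a hypothetical helper name, not an OpenShift or Prometheus API.

```python
def parse_labels(line: str) -> dict[str, str]:
    """Split a whitespace-separated run of key=value pairs into a dict."""
    labels = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {token!r}")
        labels[key] = value
    return labels


# Label string copied from the alert view in the test report above.
observed = parse_labels(
    "prometheus=openshift-monitoring/k8s severity=critical "
    "alertname=HAProxyDown pod=router-default-595f85875f-8glrr"
)

# The PR is expected to surface HAProxyDown as a critical alert.
is_expected = (
    observed["alertname"] == "HAProxyDown" and observed["severity"] == "critical"
)
print(is_expected)
```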

ShudiLi avatar May 15 '24 11:05 ShudiLi

/label qe-approved
thanks

ShudiLi avatar May 15 '24 11:05 ShudiLi

@miheer: This pull request references Jira Issue OCPBUGS-14057, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @ShudiLi

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Removes HAProxyDown critical alert exception. Ticket: https://issues.redhat.com/browse/OCPBUGS-14057

openshift-ci-robot avatar May 15 '24 11:05 openshift-ci-robot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Aug 14 '24 01:08 openshift-bot