
Add tests verifying that alert expressions are valid

Open dgrisonnet opened this issue 3 years ago • 6 comments

Add a test verifying that the expressions underlying an alert are still relevant. The goal of such tests is to detect alerts that are out of sync with the platform, whether because a metric was removed or because one of its labels changed. To do so, we check that none of the metrics and selectors used by an alert are absent. The trade-off is that we can't tell whether an error-related alert is still relevant, since in most cases error metrics are absent as long as no error has occurred.
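
To make the idea concrete, here is a minimal, self-contained sketch of the check. It is not the code from this PR: it assumes the selectors have already been extracted from the alert expressions and that the available series are given as plain label dictionaries, whereas a real test would presumably parse the expressions and query the cluster's Prometheus (for example with PromQL's `absent()`). The metric names, the `absent_selectors` helper, and the `expected_absent` allowlist are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# A selector maps label names to exact values; "__name__" holds the metric name.
# Only equality matchers are modelled in this sketch.
Selector = Dict[str, str]


def matches(selector: Selector, series: Dict[str, str]) -> bool:
    """Return True if every equality matcher of the selector holds for the series."""
    return all(series.get(label) == value for label, value in selector.items())


def absent_selectors(
    selectors: List[Selector],
    series: List[Dict[str, str]],
    expected_absent: FrozenSet[str] = frozenset(),
) -> List[Selector]:
    """Return the selectors that match no series at all.

    Metrics listed in expected_absent (e.g. error metrics that only appear
    once an error has occurred) are skipped instead of being reported.
    """
    missing = []
    for selector in selectors:
        if selector.get("__name__") in expected_absent:
            continue
        if not any(matches(selector, s) for s in series):
            missing.append(selector)
    return missing


# Hypothetical data: the "code" label was renamed to "status",
# and the error metric has no samples because no error occurred yet.
series = [
    {"__name__": "http_requests_total", "job": "api", "status": "200"},
]
selectors = [
    {"__name__": "http_requests_total", "job": "api"},
    {"__name__": "http_requests_total", "code": "500"},
    {"__name__": "http_request_errors_total"},
]
stale = absent_selectors(
    selectors, series, expected_absent=frozenset({"http_request_errors_total"})
)
print(stale)
```

The allowlist reflects the trade-off described above: without it, the error metric would be reported as stale even though its absence is normal.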

One use case we recently ran into is when the buckets of a metric changed and an alert using them wasn't updated. In that case, we want to detect that the alert no longer does anything so that it can be fixed.
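
The bucket case can be illustrated with a small sketch. It assumes the alert references histogram buckets through their `le` label and that the list of currently exported boundaries is known; the `missing_buckets` helper and the boundary values are hypothetical and only show the shape of the check.

```python
from typing import Iterable, List


def missing_buckets(alert_le: Iterable[str], exported_le: Iterable[str]) -> List[str]:
    """Return the `le` values used by an alert that the histogram no longer exports.

    Values are compared numerically so that "1" and "1.0" count as the same bucket.
    """
    exported = {float(value) for value in exported_le}
    return [value for value in alert_le if float(value) not in exported]


# Hypothetical example: the alert still selects le="0.4",
# but the histogram boundaries were changed to 0.25, 0.5, 1 and +Inf.
unused = missing_buckets(["0.4", "1.0"], ["0.25", "0.5", "1", "+Inf"])
print(unused)
```

A selector on a bucket that no longer exists simply returns no data, so the alert stays silent rather than failing loudly, which is why an explicit check of this kind is useful.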

dgrisonnet • Jan 26 '22 14:01

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dgrisonnet. To complete the pull request process, please assign stbenjam after the PR has been reviewed. You can assign the PR to them by writing /assign @stbenjam in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] • Jan 26 '22 14:01

/retest

dgrisonnet • Jan 31 '22 15:01

@openshift/openshift-team-monitoring could you please have a look at this PR?

dgrisonnet • Feb 01 '22 19:02

@dgrisonnet: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] • May 10 '22 18:05

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot • Aug 08 '22 19:08

@dgrisonnet: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/images | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test images |
| ci/prow/e2e-aws-serial | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-serial |
| ci/prow/e2e-gcp | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp |
| ci/prow/e2e-aws-cgroupsv2 | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-aws-cgroupsv2 |
| ci/prow/e2e-aws-fips | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-fips |
| ci/prow/e2e-gcp-builds | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-builds |
| ci/prow/e2e-gcp-upgrade | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-upgrade |
| ci/prow/e2e-agnostic-cmd | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-agnostic-cmd |
| ci/prow/e2e-aws-single-node | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-aws-csi | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-aws-csi |
| ci/prow/verify | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test verify |
| ci/prow/e2e-gcp-csi | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-gcp-csi |
| ci/prow/lint | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test lint |
| ci/prow/e2e-aws-ovn-serial | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-ovn-serial |
| ci/prow/e2e-aws-ovn-fips | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-gcp-ovn | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-ovn |
| ci/prow/e2e-gcp-ovn-upgrade | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-ovn-upgrade |

Full PR test history. Your PR dashboard.

openshift-ci[bot] • Aug 31 '22 17:08

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot • Oct 01 '22 00:10

@dgrisonnet: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/images | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test images |
| ci/prow/e2e-aws-serial | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-serial |
| ci/prow/e2e-gcp | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp |
| ci/prow/e2e-aws-cgroupsv2 | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-aws-cgroupsv2 |
| ci/prow/e2e-aws-fips | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-fips |
| ci/prow/e2e-gcp-builds | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-builds |
| ci/prow/e2e-gcp-upgrade | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-upgrade |
| ci/prow/e2e-agnostic-cmd | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-agnostic-cmd |
| ci/prow/e2e-aws-single-node | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-aws-csi | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-aws-csi |
| ci/prow/verify | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test verify |
| ci/prow/e2e-gcp-csi | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | false | /test e2e-gcp-csi |
| ci/prow/lint | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test lint |
| ci/prow/e2e-aws-ovn-serial | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-ovn-serial |
| ci/prow/e2e-aws-ovn-fips | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-gcp-ovn | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-ovn |
| ci/prow/e2e-gcp-ovn-upgrade | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-ovn-upgrade |
| ci/prow/unit | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test unit |
| ci/prow/e2e-gcp-ovn-image-ecosystem | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-ovn-image-ecosystem |
| ci/prow/e2e-aws-ovn-image-registry | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-aws-ovn-image-registry |
| ci/prow/e2e-gcp-ovn-builds | 275b8ba1e1a3483c02e548fcc64ca4b0d4801015 | link | true | /test e2e-gcp-ovn-builds |

Full PR test history. Your PR dashboard.

openshift-ci[bot] • Nov 04 '22 22:11

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot • Dec 05 '22 08:12

@openshift-bot: Closed this PR.

openshift-ci[bot] • Dec 05 '22 08:12