configuration-anomaly-detection

OSD-30030: E2E Test cases ClusterMonitoringErrorBudgetBurnSRE

Open lambasanchit opened this issue 6 months ago • 5 comments

E2E Test: ClusterMonitoringErrorBudgetBurn Alert Trigger and Recovery (OSD-30030)

Description: This PR adds an E2E test for the ClusterMonitoringErrorBudgetBurn alert, targeting AWS CCS clusters.

The test misconfigures the user-workload-monitoring-config ConfigMap in the openshift-user-workload-monitoring namespace to simulate excessive monitoring error budget burn. It then checks if a service log is created and finally deletes the ConfigMap to clean up the test state.

Steps:

1. Fetch initial cluster info and current service logs.
2. Back up the original ConfigMap.
3. Inject malformed YAML to trigger the alert.
4. Wait for CAD/system reaction.
5. Validate that a new service log is generated.
6. Delete the ConfigMap as a recovery step.
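The "Wait for CAD/system reaction" step is usually easier to reason about as a bounded poll than as a fixed sleep. The following is a minimal sketch of such a poll, not the code from this PR: the name `wait_until`, its parameters, and the injectable `sleep`/`now` hooks are assumptions made for illustration (the hooks exist only so the helper can be exercised without real delays).

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Hypothetical helper: returns True as soon as the condition holds,
    False if the deadline passes first. The condition is always checked
    at least once, including one final check at the deadline.
    """
    deadline = now() + timeout_s
    while True:
        if condition():
            return True
        if now() >= deadline:
            return False
        sleep(interval_s)
```

In a test of this kind, `condition` would typically compare the current number of service logs against the count recorded in the first step, with a timeout chosen to cover alert evaluation plus CAD processing time.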

Acceptance:

- Alert is triggered (ClusterMonitoringErrorBudgetBurnSRE).
- Service log is sent to the customer.
- Cluster state is restored by deleting the ConfigMap.
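To make the order of the steps and the acceptance checks concrete, here is a rough sketch of the flow against an in-memory stand-in. It is not the test added by this PR: `FakeCluster`, `FlowResult`, `run_flow`, the `config.yaml` data key, and the malformed payload are all assumptions for illustration, and alert firing plus CAD's reaction are collapsed into a caller-supplied `wait_for_reaction` callback rather than modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

NAMESPACE = "openshift-user-workload-monitoring"
CONFIGMAP = "user-workload-monitoring-config"
# Hypothetical payload: the unclosed bracket makes the YAML invalid.
MALFORMED_CONFIG = "prometheus:\n  retention: [unclosed\n"


@dataclass
class FakeCluster:
    """In-memory stand-in for the cluster and the service-log store."""
    configmaps: Dict[Tuple[str, str], Dict[str, str]] = field(default_factory=dict)
    service_logs: List[str] = field(default_factory=list)

    def get_configmap(self, namespace: str, name: str) -> Optional[Dict[str, str]]:
        data = self.configmaps.get((namespace, name))
        return None if data is None else dict(data)

    def apply_configmap(self, namespace: str, name: str, data: Dict[str, str]) -> None:
        self.configmaps[(namespace, name)] = dict(data)

    def delete_configmap(self, namespace: str, name: str) -> None:
        self.configmaps.pop((namespace, name), None)


@dataclass
class FlowResult:
    backup: Optional[Dict[str, str]]
    new_service_logs: int
    configmap_present_after: bool


def run_flow(cluster: FakeCluster,
             wait_for_reaction: Callable[[FakeCluster], None]) -> FlowResult:
    logs_before = len(cluster.service_logs)                 # 1. record baseline
    backup = cluster.get_configmap(NAMESPACE, CONFIGMAP)    # 2. back up original
    cluster.apply_configmap(                                # 3. inject malformed YAML
        NAMESPACE, CONFIGMAP, {"config.yaml": MALFORMED_CONFIG})
    wait_for_reaction(cluster)                              # 4. wait for CAD/system
    new_logs = len(cluster.service_logs) - logs_before      # 5. count new service logs
    cluster.delete_configmap(NAMESPACE, CONFIGMAP)          # 6. recover by deleting
    present = cluster.get_configmap(NAMESPACE, CONFIGMAP) is not None
    return FlowResult(backup, new_logs, present)
```

A caller would then assert `new_service_logs >= 1` and `configmap_present_after is False`. The sketch follows the description in recovering by deletion; the backup is returned so a caller could re-apply it if restoring the original content were preferred.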

lambasanchit avatar May 27 '25 11:05 lambasanchit

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] avatar May 27 '25 11:05 openshift-ci[bot]

@lambasanchit: This pull request references OSD-30030 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar May 27 '25 11:05 openshift-ci-robot

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 31.92%. Comparing base (a59dd9a) to head (1eb302c). Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #456   +/-   ##
=======================================
  Coverage   31.92%   31.92%           
=======================================
  Files          36       36           
  Lines        2487     2487           
=======================================
  Hits          794      794           
  Misses       1632     1632           
  Partials       61       61           

codecov-commenter avatar May 27 '25 11:05 codecov-commenter

/retest

lambasanchit avatar Jun 04 '25 09:06 lambasanchit

/lgtm

bergmannf avatar Jun 04 '25 13:06 bergmannf

/label tide/merge-method-squash

lambasanchit avatar Jun 05 '25 06:06 lambasanchit

@lambasanchit: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Jun 05 '25 07:06 openshift-ci[bot]

/lgtm

bergmannf avatar Jun 05 '25 07:06 bergmannf

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bergmannf, lambasanchit

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Jun 05 '25 07:06 openshift-ci[bot]