Overriding The PrometheusRule Objects Alerts
1. Quick Debug Checklist
- Run on OpenShift v4.10.16
- GPU operator version v22.9.1
2. Issue or feature description
The GPU operator currently ships with two PrometheusRule objects: nvidia-gpu-operator-metrics and nvidia-node-status-exporter-alerts.
All the alerts in these PrometheusRule objects carry the label severity: warning.
My Grafana dashboard shows only high or critical alerts, so I tried to raise the severity of these alerts. However, the operator's ClusterPolicy object manages the PrometheusRule objects, and it reverted my changes after I applied them.
Is there any best practice for overriding/changing the PrometheusRule objects' labels?
3. Steps to reproduce the issue
- Run: oc edit prometheusrule nvidia-gpu-operator-metrics -n nvidia-gpu-operator
- Replace any label of severity: warning to severity: high
- Save & exit
- Wait until the ClusterPolicy reconciliation restores the original configuration of the PrometheusRule
@guyst16 currently we don't support changing these, but you can create custom rules based on the ones provided by the operator. We will also look into allowing this change in the operator.
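
One way to follow the "custom rules" suggestion is to copy an operator-provided PrometheusRule under a new name and change the severity on the copy. The sketch below is only an illustration, not something the operator provides: the helper name derive_custom_rule, the new object name custom-gpu-alerts, and the sample alert (ExampleGpuAlert with a dummy expression) are all placeholders. It assumes the operator only reconciles the objects it owns, so a copy without ownerReferences and with a different name should be left alone.

```python
import copy
import json

# Metadata that the API server or the owning controller sets; an independent
# copy should not carry these fields over.
SERVER_SET_FIELDS = (
    "ownerReferences", "resourceVersion", "uid",
    "creationTimestamp", "generation", "managedFields", "selfLink",
)


def derive_custom_rule(rule, new_name, severity):
    """Return a renamed copy of a PrometheusRule dict with a new severity label."""
    custom = copy.deepcopy(rule)  # leave the caller's object untouched
    meta = custom["metadata"]
    for field in SERVER_SET_FIELDS:
        meta.pop(field, None)
    meta["name"] = new_name
    custom.pop("status", None)
    for group in custom.get("spec", {}).get("groups", []):
        for entry in group.get("rules", []):
            # Only alerting rules get a severity; recording rules use "record".
            if "alert" in entry:
                entry.setdefault("labels", {})["severity"] = severity
    return custom


# Placeholder input standing in for the JSON of an operator-managed rule.
# The alert name and expression are made up for this example.
sample = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "PrometheusRule",
    "metadata": {
        "name": "nvidia-gpu-operator-metrics",
        "namespace": "nvidia-gpu-operator",
        "resourceVersion": "12345",
        "ownerReferences": [{"kind": "ClusterPolicy", "name": "placeholder"}],
    },
    "spec": {"groups": [{"name": "example", "rules": [
        {"alert": "ExampleGpuAlert", "expr": "vector(1)",
         "labels": {"severity": "warning"}},
    ]}]},
}

custom = derive_custom_rule(sample, "custom-gpu-alerts", "critical")
print(json.dumps(custom, indent=2))
```

In practice the input would come from oc get prometheusrule nvidia-gpu-operator-metrics -n nvidia-gpu-operator -o json, and the printed result could be applied with oc apply -f -. Note that the original warning-level alerts keep firing alongside the copies, so the dashboard filter still has to select on severity.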