viya4-monitoring-kubernetes

How can I disable an existing monitoring rule and create a new alert?

Open sangram23 opened this issue 3 years ago • 5 comments

I am trying to configure a custom alert such as "two CAS pods should not run on the same worker node". I am using the steps below, but I am unable to get the new alert rule configured:

1. Created a new alert rule (I have tried both with and without the `additionalPrometheusRules:` line):

```yaml
additionalPrometheusRules:
  - name: CustomeRules
    groups:
      - name: CAS-Alert-Rule
        rules:
          - alert: CASPodsNode
            expr: topk(1,sort_desc((count (kube_pod_info{created_by_kind="CASDeployment"}) by (node)))) == 1
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "More than one CAS pod is running on the same node: (instance {{ $labels.node }})"
              description: "Two CAS pods were assigned to the same node. This could cause a performance issue."
```

2. Added the rule under the `rules` entry in the `prometheus` section of `user-values-prom-operator.yaml`.

3. Ran the `monitoring/bin/deploy_monitoring_cluster.sh` script to apply the changes.

Unfortunately, this is not working for me. Any assistance would be appreciated.

sangram23 avatar Jun 08 '21 16:06 sangram23

A better approach may be to create a standalone PrometheusRule YAML file.

I didn't configure my environment to actually trigger the alert, but I got the alert defined with this YAML:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cas-rules
spec:
  groups:
    - name: cas-rules-group
      rules:
        - alert: multiple-cas-workers-on-node
          expr: count(kube_pod_info{created_by_kind="CASDeployment"}) by (node) > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "More than one CAS pod is running on the same node: (instance {{ $labels.node }})"
            message: "More than one CAS pod is assigned to a single node. This could cause performance issues."
```

All I did was run `kubectl apply -n monitoring -f [path/to/theFile.yaml]`.

You can then see the rule under Status -> Rules in the Prometheus UI.
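(One caveat, as a hedged note: depending on how the Prometheus Operator's `ruleSelector` is configured in a given deployment, a PrometheusRule object may need matching labels before Prometheus will pick it up. The sketch below shows the idea; the `release` label value is hypothetical and must be checked against the actual selector in your environment.)

```yaml
# Hypothetical sketch: add a label that matches the Prometheus
# ruleSelector, if your deployment filters PrometheusRule objects.
metadata:
  name: cas-rules
  labels:
    release: prometheus-operator   # assumed value; verify with: kubectl get prometheus -n monitoring -o yaml
```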

BryanEllington avatar Jun 09 '21 14:06 BryanEllington

Thanks for the help. I am able to add the new alert. Can we disable one of the default existing alerts?

sangram23 avatar Jun 10 '21 15:06 sangram23

What existing alert are you looking to disable?

BryanEllington avatar Jun 11 '21 18:06 BryanEllington

```yaml
- alert: KubeControllerManagerDown
  expr: absent(up{job="kube-controller-manager"} == 1)
  for: 15m
  labels:
    severity: critical
  annotations:
    description: KubeControllerManager has disappeared from Prometheus target discovery.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown
    summary: Target disappeared from Prometheus target discovery.
```

sangram23 avatar Jun 14 '21 09:06 sangram23

Fortunately, that rule is the only one in the `monitoring/v4m-kubernetes-system-controller-manager` PrometheusRule custom resource. All you should need to do is run `kubectl delete prometheusrule -n monitoring v4m-kubernetes-system-controller-manager`.
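(A hedged follow-up: deleting the custom resource directly may be reverted the next time the deployment script reapplies the Helm release. If the stack is based on the kube-prometheus-stack chart, a more durable option may be to disable that default rule group via `user-values-prom-operator.yaml`. The field names below assume a recent kube-prometheus-stack chart and should be verified against the chart version actually deployed.)

```yaml
# Assumed kube-prometheus-stack values layout; verify against your chart version.
defaultRules:
  rules:
    kubeControllerManager: false   # skip creating the KubeControllerManagerDown rule group
```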

BryanEllington avatar Jun 14 '21 15:06 BryanEllington