viya4-monitoring-kubernetes
How can I disable an existing monitoring rule and create a new alert?
I am trying to configure a custom alert such as "Two CAS pods should not run on the same worker node". I am using the steps below, but I am unable to get the new alert rule configured:
1) Created a new alert rule:

```yaml
additionalPrometheusRules:   # I have tried both adding and removing this line
- name: CustomeRules
  groups:
  - name: CAS-Alert-Rule
    rules:
    - alert: CASPodsNode
      expr: topk(1, sort_desc(count(kube_pod_info{created_by_kind="CASDeployment"}) by (node))) == 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "More than one CAS pod is running on the same node: (instance {{ $labels.node }})"
        description: "Two CAS pods got assigned to the same node. This could cause a performance issue."
```
2) Added the rule to the prometheus section of user-values-prom-operator.yaml.
3) Ran the monitoring/bin/deploy_monitoring_cluster.sh script to apply the changes.
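For reference, this is roughly how I apply the change after editing the file; the USER_DIR variable pointing at my local customization directory is an assumption about my setup, not something from the steps above:

```bash
# Assumption: USER_DIR points at the directory that contains user-values-prom-operator.yaml
export USER_DIR=~/viya4-monitoring-custom
monitoring/bin/deploy_monitoring_cluster.sh
```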
Unfortunately, this is not working for me. It would be helpful if you could provide any assistance with this.
A better approach may be to create a standalone PrometheusRule yaml file.
I didn't configure my environment to actually trigger the alert, but I got the alert defined with this YAML:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cas-rules
spec:
  groups:
  - name: cas-rules-group
    rules:
    - alert: multiple-cas-workers-on-node
      expr: count(kube_pod_info{created_by_kind="CASDeployment"}) by (node) > 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "More than one CAS pod is running on the same node: (instance {{ $labels.node }})"
        message: "More than one CAS pod is assigned to a single node. This could cause performance issues."
```
All I did was run `kubectl apply -n monitoring -f [path/to/theFile.yaml]`.
You can then see the rule under Status -> Rules in the Prometheus UI.
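If you want to confirm it from the command line as well, something like this should work (the monitoring namespace and the cas-rules name come from the example above):

```bash
# List the PrometheusRule resources present in the monitoring namespace
kubectl get prometheusrules -n monitoring

# Show the full definition of the rule created above
kubectl get prometheusrule cas-rules -n monitoring -o yaml
```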
Thanks for the help. I am able to add the new alert now. Can we also disable one of the default existing alerts?
What existing alert are you looking to disable?
```yaml
name: KubeControllerManagerDown
expr: absent(up{job="kube-controller-manager"} == 1)
for: 15m
labels:
  severity: critical
annotations:
  description: KubeControllerManager has disappeared from Prometheus target discovery.
  runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown
  summary: Target disappeared from Prometheus target discovery.
```
Fortunately, that rule is the only one in the monitoring/v4m-kubernetes-system-controller-manager PrometheusRule custom resource. All you should need to do is run:
```bash
kubectl delete prometheusrule -n monitoring v4m-kubernetes-system-controller-manager
```
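If you'd like to double-check what the resource contains before removing it, a quick inspection (same namespace and resource name as above) could look like this:

```bash
# Review the alert group defined in the resource before deleting it
kubectl get prometheusrule v4m-kubernetes-system-controller-manager -n monitoring -o yaml
```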