awesome-prometheus-alerts
awesome-prometheus-alerts copied to clipboard
Refacto: write more accurate descriptions for faster troubleshooting
Example:
From:
- name: Prometheus rule evaluation failures
description: 'Prometheus encountered {{ $value }} rule evaluation failures.'
query: 'increase(prometheus_rule_evaluation_failures_total[3m]) > 0'
To:
- name: Prometheus rule evaluation failures
description: 'Prometheus encountered {{ $value }} rule evaluation failures, leading to potentially ignored alerts.'
query: 'increase(prometheus_rule_evaluation_failures_total[3m]) > 0'
An effect
field would enable us to improve alert template.
I would welcome an effect
field. I've solved this locally by including an effect
and it's very helpful to reduce the size of the description
when only that's required (on a status board) but include specific resolutions in slack messages for example.
e.g.
We can probably find a balance between:
- Description/cause
- Effects
- Resolution guidelines
Gitlab infrastructure team adds a reference to a troubleshooting markdown.
See:
- https://gitlab.com/gitlab-com/runbooks/blob/master/rules/prometheus-metamons.yml
- https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/monitoring/node_memory_alerts.md
Effects
This is what alert name should be about as this is the first thing operator sees when receives alert. Additionally, this could be enhanced by summary
annotation field.
Description/cause
In prometheus community this is usually done with either message
field (for example in kubernetes-monitoring/kubernetes-mixin project or with description
field (example in node-mixin project).
Resolution guidelines
This is basically a runbook/SOP. For example kubernetes-mixin project includes those as runbook_url
as a field in alert annotations.
Such runbooks are located in one file, and links are made to specific anchors.
This field is usually the most problematic one, as creating a runbook needs a deep knowledge of the system itself.
Essentially those are problems already solved by the prometheus community.