Standard template for Grafana alert documentation
Per a suggestion from @ChadNedzlek, we should have a standard format for documenting our Grafana alerts:
"I feel like we should have a standard format for articles like this. Maybe this should be in a section called "Remediation" or something. Every alert has three primary steps.
- Gather information that can't be gathered after the fact
- Restore services as soon as possible / notify users
- Gather historical information to RCA
It would be nice if those 3 things were in the same format for every alert, so that FR could dive right in without having to read everything. All the "additional information" can go later (it's not as important as what steps to take immediately)."