support-bundle-kit
feat: add Prometheus alerts to support bundle
Related Issue
https://github.com/harvester/harvester/issues/4993
Solution
For the first version of this feature, I focus on fetching only the current alerts.
Fetching more than the current alerts would make the output large and hard to query, which makes debugging harder rather than easier. So I only fetch the current alerts and format them.
Test Plan
Case 1. Generate a support bundle without rancher-monitoring enabled. Generation should still succeed; the bundle simply does not contain prometheus-alerts.json.
Case 2. Generate a support bundle with rancher-monitoring enabled. The bundle should contain a file named prometheus-alerts.json in its top-level directory.
Result
Sample output (only alerts in the pending or firing state are included):
[
  {
    "activeAt": "2024-01-23T06:59:00Z",
    "Annotations": {
      "description": "100% of the rancher/rancher targets in cattle-system namespace are down.",
      "runbook_url": "https://runbooks.prometheus-operator.dev/runbooks/general/targetdown",
      "summary": "One or more targets are unreachable."
    },
    "Labels": {
      "alertname": "TargetDown",
      "job": "rancher",
      "namespace": "cattle-system",
      "service": "rancher",
      "severity": "warning"
    },
    "State": "firing",
    "Value": "1e+02"
  },
  {
    "activeAt": "2024-01-24T07:37:00.510363907Z",
    "Annotations": {
      "description": "This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. For example the\n\"DeadMansSnitch\" integration in PagerDuty.\n",
      "runbook_url": "https://runbooks.prometheus-operator.dev/runbooks/general/watchdog",
      "summary": "An alert that should always be firing to certify that Alertmanager is working properly."
    },
    "Labels": {
      "alertname": "Watchdog",
      "severity": "none"
    },
    "State": "firing",
    "Value": "1e+00"
  },
  // remaining alerts omitted for readability
]