cloud-platform
cloud-platform copied to clipboard
silence alerts during node recycle
Background
we get an annoying and useless alert every time a node is recycled, which right now is 2-3 times at early hours in the day.
Proposed user journey
Alertmanager has a curl-able API and a CLI (https://github.com/prometheus/alertmanager#amtool) that can make this step relatively simple
Our problem is mostly that the recycler runs outside the cluster so cannot reach the alertmanager SVC endpoint
Approach
Before https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/main/pipelines/manager/main/maintenance.yaml#L92, we'd need to silence specific alerts and turn them back on afterwards.
curl/amtool will need a way to either tunnel (kubectl exec
) or authenticate (amtool supports digest and OIDC)
Definition of done
- [ ] Concourse pipeline updated
- [ ] alerts do not trigger anymore in the morning