cloud-platform icon indicating copy to clipboard operation
cloud-platform copied to clipboard

silence alerts during node recycle

Open razvan-moj opened this issue 2 years ago • 0 comments

Background

we get an annoying and useless alert every time a node is recycled, which right now is 2-3 times at early hours in the day.

Proposed user journey

Alertmanager has a curl-able API and a CLI (https://github.com/prometheus/alertmanager#amtool) that can make this step relatively simple

Our problem is mostly that the recycler runs outside the cluster so cannot reach the alertmanager SVC endpoint

Approach

Before https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/main/pipelines/manager/main/maintenance.yaml#L92, we'd need to silence specific alerts and turn them back on afterwards.

curl/amtool will need a way to either tunnel (kubectl exec) or authenticate (amtool supports digest and OIDC)

Definition of done

  • [ ] Concourse pipeline updated
  • [ ] alerts do not trigger anymore in the morning

razvan-moj avatar Oct 11 '22 08:10 razvan-moj