Silencing alerts in alertmanager should be ignored in kured

Open codestalkerr opened this issue 4 years ago • 18 comments

It would be nice if alerts that are silenced in Alertmanager were also ignored by Kured. The change would take effect instantly, which would help with handling random alerts, and we would not have to wait for a code change to be deployed before a node can reboot.

codestalkerr avatar Jan 18 '22 09:01 codestalkerr

The main challenge is that Prometheus is not aware of silences made in Alertmanager. To make this work, we would also have to integrate Alertmanager into kured for the checks.

ckotzbauer avatar Jan 24 '22 09:01 ckotzbauer

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

github-actions[bot] avatar Mar 26 '22 02:03 github-actions[bot]

re-opening this one - it would be helpful

justinrush avatar Aug 12 '22 21:08 justinrush

@codestalkerr @justinrush Can you give some more information about what would be needed here and how this should behave? I think we need to integrate the Alertmanager-API (https://github.com/prometheus/alertmanager/blob/main/api/v2/openapi.yaml)

ckotzbauer avatar Aug 13 '22 07:08 ckotzbauer

Thinking through this more, I think we want something more like this: https://github.com/weaveworks/kured/issues/385, but more generic. Ideally we could provide an arbitrary PromQL query: if it returns data, hold off on the reboot; if it's empty, we're good to go.

I can create a new issue for this if it seems like something that would be acceptable to add.
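
A rough sketch of the "arbitrary PromQL query" check described above, assuming Prometheus' standard `/api/v1/query` endpoint and an instant-vector result. The names `hasData` and `shouldBlockReboot` are made up for illustration and are not part of kured:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// promResponse mirrors only the parts of the /api/v1/query response used here.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string            `json:"resultType"`
		Result     []json.RawMessage `json:"result"`
	} `json:"data"`
}

// hasData reports whether a raw query response contains at least one series.
// Assumption: the query evaluates to an instant vector.
func hasData(body []byte) (bool, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	if r.Status != "success" {
		return false, fmt.Errorf("prometheus query failed: status %q", r.Status)
	}
	if r.Data.ResultType != "vector" {
		return false, fmt.Errorf("expected an instant vector, got %q", r.Data.ResultType)
	}
	return len(r.Data.Result) > 0, nil
}

// shouldBlockReboot (hypothetical helper) runs the query against baseURL,
// e.g. "http://prometheus:9090", and returns true when the query has data.
func shouldBlockReboot(client *http.Client, baseURL, query string) (bool, error) {
	u := baseURL + "/api/v1/query?" + url.Values{"query": {query}}.Encode()
	resp, err := client.Get(u)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected HTTP status %d", resp.StatusCode)
	}
	return hasData(body)
}
```

Treating any error as "do not reboot" would probably be the safer default for the caller, but that is a design choice, not something settled in this thread.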

justinrush avatar Aug 15 '22 13:08 justinrush

Okay. Yes, please create a new issue for this :+1:

ckotzbauer avatar Aug 15 '22 13:08 ckotzbauer

@ckotzbauer My thinking behind this was to integrate with Alertmanager because there are times when we silence a few alerts there, and it would be smooth if Kured ignored those at the same time. Right now we create a PR to add the alert to Kured's ignore list, and then another one to remove it once we lift the silence. With Alertmanager silences it would be pretty quick, with no need to edit and maintain things in Kured separately.

On a side note, do we have any filter to block only on specific alerts (the opposite of the ignore filter)? Asking because we have many alerts to ignore, and it would be nice to just block on the ones we want :)

codestalkerr avatar Aug 15 '22 13:08 codestalkerr

@codestalkerr That pretty much sounds like a regexp negative lookahead. Would that be an option? Golang doesn't support them, but that would be a solvable problem.
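
One way this could be solved without lookaheads, as a minimal sketch: keep the regexp as it is and invert how a match is interpreted. The function `blockingAlerts` and its `matchOnly` parameter are hypothetical names for illustration, not existing kured options:

```go
package main

import "regexp"

// blockingAlerts returns the alerts that should block a reboot.
// With matchOnly=false, alerts matching the pattern are ignored (ignore list).
// With matchOnly=true, only alerts matching the pattern block (allow list).
// Go's regexp package (RE2 syntax) rejects lookaheads such as (?!...), so the
// inversion happens in code instead of inside the pattern.
func blockingAlerts(active []string, pattern string, matchOnly bool) ([]string, error) {
	filter, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var blocking []string
	for _, name := range active {
		if filter.MatchString(name) == matchOnly {
			blocking = append(blocking, name)
		}
	}
	return blocking, nil
}
```

With the pattern `^(KubeNodeNotReady|DiskFull)$` and `matchOnly` set to true, only those two alerts would hold back a reboot, and everything else would be ignored.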

ckotzbauer avatar Aug 15 '22 16:08 ckotzbauer

@justinrush Would this also solve your use-case?

ckotzbauer avatar Aug 15 '22 16:08 ckotzbauer

Maybe? But we don't always silence in Alertmanager. Sometimes we'll just modify the label that routes the alert so it goes to dev/null rather than to a person. But I guess if we can get the label out of the alert in Alertmanager and then apply a negative regex to it, that would work?

justinrush avatar Aug 15 '22 16:08 justinrush

I see the scenarios are different here. Getting the label out and applying a negative regex could work, but I think it will again come down to modifying the YAML file and committing changes, which is what I was trying to avoid. We use a GitOps approach, so if we modify things manually, the next deploy will override them, and maintaining that would be crazy. But maybe this is a scenario specific to me?

codestalkerr avatar Aug 26 '22 07:08 codestalkerr

Why is it not possible to consolidate the ways you remove/mark alerts? Are they too different to catch with one regex that doesn't have to be changed every time?

ckotzbauer avatar Aug 27 '22 09:08 ckotzbauer

Yeah, so we have many different alerts, and we have put them in one regex, which is a huge one-liner separated by "or". Let's say we have some temporary issue that we expect to last a few hours or a day or two: then we need to update that list, and we also silence the alert in Alertmanager. The good thing about Alertmanager is that we can temporarily silence an alert in the UI without any code changes, whereas for kured we have to commit a change to add or remove the alert.

codestalkerr avatar Aug 29 '22 07:08 codestalkerr

Hi. I can have a look. One question though: in my understanding, kured would need to send requests to Alertmanager, correct? (Or is Prometheus aware of any silences?)

atighineanu avatar Dec 14 '23 05:12 atighineanu

Hi @atighineanu, thanks for your interest. Kured would need to query the Alertmanager API (https://github.com/prometheus/alertmanager/blob/main/api/v2/openapi.yaml) to get silences, since Prometheus is not aware of them. However, I think it might not be the best idea to use the prometheus/alertmanager project as a Go module here, as we would then pull in all Alertmanager dependencies as indirect dependencies. So maybe just do a plain HTTP call, or perhaps there's another slim Alertmanager Go client out there.
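
A minimal sketch of the plain-HTTP option using only the Go standard library. It assumes the v2 `GET /api/v2/alerts` endpoint with its `active`, `silenced` and `inhibited` query parameters and reads only a small subset of the response fields. The function names are hypothetical and this is not the approach taken in any kured PR:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// amAlert mirrors only the fields of Alertmanager's v2 alert object used here.
type amAlert struct {
	Labels map[string]string `json:"labels"`
	Status struct {
		State      string   `json:"state"`
		SilencedBy []string `json:"silencedBy"`
	} `json:"status"`
}

// unsilencedAlertNames parses a GET /api/v2/alerts response body and returns
// the "alertname" label of every alert that no silence currently matches.
func unsilencedAlertNames(body []byte) ([]string, error) {
	var alerts []amAlert
	if err := json.Unmarshal(body, &alerts); err != nil {
		return nil, err
	}
	var names []string
	for _, a := range alerts {
		if len(a.Status.SilencedBy) == 0 {
			names = append(names, a.Labels["alertname"])
		}
	}
	return names, nil
}

// fetchUnsilencedAlerts (hypothetical helper) asks Alertmanager at baseURL,
// e.g. "http://alertmanager:9093", for active alerts that are not silenced.
func fetchUnsilencedAlerts(client *http.Client, baseURL string) ([]string, error) {
	q := url.Values{"active": {"true"}, "silenced": {"false"}, "inhibited": {"false"}}
	resp, err := client.Get(baseURL + "/api/v2/alerts?" + q.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected HTTP status %d", resp.StatusCode)
	}
	return unsilencedAlertNames(body)
}
```

The server-side `silenced=false` filter already drops silenced alerts; the extra `silencedBy` check in the parser is only a defensive second pass.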

ckotzbauer avatar Dec 14 '23 12:12 ckotzbauer

I've created a draft, but I need more input from you regarding kured itself. Is it okay to create several more flags? See #873 and the comment there.

atighineanu avatar Jan 05 '24 07:01 atighineanu

@ckotzbauer, @dholbach any input?

atighineanu avatar Jan 09 '24 15:01 atighineanu

I'll have a look in a few days or next week @atighineanu

ckotzbauer avatar Jan 10 '24 15:01 ckotzbauer