amazon-managed-service-for-prometheus-roadmap

Use managed Prometheus with in-cluster Alertmanager

Open razvan-moj opened this issue 2 years ago • 7 comments

Prometheus (as deployed by the commonly used operator chart) is difficult to maintain and a resource hog; because of that AMP is very attractive. Alertmanager though works fine for our purposes, and we have

$ kubectl get prometheusrule -A -ojson | jq -r '.items[].spec.groups[].rules[].alert' | wc -l
    4449

alerts defined, by teams which can each access only one namespace (https://github.com/ministryofjustice/cloud-platform-environments/search?q=prometheusrule), so there is no shared visibility. Alerts go directly to e.g. Slack, with each team controlling its own channel and hooks.
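For readers who want the same count broken down per namespace, here is a minimal Python sketch. It assumes the input has the shape of `kubectl get prometheusrule -A -ojson` output; the `sample` data and the function name are hypothetical. Unlike the one-liner above, which also prints a `null` line for each recording rule, this counts only rules that carry an `alert` key.

```python
from collections import Counter


def count_alerts_by_namespace(rule_list):
    """Count alerting rules per namespace in a PrometheusRule list object."""
    counts = Counter()
    for item in rule_list.get("items", []):
        namespace = item.get("metadata", {}).get("namespace", "<unknown>")
        for group in item.get("spec", {}).get("groups", []):
            for rule in group.get("rules", []):
                # Recording rules carry "record" instead of "alert"; skip them.
                if "alert" in rule:
                    counts[namespace] += 1
    return counts


# Hypothetical sample standing in for `kubectl get prometheusrule -A -ojson`.
sample = {
    "items": [
        {
            "metadata": {"namespace": "team-a", "name": "app-rules"},
            "spec": {"groups": [{"name": "app", "rules": [
                {"alert": "HighErrorRate", "expr": "rate(errors[5m]) > 1"},
                {"alert": "PodRestarting", "expr": "restarts > 3"},
                {"record": "job:errors:rate5m", "expr": "rate(errors[5m])"},
            ]}]},
        },
        {
            "metadata": {"namespace": "team-b", "name": "db-rules"},
            "spec": {"groups": [{"name": "db", "rules": [
                {"alert": "DiskAlmostFull", "expr": "disk_free < 0.1"},
            ]}]},
        },
    ]
}

counts = count_alerts_by_namespace(sample)
for namespace, n in sorted(counts.items()):
    print(f"{namespace}\t{n}")
print(f"total\t{sum(counts.values())}")
```

Against a real cluster, the `sample` dict would be replaced by `json.load(sys.stdin)` with the kubectl output piped in.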

It would be ideal for us to use the managed Prometheus but keep alert definitions and Alertmanager as they are right now (presumably, AMP would need a configuration option to reach the cluster's AM).

razvan-moj avatar Mar 01 '22 18:03 razvan-moj

Hey @razvan-moj,

Thank you for posting this issue! I would love to better understand this use case, and had a few follow-up questions for you.

  1. Have you seen our slack integration via SNS blog post? It walks through how to integrate AMP's alertmanager with Slack such that you have a very similar configuration paradigm to the native slack receiver. The blog can be found here. Would this solve your problem and allow you to use the AMP Alert Manager? If not, we'd love to learn a bit more as to why!

  2. We are looking to eventually add support for the native slack receiver. If AMP's alert manager had the native slack receiver support, would that enable you to use the AMP Alert Manager? If not, would love to learn a little bit more about what's driving the use case!

Thank you for submitting this feature proposal! Excited to learn a bit more about the use case from you.

ampabhi-aws avatar Mar 15 '22 02:03 ampabhi-aws

Hey @ampabhi-aws !

> 1. Have you seen our slack integration via SNS blog post? It walks through how to integrate AMP's alertmanager with Slack such that you have a very similar configuration paradigm to the native slack receiver. The blog can be found here. Would this solve your problem and allow you to use the AMP Alert Manager? If not, we'd love to learn a bit more as to why!

We have many (4000-odd, as listed above) rules and alerts defined by users in their namespaces; those rules are picked up by Alertmanager once the .yaml is applied. Users do not have access to the AWS API. To move their setup to AMP, it seems we would need to change their workflow from kubectl apply to terraform apply and figure out a way to validate edits so they don't, e.g., overwrite each other, which is handled today by namespace isolation and a validation webhook. This work would not be needed if we could just configure the AWS Prometheus to read prometheusrules from the namespaces and hook into the in-cluster Alertmanager.
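To make the migration concern concrete, here is a rough Python sketch of what a sync step might have to do: regroup PrometheusRule objects into one rules-file-shaped document per Kubernetes namespace, so that one team's rules cannot overwrite another's. The one-to-one mapping from Kubernetes namespace to an AMP rule groups namespace, the group-name prefixing scheme, and the function name are assumptions made for illustration; serializing to YAML and uploading through the AWS API are deliberately left out.

```python
from collections import defaultdict


def rules_files_by_namespace(rule_list):
    """Group PrometheusRule specs into one {"groups": [...]} document per namespace."""
    files = defaultdict(lambda: {"groups": []})
    for item in rule_list.get("items", []):
        meta = item.get("metadata", {})
        namespace = meta.get("namespace", "default")
        object_name = meta.get("name", "unnamed")
        for group in item.get("spec", {}).get("groups", []):
            merged = dict(group)  # shallow copy so the input is left untouched
            # Assumption: prefix with the object name so two PrometheusRule
            # objects in one namespace can reuse a group name without clashing.
            merged["name"] = f"{object_name}.{group['name']}"
            files[namespace]["groups"].append(merged)
    return dict(files)


# Hypothetical input in the shape of `kubectl get prometheusrule -A -ojson`.
sample = {
    "items": [
        {
            "metadata": {"namespace": "team-a", "name": "app-rules"},
            "spec": {"groups": [{"name": "general", "rules": [
                {"alert": "HighErrorRate", "expr": "rate(errors[5m]) > 1"},
            ]}]},
        },
        {
            "metadata": {"namespace": "team-a", "name": "infra-rules"},
            "spec": {"groups": [{"name": "general", "rules": [
                {"alert": "PodRestarting", "expr": "restarts > 3"},
            ]}]},
        },
        {
            "metadata": {"namespace": "team-b", "name": "db-rules"},
            "spec": {"groups": [{"name": "db", "rules": [
                {"alert": "DiskAlmostFull", "expr": "disk_free < 0.1"},
            ]}]},
        },
    ]
}

files = rules_files_by_namespace(sample)
for namespace, document in sorted(files.items()):
    names = [group["name"] for group in document["groups"]]
    print(namespace, names)
```

Even in this simplified form, something has to run with AWS credentials on the users' behalf and re-implement the isolation that namespaces and the validation webhook provide today.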

> 2. We are looking to eventually add support for the native slack receiver. If AMP's alert manager had the native slack receiver support, would that enable you to use the AMP Alert Manager? If not, would love to learn a little bit more about what's driving the use case!

I don't think this is a problem for us. If we decide to use the AWS API to interact with Prometheus, we can also allow SNS, and our users are familiar with how it works.

razvan-moj avatar Mar 15 '22 06:03 razvan-moj

Thanks for the clarification @razvan-moj!

If we provided you a way to use CRDs to configure the rules directly via the Kubernetes APIs, and similarly configure your AMP alert manager via those CRDs, would that alleviate the need for an in-cluster Alertmanager? I'm trying to get a better sense of whether the problem is that the AMP alert manager (1) is inconvenient to use, or (2) doesn't fit the use case at all.

ampabhi-aws avatar Mar 28 '22 04:03 ampabhi-aws

> a way to use CRDs to configure the rules directly via the Kubernetes APIs

A definite +1 for this!

razvan-moj avatar Mar 28 '22 11:03 razvan-moj

https://aws.amazon.com/blogs/mt/introducing-the-ack-controller-for-amazon-managed-service-for-prometheus/

jeromeinsf avatar Sep 26 '22 17:09 jeromeinsf

@ampabhi-aws what are the next steps here?

jeromeinsf avatar Apr 26 '23 13:04 jeromeinsf

Unfortunately, we also had to drop our plans to use AMP for the same reason: we cannot use the in-cluster Alertmanager. Our use case is that we use Karma for our on-call, where all the alerts are displayed; in the Karma dashboard it is easier to view, manage, and silence alerts than in any other dashboard. The AMP alertmanager cannot be connected to Karma, and Managed Grafana is not a suitable option for us.

dili-pk avatar Aug 02 '24 05:08 dili-pk