kube-prometheus
Use the new kind AlertmanagerConfig to configure alertmanager
There are a lot of users asking how to configure Alertmanager; I think it would help if kube-prometheus generated this object, leaving room on the user side to fill it in.
An advantage of using it is that it can automatically load secrets to integrate with alert routing providers.
- kube-prometheus version: at least 0.7, as it requires prometheus-operator ≥ 0.43.0
Using AlertmanagerConfig requires adding some information to the Alertmanager resource; therefore, users can't use it without some adaptation in kube-prometheus.
I was wondering about this too.
Could you elaborate on what you would like to see? It's not entirely clear to me what someone who wanted to work on this would try to implement.
After some testing and local POCing, I finally have an answer.
I suggest three changes (mostly the first two):
- We define an alertmanagerConfigSelector in the Alertmanager and allow users to specify it (for example in _config+:.alertmanager+:.configSelector)
- We define a default value for the selector (example below):
alertmanagerConfigSelector:
  matchLabels:
    alertmanagerConfig: $alertmanager.name # or "main" or whatever
- When the operator allows a global AlertmanagerConfig, we should then be able to migrate the current route, receivers and inhibit_rules into an AlertmanagerConfig resource (mainly useful for users looking for how to use this resource kind); a rough sketch follows below.
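As a rough illustration of that last point, a migrated AlertmanagerConfig carrying a route, receivers and inhibit rules could look roughly like the sketch below. This is only a sketch: the object name, label value and webhook URL are made up, and the CRD uses camelCase fields (inhibitRules, webhookConfigs) rather than the native Alertmanager spellings.

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: main-routing                 # hypothetical name
  labels:
    alertmanagerConfig: main         # must match the selector defined above
spec:
  route:
    receiver: default
    groupBy: ['alertname']
  receivers:
  - name: default
    webhookConfigs:
    - url: https://example.invalid/notify   # hypothetical endpoint
  inhibitRules:
  - sourceMatch:
    - name: severity
      value: critical
    targetMatch:
    - name: severity
      value: warning
    equal: ['alertname']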
The AlertmanagerConfig selector can already be included as:
alertmanager+: {
  alertmanager+: {
    spec+: {
      alertmanagerConfigSelector: {
        matchLabels: {
          alertmanagerConfig: $.alertmanager.alertmanager.metadata.name
        },
      },
    },
  },
}
In my cluster, I have leveraged the AlertmanagerConfig custom resource https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerconfig, which, in my understanding, helps us change the currently deployed Alertmanager configuration at will.
So, to configure my SendGrid account for the desired email integration (i.e. Alertmanager sending e-mail alert notifications), I created a file named AlertManagerConfigmap.yaml with the following:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
  spec:
    receivers:
    - name: 'email'
      email_configs:
      - to: '[email protected]'
        from: '[email protected]'
        smarthost: smtp.sendgrid.net:587
        auth_username: 'apikey'
        auth_password: 'XXXXXXXXX'
  route:
    group_by: ['alertname']
    group_wait: 10s
    group_interval: 10s
    repeat_interval: 10s
    receiver: 'email'
I was hoping this configuration would be added to the existing one after applying it to the cluster.
Now, when I try to apply it, I get this error:
$ kubectl apply -f monitoring-alertmanager-configmap.yaml
error: error validating "monitoring-alertmanager-configmap.yaml": error validating data: [ValidationError(AlertmanagerConfig.metadata): unknown field "route" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta, ValidationError(AlertmanagerConfig.metadata): unknown field "spec" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta, ValidationError(AlertmanagerConfig): missing required field "spec" in com.coreos.monitoring.v1alpha1.AlertmanagerConfig]; if you choose to ignore these errors, turn validation off with --validate=false
Could someone guide me on this?
@dnaranjor I've successfully created the AlertmanagerConfig resource following the doc (https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/alerting.md#alertmanagerconfig-resource).
This is my YAML file, FYI.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  labels:
    release: kube-prometheus-stack
  name: mail-config
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'mail'
    routes:
    - match:
        alertname: Watchdog
      receiver: mail
  receivers:
  - name: mail
    emailConfigs:
    - to: <mail addr>
      from: <mail addr>
      smarthost: smtp.gmail.com:465
      authUsername: ntphrf
      authPassword:
        name: mail-password
        key: password
      requireTLS: false
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: mail-password
data:
  password: xxxjixjijxijixxxxxxxx
@orcahmlee thanks for sharing... My problem currently is that, even though I have configured the receivers section with my email server settings, those are not showing up in my Alertmanager config file, so my guess is they are not being applied at all.
In your scenario, after applying this AlertmanagerConfig you're sharing, is the new emailConfigs information shown in the Alertmanager GUI > Status > Config section?
Yes, I can see the change after applying.
@dnaranjor Have you tried checking the directives in the YAML file? I realized the directives are different between alertmanager.yml and the AlertmanagerConfig CRD, e.g.:
- email_configs -> emailConfigs
- auth_username -> authUsername
This was also my mistake when I applied the AlertmanagerConfig CRD for the first time. Hope this info is useful.
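To make that concrete, here is a rough (untested) rewrite of the earlier SendGrid receiver using the CRD field names; the addresses, smarthost and username are just the placeholders from that earlier snippet, and the authPassword reference to a Secret named sendgrid-apikey is hypothetical, since the CRD reads the password from a Secret rather than inline:

spec:
  receivers:
  - name: 'email'
    emailConfigs:                  # CRD spelling of email_configs
    - to: '[email protected]'
      from: '[email protected]'
      smarthost: smtp.sendgrid.net:587
      authUsername: 'apikey'       # CRD spelling of auth_username
      authPassword:                # CRD equivalent of auth_password, read from a Secret
        name: sendgrid-apikey      # hypothetical Secret name
        key: password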
I found another very simple workaround that works perfectly: set the following in values.yaml:
prometheus:
  additionalAlertRelabelConfigs:
  - source_labels: [namespace] # adds missing namespace label
    target_label: namespace
    regex: (^$)
    replacement: monitoring # should match namespace where alertmanager deployed
which will be rendered into the following:
alerting:
  alert_relabel_configs:
  - source_labels: [namespace]
    separator: ;
    regex: (^$)
    target_label: namespace
    replacement: monitoring
    action: replace
Can we use multiple receivers in an AlertmanagerConfig?
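As far as I can tell, yes: you can list several entries under receivers and reference them from child routes under the top-level route. A minimal sketch, with hypothetical receiver names and webhook URLs:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: multi-receiver-example      # hypothetical name
spec:
  route:
    receiver: default-webhook       # fallback receiver for this config
    groupBy: ['alertname']
    routes:
    - matchers:
      - name: severity
        value: critical
        matchType: '='
      receiver: pager               # critical alerts go to the second receiver
  receivers:
  - name: default-webhook
    webhookConfigs:
    - url: https://example.invalid/default
  - name: pager
    webhookConfigs:
    - url: https://example.invalid/pager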
So I've been trying to use this, but it looks like it adds a namespace matcher by default, which I think makes it impossible to just catch all alerts. For example, trying to catch the Watchdog:
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: am-config-watchdog
  labels:
    alertmanagerConfig: am-config
spec:
  route:
    matchers:
    - name: alertname
      value: Watchdog
      matchType: '='
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 1h
    receiver: 'dms'
    continue: false
  receivers:
  - name: 'dms'
    webhookConfigs:
    - url: https://...
This seems to resolve in the AM config as:
- receiver: monitoring/am-config-watchdog/dms
  matchers:
  - alertname="Watchdog"
  - namespace="monitoring"
  continue: true
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
This namespace matcher is added automatically, and since the Watchdog alert does not have a namespace label set, it doesn't match and just continues on to the default null receiver. It's also overriding continue to force it to true, which is a bit frustrating.
I've never used AlertmanagerConfig before, so I may be missing something.
This is all applied using Argo CD + Kustomize. I'm not sure if there's a way to change the default AM config and modify the default Watchdog route instead; that could be another way to go, but the config lives in a Secret, so that doesn't look trivial.
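For completeness, the alternative I was hinting at would be to route cluster-level alerts like Watchdog in the base Alertmanager configuration itself (where no namespace matcher is injected) rather than in an AlertmanagerConfig. A rough sketch in plain Alertmanager syntax, with a hypothetical webhook URL; where exactly this ends up depends on how the Alertmanager secret is generated in your setup:

route:
  receiver: Default
  routes:
  - receiver: dms
    matchers:
    - alertname = Watchdog
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 1h
receivers:
- name: Default                      # catch-all receiver with no integrations
- name: dms
  webhook_configs:
  - url: https://example.invalid/dms # hypothetical dead man's switch endpoint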