helm-charts
[victoria-metrics-k8s-stack] default Alertmanager configuration
Hi!
I was trying to find out how the Alertmanager configuration is supposed to be set in victoria-metrics-k8s-stack, but I fail to understand the whole machinery.
So let's assume I have my very own configuration that I'd like to use. One option is to encode it in base64, place it in a secret, and refer to that secret via configSecret: "alertmanager-config".
That seems to work, but maintaining such a configuration becomes a royal pain in the 🍑
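For what it's worth, a Kubernetes Secret accepts plain text under stringData, so the manual base64 step can be skipped. The sketch below is only an illustration under a few assumptions: that the operator reads the configuration from the alertmanager.yaml key of the referenced secret, and that the chart exposes the field as alertmanager.spec.configSecret. The receiver name blackhole is made up for the example.

```yaml
# Secret with the Alertmanager configuration in plain text.
# stringData is converted to base64-encoded data by the API server.
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
type: Opaque
stringData:
  # Key name is an assumption; check what the operator expects.
  alertmanager.yaml: |
    route:
      receiver: blackhole
    receivers:
      - name: blackhole
---
# Helm values override pointing the chart at the secret above.
# The alertmanager.spec.configSecret path is an assumption about the chart layout.
alertmanager:
  spec:
    configSecret: "alertmanager-config"
```

The secret has to live in the same namespace as the Alertmanager object for the reference to resolve.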
Assuming we have nothing to hide, the easiest way should be to define config: map in the values.yaml of the Helm chart.
Unfortunately, it is not so easy. The resulting configuration looks like a mixture of my overrides and the chart's defaults. These are the victoria-metrics-k8s-stack defaults from values.yaml:
config:
  global:
    resolve_timeout: 5m
    slack_api_url: "http://slack:30500/"
  templates:
    - "/etc/vm/configs/**/*.tmpl"
  route:
    group_by: ["alertgroup", "job"]
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: "slack-monitoring"
    routes:
      ###################################################
      ## Duplicate code_owner routes to teams
      ## These will send alerts to team channels but continue
      ## processing through the rest of the tree to handled by on-call
      - matchers:
          - code_owner_channel!=""
          - severity=~"info|warning|critical"
        group_by: ["code_owner_channel", "alertgroup", "job"]
        receiver: slack-code-owners
      ###################################################
      ## Standard on-call routes
      - matchers:
          - severity=~"info|warning|critical"
        receiver: slack-monitoring
        continue: true
  inhibit_rules:
    - target_matchers:
        - severity=~"warning|info"
      source_matchers:
        - severity=critical
      equal:
        - cluster
        - namespace
        - alertname
    - target_matchers:
        - severity=info
      source_matchers:
        - severity=warning
      equal:
        - cluster
        - namespace
        - alertname
    - target_matchers:
        - severity=info
      source_matchers:
        - alertname=InfoInhibitor
      equal:
        - cluster
        - namespace
  receivers:
    - name: "slack-monitoring"
      slack_configs:
        - channel: "#channel"
          send_resolved: true
          title: '{{ template "slack.monzo.title" . }}'
          icon_emoji: '{{ template "slack.monzo.icon_emoji" . }}'
          color: '{{ template "slack.monzo.color" . }}'
          text: '{{ template "slack.monzo.text" . }}'
          actions:
            - type: button
              text: "Runbook :green_book:"
              url: "{{ (index .Alerts 0).Annotations.runbook_url }}"
            - type: button
              text: "Query :mag:"
              url: "{{ (index .Alerts 0).GeneratorURL }}"
            - type: button
              text: "Dashboard :grafana:"
              url: "{{ (index .Alerts 0).Annotations.dashboard }}"
            - type: button
              text: "Silence :no_bell:"
              url: '{{ template "__alert_silence_link" . }}'
            - type: button
              text: '{{ template "slack.monzo.link_button_text" . }}'
              url: "{{ .CommonAnnotations.link_url }}"
    - name: slack-code-owners
      slack_configs:
        - channel: "#{{ .CommonLabels.code_owner_channel }}"
          send_resolved: true
          title: '{{ template "slack.monzo.title" . }}'
          icon_emoji: '{{ template "slack.monzo.icon_emoji" . }}'
          color: '{{ template "slack.monzo.color" . }}'
          text: '{{ template "slack.monzo.text" . }}'
          actions:
            - type: button
              text: "Runbook :green_book:"
              url: "{{ (index .Alerts 0).Annotations.runbook }}"
            - type: button
              text: "Query :mag:"
              url: "{{ (index .Alerts 0).GeneratorURL }}"
            - type: button
              text: "Dashboard :grafana:"
              url: "{{ (index .Alerts 0).Annotations.dashboard }}"
            - type: button
              text: "Silence :no_bell:"
              url: '{{ template "__alert_silence_link" . }}'
            - type: button
              text: '{{ template "slack.monzo.link_button_text" . }}'
              url: "{{ .CommonAnnotations.link_url }}"
The override YAML file I supplied to the Helm chart:
config:
  global: {}
  templates:
    - '/etc/vm/configs/*.tmpl'
  route:
    group_wait: 15s
    group_interval: 5m
    receiver: empty
    repeat_interval: 4h
  receivers:
    - name: emplty
And the resulting alertmanager.yaml, stored in secret vm-stack-alertmanager is:
global:
  resolve_timeout: 5m
  slack_api_url: http://slack:30500/
inhibit_rules:
- equal:
  - cluster
  - namespace
  - alertname
  source_matchers:
  - severity=critical
  target_matchers:
  - severity=~"warning|info"
- equal:
  - cluster
  - namespace
  - alertname
  source_matchers:
  - severity=warning
  target_matchers:
  - severity=info
- equal:
  - cluster
  - namespace
  source_matchers:
  - alertname=InfoInhibitor
  target_matchers:
  - severity=info
receivers:
- name: emplty
route:
  group_by:
  - alertgroup
  - job
  group_interval: 5m
  group_wait: 15s
  receiver: empty
  repeat_interval: 4h
  routes:
  - group_by:
    - code_owner_channel
    - alertgroup
    - job
    matchers:
    - code_owner_channel!=""
    - severity=~"info|warning|critical"
    receiver: slack-code-owners
  - continue: true
    matchers:
    - severity=~"info|warning|critical"
    receiver: slack-monitoring
templates:
- /etc/vm/configs/*.tmpl
I can't find the pattern by which those two files get merged: routes and inhibit_rules seem to be inherited from the defaults, while receivers got completely overridden by the override file.
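The output above looks consistent with how Helm merges values files in general: maps are merged key by key, while lists are replaced as a whole. Under that reading, receivers and templates are lists present in the override, so they win outright. route is a map, so its scalar keys are overridden while the untouched route.routes and route.group_by survive, and inhibit_rules is never mentioned in the override, so the default list stays. A sketch of an override that replaces those leftovers explicitly, assuming the configuration lives under alertmanager.config in the chart's values:

```yaml
# Override sketch: every default list is replaced explicitly.
# The alertmanager.config path is an assumption about the chart layout.
alertmanager:
  config:
    templates:
      - "/etc/vm/configs/*.tmpl"
    route:
      receiver: empty
      group_by: ["alertname"]   # replaces the default group_by list
      group_wait: 15s
      group_interval: 5m
      repeat_interval: 4h
      routes: []                # an empty list replaces the default routes
    inhibit_rules: []           # same for the default inhibit rules
    receivers:
      - name: empty
```

This does not remove default keys inside maps such as global; it only covers the lists.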
My main question is: why supply such an extensive Alertmanager configuration in the default values.yaml of the Helm chart in the first place, given that there is no easy way to override those values and they (seemingly) will always interfere with the user-supplied configuration?
A bit more experimenting shows that if you provide override sections for the top-level keys in the config: map, the values from the override file are used for inhibit_rules, routes, and receivers. One notable exception is the global section, which can't be overridden this way and always contains this block:
global:
  resolve_timeout: 5m
  slack_api_url: http://slack:30500/
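A likely explanation for the global section is that an empty map (global: {}) merges into the default map without removing anything. Helm documents that setting a key to null in an override deletes the corresponding default, so something like the following might clear it. This is a sketch that has not been verified against this chart version, and the alertmanager.config path is again an assumption:

```yaml
# Override sketch: null removes a default key during Helm's values merge.
alertmanager:
  config:
    global:
      resolve_timeout: null   # drop the chart default
      slack_api_url: null     # drop the chart default
```

Any keys you do want in global can be set alongside the null entries in the same map.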
"they (seemingly) will always interfere with the user-supplied configuration."
For now, you can use a separate configSecret to keep your configuration untouched:
https://github.com/VictoriaMetrics/helm-charts/blob/659a1d6d4eb35f5dbdd8c539c376cdbcbc60b1c9/charts/victoria-metrics-k8s-stack/values.yaml#L336-L337
This is indeed impractical. Since the whole configuration is defined as a default in the chart's values.yaml, each key has to be overridden explicitly, which is even more of a royal pain in the a.. It would be good to comment out the default configuration, so that users have a template from which they can tailor their own configs.
+1 Yes, I agree. The ability to override the default configuration is really necessary, because when I add my own routes I end up with a mix of the standard configuration and mine.
It should be fixed in the 0.19.2 release.
By default, the chart now ships with an empty configuration and a blackhole receiver as the route destination.
All configuration must be defined in your own values file or configured with AlertmanagerConfig CRD objects.
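As a rough illustration of the values-file route, a self-contained override could look like the sketch below. The alertmanager.config path, the receiver name team-webhook, and the webhook URL are all hypothetical and only serve the example:

```yaml
# Values sketch for 0.19.2 and later: the whole configuration is user-supplied.
alertmanager:
  config:
    route:
      receiver: blackhole          # default: drop anything not matched below
      routes:
        - matchers:
            - severity=critical
          receiver: team-webhook   # hypothetical receiver name
    receivers:
      - name: blackhole            # a receiver with no integrations discards alerts
      - name: team-webhook
        webhook_configs:
          - url: "http://alert-receiver.example.internal/hook"   # hypothetical URL
```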
I created a simple gist with a recommended project structure for Alertmanager and a few tips based on our Helm usage experience: https://gist.github.com/f41gh7/f375d9dcca68838ec69621b1955d3768
Closing this issue, as the default Alertmanager configuration was removed in version 0.19.2.