[victoria-metrics-k8s-stack] default Alertmanager configuration

Open b-a-t opened this issue 2 years ago • 5 comments

Hi!

I was trying to find out how the Alertmanager configuration is supposed to be set in victoria-metrics-k8s-stack, but I seem to fail to understand the whole machinery.

So let's assume I have my very own configuration, which I'd like to use. One option is to encode it in base64, place it into a secret, and refer to that secret via configSecret: "alertmanager-config".

That seems to work, but maintaining such a configuration becomes a royal pain in the 🍑
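One way to take some of the pain out of the secret route is to keep the configuration as plain text and generate the Secret manifest from it. The following is only a minimal sketch: the helper name build_config_secret is hypothetical, and the data key "alertmanager.yaml" is an assumption about what the operator expects inside the secret, so check it against the operator documentation before relying on it.

```python
import base64
import json


def build_config_secret(name: str, alertmanager_yaml: str,
                        key: str = "alertmanager.yaml") -> dict:
    """Return a Kubernetes Secret manifest (as a dict) holding the config.

    The default key name is an assumption, not something confirmed in this thread.
    """
    encoded = base64.b64encode(alertmanager_yaml.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {key: encoded},
    }


# Plain-text configuration that stays readable in version control.
CONFIG = """\
route:
  receiver: empty
receivers:
  - name: empty
"""

secret = build_config_secret("alertmanager-config", CONFIG)

if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this output can be applied directly.
    print(json.dumps(secret, indent=2))
```

The printed manifest could then be piped to kubectl apply -f -, with the secret name matching the one given in configSecret.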

Assuming we have nothing to hide, the easiest way should be to define the config: map in the values.yaml of the Helm chart.

Unfortunately, it is not that easy. The resulting configuration looks like a mixture of my overrides and the chart defaults. These are the victoria-metrics-k8s-stack defaults from values.yaml:

  config:
    global:
      resolve_timeout: 5m
      slack_api_url: "http://slack:30500/"
    templates:
      - "/etc/vm/configs/**/*.tmpl"
    route:
      group_by: ["alertgroup", "job"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: "slack-monitoring"
      routes:
        ###################################################
        ## Duplicate code_owner routes to teams
        ## These will send alerts to team channels but continue
        ## processing through the rest of the tree to handled by on-call
        - matchers:
            - code_owner_channel!=""
            - severity=~"info|warning|critical"
          group_by: ["code_owner_channel", "alertgroup", "job"]
          receiver: slack-code-owners
        ###################################################
        ## Standard on-call routes
        - matchers:
            - severity=~"info|warning|critical"
          receiver: slack-monitoring
          continue: true

    inhibit_rules:
      - target_matchers:
          - severity=~"warning|info"
        source_matchers:
          - severity=critical
        equal:
          - cluster
          - namespace
          - alertname
      - target_matchers:
          - severity=info
        source_matchers:
          - severity=warning
        equal:
          - cluster
          - namespace
          - alertname
      - target_matchers:
          - severity=info
        source_matchers:
          - alertname=InfoInhibitor
        equal:
          - cluster
          - namespace

    receivers:
      - name: "slack-monitoring"
        slack_configs:
          - channel: "#channel"
            send_resolved: true
            title: '{{ template "slack.monzo.title" . }}'
            icon_emoji: '{{ template "slack.monzo.icon_emoji" . }}'
            color: '{{ template "slack.monzo.color" . }}'
            text: '{{ template "slack.monzo.text" . }}'
            actions:
              - type: button
                text: "Runbook :green_book:"
                url: "{{ (index .Alerts 0).Annotations.runbook_url }}"
              - type: button
                text: "Query :mag:"
                url: "{{ (index .Alerts 0).GeneratorURL }}"
              - type: button
                text: "Dashboard :grafana:"
                url: "{{ (index .Alerts 0).Annotations.dashboard }}"
              - type: button
                text: "Silence :no_bell:"
                url: '{{ template "__alert_silence_link" . }}'
              - type: button
                text: '{{ template "slack.monzo.link_button_text" . }}'
                url: "{{ .CommonAnnotations.link_url }}"
      - name: slack-code-owners
        slack_configs:
          - channel: "#{{ .CommonLabels.code_owner_channel }}"
            send_resolved: true
            title: '{{ template "slack.monzo.title" . }}'
            icon_emoji: '{{ template "slack.monzo.icon_emoji" . }}'
            color: '{{ template "slack.monzo.color" . }}'
            text: '{{ template "slack.monzo.text" . }}'
            actions:
              - type: button
                text: "Runbook :green_book:"
                url: "{{ (index .Alerts 0).Annotations.runbook }}"
              - type: button
                text: "Query :mag:"
                url: "{{ (index .Alerts 0).GeneratorURL }}"
              - type: button
                text: "Dashboard :grafana:"
                url: "{{ (index .Alerts 0).Annotations.dashboard }}"
              - type: button
                text: "Silence :no_bell:"
                url: '{{ template "__alert_silence_link" . }}'
              - type: button
                text: '{{ template "slack.monzo.link_button_text" . }}'
                url: "{{ .CommonAnnotations.link_url }}"

The override YAML file supplied to the Helm chart:

    config:
      global: {}
      templates:
        - '/etc/vm/configs/*.tmpl'
      route:
        group_wait: 15s
        group_interval: 5m
        receiver: empty
        repeat_interval: 4h
      receivers:
        - name: emplty

And the resulting alertmanager.yaml, stored in the secret vm-stack-alertmanager, is:

global:
  resolve_timeout: 5m
  slack_api_url: http://slack:30500/
inhibit_rules:
- equal:
  - cluster
  - namespace
  - alertname
  source_matchers:
  - severity=critical
  target_matchers:
  - severity=~"warning|info"
- equal:
  - cluster
  - namespace
  - alertname
  source_matchers:
  - severity=warning
  target_matchers:
  - severity=info
- equal:
  - cluster
  - namespace
  source_matchers:
  - alertname=InfoInhibitor
  target_matchers:
  - severity=info
receivers:
- name: emplty
route:
  group_by:
  - alertgroup
  - job
  group_interval: 5m
  group_wait: 15s
  receiver: empty
  repeat_interval: 4h
  routes:
  - group_by:
    - code_owner_channel
    - alertgroup
    - job
    matchers:
    - code_owner_channel!=""
    - severity=~"info|warning|critical"
    receiver: slack-code-owners
  - continue: true
    matchers:
    - severity=~"info|warning|critical"
    receiver: slack-monitoring
templates:
- /etc/vm/configs/*.tmpl

I can't find the pattern by which those two files get merged: routes and inhibit_rules seem to be inherited from the defaults, while receivers got completely overridden by the override file.
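The observed result is consistent with the way Helm combines values files: maps are merged key by key, while lists and scalars from the override replace the default wholesale. The sketch below is a simplified Python model of that rule, not Helm's actual code; merge_values is a hypothetical helper, and the defaults are abbreviated from the listing above.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Simplified model: merge maps recursively, replace lists and scalars."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: descend and merge key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Lists and scalars from the override win as a whole.
            merged[key] = deepcopy(value)
    return merged


# Abbreviated chart defaults from the issue.
defaults = {
    "global": {"resolve_timeout": "5m", "slack_api_url": "http://slack:30500/"},
    "templates": ["/etc/vm/configs/**/*.tmpl"],
    "route": {
        "group_by": ["alertgroup", "job"],
        "group_wait": "30s",
        "group_interval": "5m",
        "repeat_interval": "12h",
        "receiver": "slack-monitoring",
        "routes": [{"receiver": "slack-code-owners"}, {"receiver": "slack-monitoring"}],
    },
    "inhibit_rules": [{"source_matchers": ["severity=critical"]}],
    "receivers": [{"name": "slack-monitoring"}, {"name": "slack-code-owners"}],
}

# The override file from the issue, typo in the receiver name included.
overrides = {
    "global": {},
    "templates": ["/etc/vm/configs/*.tmpl"],
    "route": {
        "group_wait": "15s",
        "group_interval": "5m",
        "receiver": "empty",
        "repeat_interval": "4h",
    },
    "receivers": [{"name": "emplty"}],
}

merged = merge_values(defaults, overrides)
```

Under this model, route is a map, so group_by and routes survive from the defaults while the scalar keys are overridden; receivers and templates are lists, so they are replaced; inhibit_rules is absent from the override, so the default stays; and the empty global map changes nothing.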

My main question is: why supply such an extensive Alertmanager configuration in the chart's default values.yaml in the first place, given that there is no easy way to override those values and they (seemingly) always interfere with the user-supplied configuration?

b-a-t, Feb 22 '23 02:02

A bit more experimenting shows that if you provide override sections for the top-level keys in the config: map, the values from the override file are used for inhibit_rules, routes, and receivers. One notable exception is the global section, which could not be overridden and always contains this block:

global:
  resolve_timeout: 5m
  slack_api_url: http://slack:30500/
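A likely explanation for the global section is that Helm merges maps key by key, so an empty global: {} adds nothing and removes nothing. Helm's documentation describes setting a key to null as the way to delete a default. The sketch below models both cases in Python; merge_with_null_delete is a hypothetical helper rather than Helm code, and whether a null override behaves this way through this particular chart was not verified in this thread.

```python
def merge_with_null_delete(defaults: dict, overrides: dict) -> dict:
    """Simplified model of a map merge in which a null (None) override deletes the key."""
    result = dict(defaults)
    for key, value in overrides.items():
        if value is None:
            # A null override removes the default key entirely.
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            # Maps are merged recursively, so an empty map is a no-op.
            result[key] = merge_with_null_delete(result[key], value)
        else:
            result[key] = value
    return result


default_values = {
    "global": {"resolve_timeout": "5m", "slack_api_url": "http://slack:30500/"}
}

# An empty map leaves the defaults untouched.
unchanged = merge_with_null_delete(default_values, {"global": {}})

# A null value drops the unwanted default key.
trimmed = merge_with_null_delete(default_values, {"global": {"slack_api_url": None}})
```

In values-file terms, the second case corresponds to writing slack_api_url: null under global in the override file.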

b-a-t, Feb 22 '23 02:02

> they (seem) will always interfere with the user-supplied configuration.

For now, you can use a separate configSecret to keep your configuration untouched: https://github.com/VictoriaMetrics/helm-charts/blob/659a1d6d4eb35f5dbdd8c539c376cdbcbc60b1c9/charts/victoria-metrics-k8s-stack/values.yaml#L336-L337

Haleygo, Jul 19 '23 04:07

This is indeed impractical. Since the whole configuration is defined as a default in the chart's values.yaml, each key has to be overridden explicitly, which is even more of a royal pain in the a... It would be good to comment out the default configuration, so that users have a template from which they can tailor their own configs.

grawert, Jul 20 '23 11:07

+1. Yes, I agree: the ability to override the default configuration is much needed, because when I add my own routes I end up with a mix of the standard configuration and mine.

wisdomdevil, Sep 30 '23 14:09

This should be fixed in the 0.19.2 release.

By default, the chart now ships with an empty configuration and a blackhole as the route destination.

All configuration must be defined in your own values file or configured with AlertmanagerConfig CRD objects.

f41gh7, Feb 29 '24 22:02

I created a simple gist with a recommended project structure for Alertmanager and a few tips based on our Helm usage experience: https://gist.github.com/f41gh7/f375d9dcca68838ec69621b1955d3768

f41gh7, Jul 29 '24 12:07

Closing this issue, as the default Alertmanager configuration was removed in version 0.19.2.

AndrewChubatiuk, Aug 23 '24 09:08