
Alertmanager fallback configuration doesn't support templates

Open Ferrany1 opened this issue 2 years ago • 7 comments

Describe the bug

Alertmanager can't resolve a definition from a custom template file, which results in an "empty Telegram message" error when it tries to send the notification.

To Reproduce

Steps to reproduce the behavior:

docker-compose.yaml

  monitoring_host_mimir-1:
    container_name: monitoring_host_mimir-1
    image: grafana/mimir:2.3.1
    user: "0"
    command: [ "-config.file=/etc/mimir/mimir.yml" ]
    restart: unless-stopped
    logging: *default-logging
    volumes:
      - ../backup/mimir-1:/mimir
      - ../config/mimir/mimir.yml:/etc/mimir/mimir.yml:ro
      - ../config/mimir/alertmanager/alerts/:/mimir/fs_rules/anonymous/:ro
      - ../config/mimir/alertmanager/alertmanager.yml:/etc/mimir/alertmanager.yml:ro
    expose:
      - 8080

alertmanager.yml

templates:
- '/etc/alertmanager/templates/telegram.tmpl'

receivers:
  - name: "telegram"
    telegram_configs:
      - bot_token:
        chat_id:
        api_url: https://api.telegram.org
        message: '{{ template "telegram.message" . }}'

alertmanager.yml

receivers:
  - name: "telegram"
    telegram_configs:
      - bot_token:
        chat_id:
        api_url: https://api.telegram.org
        message: '{{ template "telegram.message" . }}'
        parse_mode: HTML

telegram.tmpl

{{ define "__alertmanager" }}Alertmanager{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver | urlquery }}{{ end }}

{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__description" }}{{ end }}

{{ define "telegram.message" }}
{{ if gt (len .Alerts.Firing) 0 }}
Alerts Firing:
{{ template "__text_alert_list" .Alerts.Firing }}
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
Alerts Resolved:
{{ template "__text_alert_list" .Alerts.Resolved }}
{{ end }}
{{ end }}

Expected behavior

The alert is sent as a Telegram message.

Environment

  • Infrastructure: docker, centos8
  • Deployment tool: docker-compose

Additional Context

I've tested the template via https://github.com/prometheus/alertmanager/blob/main/template/template_test.go with '{{ template "telegram.message" . }}' and everything works correctly. I haven't tried deploying a standalone Prometheus Alertmanager, but it seems it may work fine.

Currently, to make everything work, I've put the body of the message template directly into the message field in alertmanager.yml, without the template reference, and it works OK.

Ferrany1 avatar Oct 09 '22 23:10 Ferrany1

In your example the telegram.tmpl isn't mounted on the container. Could this be the problem?

dimitarvdimitrov avatar Oct 11 '22 09:10 dimitarvdimitrov

No, sorry, I copied the latest version, which doesn't have the mount. However, I was testing with the proper mount, and I checked that the file was actually mounted at that path.

Ferrany1 avatar Oct 11 '22 09:10 Ferrany1

Are you uploading alertmanager.yml (along with the template) with the mimirtool alertmanager load command, or are you configuring it as the fallback Alertmanager configuration?

pracucci avatar Oct 11 '22 18:10 pracucci

As a fallback

mimir.yml part

alertmanager:
  data_dir: /mimir/alertmanager
  fallback_config_file: /etc/mimir/alertmanager.yml
  external_url: http://127.0.0.1:8080/alertmanager

Ferrany1 avatar Oct 11 '22 19:10 Ferrany1

You raised a very good point. The alertmanager fallback configuration currently doesn't support templates. This is something we should fix.

As a workaround, could you upload the Alertmanager YAML config and templates for the specific tenant using mimirtool alertmanager load (doc) instead?

pracucci avatar Oct 11 '22 19:10 pracucci

I've worked around it by putting the full message template (without the define) into alertmanager.yml. If you could point me to the loader, I'll have a look at it and maybe open a PR with fixes, provided it's needed and the team isn't currently working on it.

Ferrany1 avatar Oct 11 '22 19:10 Ferrany1

If you could point me to the loader, I'll have a look at it and maybe open a PR with fixes, provided it's needed and the team isn't currently working on it.

No one is working on it and we would love your help! ❤️

The fallback config is loaded from here: https://github.com/grafana/mimir/blob/main/pkg/alertmanager/multitenant.go#L840

alertmanagerFromFallbackConfig() is a bit tricky. It creates an empty config definition and stores it in the backend storage: https://github.com/grafana/mimir/blob/3c8fabdbece41f894a49c7024cdd5982fa26924d/pkg/alertmanager/multitenant.go#L864-L868

Then we call setConfig(), which loads the fallback config if the config is empty (it was forcefully set to empty in alertmanagerFromFallbackConfig): https://github.com/grafana/mimir/blob/3c8fabdbece41f894a49c7024cdd5982fa26924d/pkg/alertmanager/multitenant.go#L675-L684

pracucci avatar Oct 11 '22 19:10 pracucci

Is someone working on this? If not, could we contribute to it?

paulojmdias avatar Mar 30 '23 14:03 paulojmdias

Is someone working on this? If not, could we contribute to it?

No one is working on it. You're welcome to contribute! ❤️

pracucci avatar Mar 30 '23 16:03 pracucci

This is definitely something I would love to have <3

As an idea, what about simply creating a small watcher for k8s that detects changes on a ConfigMap holding the configs and then uses mimirtool to upload them from time to time?

achetronic avatar Jun 07 '23 03:06 achetronic

Running into the same issue. Chose to try to use the fallback config as there's no way to configure alertmanager configs without mimirtool (don't want a manual step to configuring mimir). Is there any other solution currently? Anyone ever work on this? I'm not capable of doing it myself.

eric-engberg avatar Oct 27 '23 00:10 eric-engberg

Running into the same issue. Chose to try to use the fallback config as there's no way to configure alertmanager configs without mimirtool (don't want a manual step to configuring mimir). Is there any other solution currently? Anyone ever work on this? I'm not capable of doing it myself.

As I said previously, if you run mimirtool in a cronjob with your config and your templates loaded into it, you can upload your config periodically, say every 5 minutes, and it's fully automated. We use it that way and it's working well 😊

achetronic avatar Oct 27 '23 05:10 achetronic

@pracucci It seems I've managed to fix it, but I have no idea how to write tests for it, since Alertmanager isn't tested in Mimir. The only option I see is to push alerts to the notifier, and they are then sent directly to the receivers.

I've tested it locally with the following configs:

mimir.yaml:

target: all,alertmanager,ruler

multitenancy_enabled: false
no_auth_tenant: "anonymous"

blocks_storage:
  backend: filesystem
  bucket_store:
    sync_dir: ./temp/tsdb-sync
  filesystem:
    dir: ./temp/data/tsdb
  tsdb:
    dir: ./temp/tsdb

compactor:
  data_dir: ./temp/compactor
  sharding_ring:
    kvstore:
      store: memberlist

distributor:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist

ingester:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist
    replication_factor: 1

ruler:
  alertmanager_url: http://127.0.0.1:8080/alertmanager

ruler_storage:
  backend: local
  local:
    directory: ./temp/fs_rules

alertmanager:
  data_dir: ./temp/alertmanager
  fallback_config_file: ./alertmanager.yaml
  external_url: http://127.0.0.1:8080/alertmanager

alertmanager_storage:
  backend: filesystem
  filesystem:
    dir: ./temp/alerts

limits:
  max_label_names_per_series: 100

server:
  log_level: warn
  http_listen_port: 8080

store_gateway:
  sharding_ring:
    replication_factor: 1

alertmanager.yaml:

route:
  repeat_interval: 30s
  group_interval: 60s
  group_wait: 30s
  receiver: 'telegram'

templates:
  - './telegram.tmpl'

receivers:
  - name: "telegram"
    telegram_configs:
      - bot_token: ''
        chat_id: ''
        api_url: https://api.telegram.org
        message: '{{ template "telegram.message" . }}'

telegram.tmpl

{{ define "telegram.message" }}
test
{{ end }}

Ferrany1 avatar Oct 27 '23 10:10 Ferrany1

Sorry for taking so long, I totally forgot about this issue for a year.

Ferrany1 avatar Oct 27 '23 11:10 Ferrany1

@pracucci Can you help me with the PR?

Ferrany1 avatar Nov 01 '23 12:11 Ferrany1

@pracucci before I take a look at #6495, is it possible Mimir doesn't support templates in the fallback configuration on purpose to avoid a situation where the fallback configuration fails for the same reason as the main configuration (i.e. a shared, bad template)?

grobinson-grafana avatar Nov 06 '23 16:11 grobinson-grafana

before I take a look at https://github.com/grafana/mimir/pull/6495, is it possible Mimir doesn't support templates in the fallback configuration on purpose to avoid a situation where the fallback configuration fails for the same reason as the main configuration (i.e. a shared, bad template)?

We should ask @gotjosh and @stevesg because they know this better. I don't remember any discussion where we decided not to do it on purpose. My feeling is that this was an oversight on our side.

However, I think we should ideally validate the fallback config and not start the alertmanager if some required templates are missing.

pracucci avatar Nov 16 '23 11:11 pracucci

However, I think we should ideally validate the fallback config and not start the alertmanager if some required templates are missing.

The main issue here is that a template in the fallback configuration can fail at runtime, not because it's absent on disk, but because of a syntax error in the template or because it attempts to access a field that does not exist in a struct. A lot of this can be mitigated with static analysis, but it's a lot of work.
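
As a small illustration of the two failure modes, here is a sketch using plain text/template. The `alertData` struct and the `check` helper are hypothetical stand-ins, not Alertmanager's types. In plain text/template, a syntax error is reported when the template is parsed, while a missing struct field is only reported when the template is executed against real data.

```go
package main

import (
	"fmt"
	"io"
	"text/template"
)

// alertData is a hypothetical stand-in for the data passed to a
// notification template.
type alertData struct {
	Status string
}

// check parses body and, if parsing succeeds, executes it against
// alertData. The parse error and the execution error are returned
// separately so the two failure modes can be told apart.
func check(body string) (parseErr, execErr error) {
	t, err := template.New("message").Parse(body)
	if err != nil {
		return err, nil
	}
	return nil, t.Execute(io.Discard, alertData{Status: "firing"})
}

func main() {
	bodies := []string{
		`{{ .Status }}`,           // valid
		`{{ if }}broken{{ end }}`, // syntax error
		`{{ .Stauts }}`,           // parses, but the field does not exist
	}
	for _, b := range bodies {
		parseErr, execErr := check(b)
		fmt.Printf("%-26q parseErr=%v execErr=%v\n", b, parseErr, execErr)
	}
}
```

Catching the second kind before startup would need a dry-run execution against representative data or the static analysis mentioned above.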

grobinson-grafana avatar Nov 16 '23 13:11 grobinson-grafana

Do you still need my PR attached to this issue, or can I abandon it?

Ferrany1 avatar Nov 30 '23 20:11 Ferrany1