Alertmanager Fails to Start with Configuration Error for Telegram Integration
Description
I've set up Prometheus with Alertmanager to send alerts to a Telegram chat. However, Alertmanager fails to start, reporting a configuration error related to the `chat_id` and `text` fields in the Telegram configuration.
Environment
Prometheus version: v2.30.3
Alertmanager version: v0.26.0
Docker version: (include your docker version here)
Operating System: (include your OS here)
Configuration
I have the following setup in my `docker-compose.yml`:
```yaml
version: '3'
services:
  prometheus:
    image: prom/prometheus:v2.30.3
    container_name: prometheus
    volumes:
      - $PWD/prometheus:/etc/prometheus
      - prometheus_data:/prometheus
      - $PWD/alertmanager:/etc/alertmanager
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - 9091:9090
    restart: always
  alertmanager:
    image: prom/alertmanager:v0.26.0
    container_name: alertmanager
    volumes:
      - ./alertmanager:/etc/alertmanager
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
    ports:
      - 9093:9093
    restart: always
  elasticsearch-exporter-1:
    image: quay.io/prometheuscommunity/elasticsearch-exporter:latest
    command:
      - '--es.uri=https://elastic:[email protected]:9202'
      - '--es.all'
      - '--es.ssl-skip-verify'
    ports:
      - "9114:9114"
    restart: always
  elasticsearch-exporter-2:
    image: quay.io/prometheuscommunity/elasticsearch-exporter:latest
    command:
      - '--es.uri=https://elastic:[email protected]:9201'
      - '--es.all'
      - '--es.ssl-skip-verify'
    ports:
      - "9115:9114"
    restart: always
volumes:
  prometheus_data:
```
And my `alertmanager/config.yml` is configured as follows:
```yaml
global:
  resolve_timeout: 1m

route:
  receiver: 'telegram'
  group_by: ['alertname', 'cluster']
  repeat_interval: 1h

receivers:
  - name: 'telegram'
    telegram_configs:
      - send_resolved: true
        api_url: 'https://api.telegram.org/bot<TOKEN>/sendMessage'
        chat_id: '-100xxxxxx'
        text: |
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
          *Description:* {{ .Annotations.description }}
          *Details:*
          {{ range .Labels.SortedPairs }}{{ .Name }}: {{ .Value }}
          {{ end }}
          {{ end }}
```
(Note: `<TOKEN>` and the `-100xxxxxx` chat ID are placeholders for the actual bot token and chat ID.)
Error
Upon starting Alertmanager, it fails with the following error:
```
yaml: unmarshal errors:
  line 14: cannot unmarshal !!str `-100169...` into int64
  line 15: field text not found in type config.plain
```
Here are the full Docker logs for the alertmanager container:
```
ts=2024-03-12T12:13:34.202Z caller=main.go:246 level=info build_context="(go=go1.20.7, platform=linux/amd64, user=root@df8d7debeef4, date=20230824-11:11:58, tags=netgo)"
ts=2024-03-12T12:13:34.203Z caller=cluster.go:186 level=info component=cluster msg="setting advertise address explicitly" addr=192.168.48.5 port=9094
ts=2024-03-12T12:13:34.205Z caller=cluster.go:683 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
ts=2024-03-12T12:13:34.262Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config.yml
ts=2024-03-12T12:13:34.262Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config.yml err="yaml: unmarshal errors:\n line 14: cannot unmarshal !!str `-100169...` into int64\n line 15: field text not found in type config.plain"
ts=2024-03-12T12:13:34.262Z caller=cluster.go:692 level=info component=cluster msg="gossip not settled but continuing anyway" polls=0 elapsed=57.276658ms
```
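For quicker iteration, the same parse error can be surfaced without restarting the whole stack by validating the file with `amtool check-config`. This is just a sketch; it assumes the directory layout from the compose file above and the usual `/bin/amtool` location inside the official prom/alertmanager image:

```sh
# Validate the Alertmanager config using the amtool bundled in the image
docker run --rm \
  -v "$PWD/alertmanager:/etc/alertmanager" \
  --entrypoint /bin/amtool \
  prom/alertmanager:v0.26.0 \
  check-config /etc/alertmanager/config.yml
```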
prometheus/alertmanager.rules
```yaml
groups:
  - name: elasticsearch_alerts
    rules:
      - alert: LowElasticsearchNodes
        expr: elasticsearch_cluster_health_number_of_nodes{job="your_job_name", instance=~"your_instance_regex", cluster="your_cluster_name"} < 3
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low number of Elasticsearch nodes ({{ $labels.cluster }})"
          description: "Elasticsearch cluster '{{ $labels.cluster }}' has fewer than 3 nodes for more than 5 minutes."
```
prometheus/prometheus.yml
```yaml
global:
  scrape_interval: 15s

rule_files:
  - "/root/Prometheus/prometheus/lertmanager.rules"

scrape_configs:
  - job_name: 'elasticsearch-1'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['172.31.63.2:9114']
  - job_name: 'elasticsearch-2'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['172.31.63.2:9115']
```
```
tree .
.
├── alertmanager
│   └── config.yml
├── docker-compose.yml
└── prometheus
    ├── alertmanager.rules
    └── prometheus.yml
```
It appears there's an issue with parsing the `chat_id` (which is a negative number for Telegram groups) and recognizing the `text` field in the Telegram configuration.
Steps to Reproduce
1. Set up Prometheus and Alertmanager with the above configurations.
2. Start the services using Docker Compose.
3. Observe the logs of the Alertmanager container.
Expected Behavior
Alertmanager starts successfully and is able to send alerts to the configured Telegram chat.
Actual Behavior
Alertmanager fails to start due to a configuration parsing error related to the Telegram integration.
By using `chat_id: '-100xxxxxx'` (single quotes) you are making this value a string explicitly, while the unmarshal error shows that Alertmanager expects `chat_id` to be an int64. Remove the quotes so the value is parsed as a number.
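Based on that, a corrected receiver would look roughly like the sketch below. It assumes (worth double-checking against the v0.26.0 configuration docs) that the template field is called `message` rather than `text`, which would explain the `field text not found` error, and that the bot token is supplied via a dedicated `bot_token` field with `api_url` left as the bare API base URL:

```yaml
receivers:
  - name: 'telegram'
    telegram_configs:
      - send_resolved: true
        # Bare API base URL; the token goes into bot_token rather than the URL (assumption)
        api_url: 'https://api.telegram.org'
        bot_token: '<TOKEN>'
        # Unquoted so YAML parses it as an int64, matching what the unmarshal error expects
        chat_id: -100xxxxxx   # placeholder; replace with the real numeric group ID
        # Renamed from `text`; the error indicates that field name is not recognized
        message: |
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
          *Description:* {{ .Annotations.description }}
          *Details:*
          {{ range .Labels.SortedPairs }}{{ .Name }}: {{ .Value }}
          {{ end }}
          {{ end }}
```

The asterisks in the template are Markdown formatting, so `parse_mode` may also need attention (I believe the default is HTML), but that is separate from the startup failure.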