Resolved status of an alert outside business hours is not acknowledged by Alertmanager and not sent to the configured receiver
Hello everyone,
Please help me understand whether I have misconfigured Prometheus Alertmanager in any way.
The scenario is the following: if the alert is triggered during business hours, the notification is sent. If the alert is triggered outside business hours, the notification is not sent.
If the alert is resolved outside business hours (in Prometheus), the event is not acknowledged by Alertmanager, and therefore the resolved status is not sent to the configured receiver (PagerDuty, in this case).
Please help me understand where the issue is coming from.
What did you do? Configured Alertmanager to send alerts only during the business-hours interval defined in alertmanager.yml:
time_intervals:
  - name: only_in_business_hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: "07:00"
            end_time: "16:00"
  - name: weekend
    time_intervals:
      - weekdays: ['saturday','sunday']
Below is the alert rule that should only notify during business hours:
- name: ssl_certificate_expiry
  rules:
    - alert: cert_expiring_date
      expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
      for: 10m
      labels:
        severity: warning
        only_in_business_hours: true
      annotations:
        summary: The SSL certificate will expire on {{ $labels.instance }}
        description: "SSL certificate on target will expire in less than 1 week."
What did you expect to see?
If an alert is triggered outside business hours, the notification is not sent and waits until business hours begin. If the alert is resolved outside business hours, the resolved notification should still be sent to the configured receiver.
What did you see instead? Under which circumstances?
If the alert is resolved outside business hours (in Prometheus), the event is not acknowledged by Alertmanager, and therefore the resolved status is not sent to the configured receiver (PagerDuty, in this case).
Environment
- System information:
Linux 3.10.0-1160.31.1.el7.x86_64 x86_64
- Alertmanager version:
alertmanager, version 0.26.0 (branch: HEAD, revision: d7b4f0c7322e7151d6e3b1e31cbc15361e295d8d)
- Prometheus version:
prometheus, version 2.40.3 (branch: HEAD, revision: 84e95d8cbc51b89f1a69b25dd239cae2a44cb6c1)
- Alertmanager configuration file:
global:
  resolve_timeout: 3m
route:
  group_by: ['alertname', 'cluster', 'service', 'url']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 3h
  receiver: 'pagerduty_channel'
  routes:
    - matchers:
        - only_in_business_hours = true
      continue: true
      active_time_intervals:
        - only_in_business_hours
receivers:
  - name: "pagerduty_channel"
    pagerduty_configs:
      - routing_key: "aBeautifulAndColorfulKey"
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']
time_intervals:
  - name: only_in_business_hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: "07:00"
            end_time: "16:00"
  - name: weekend
    time_intervals:
      - weekdays: ['saturday','sunday']
- Prometheus configuration file:
global:
  scrape_interval: 2s
  evaluation_interval: 2s
  query_log_file: /prometheus/logs/query.log
rule_files:
  - "alert.rules"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
alerting:
  alertmanagers:
    - scheme: 'http'
      static_configs:
        - targets:
            - 'localhost:9093'
- Logs:
40007186:ts=2023-11-15T20:24:46.720Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup="{}/{only_in_business_hours=\"true\"}:{alertname=\"cert_expiring_date\", url=\"https://address.net/\"}" msg=flushing alerts=[cert_expiring_date[0308b61][resolved]]
40007453-ts=2023-11-15T20:24:46.720Z caller=notify.go:877 level=debug component=dispatcher msg="Notifications not sent, route is not within active time"
Alertmanager is working as intended. If the current time falls outside a route's active_time_intervals, that route is not active: it sends neither a "firing" notification nor a "resolved" notification.
And if I want a notification to be sent to the configured receiver when the alert is resolved outside of active_time_intervals, how can I achieve that? Thank you.
Hi! 👋 I do not believe it's possible to tell Alertmanager to send resolved notifications for alerts that are silenced, muted or outside active time intervals. Someone else might be able to correct me if this is wrong.
Generally this type of thing is better configured in your notification provider, e.g. PagerDuty, OpsGenie etc, since that's where you configure your teams, on-call schedules, escalation rules etc. Just let Alertmanager blast everything through to PagerDuty (regardless of time / day), and configure your custom notification behaviour there.
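A minimal sketch of what that suggestion could look like on the Alertmanager side, assuming all time-based behaviour is moved into PagerDuty. The values are copied from the configuration posted in the question; the only change is that the child route with active_time_intervals is dropped. How PagerDuty then delays or suppresses notifications outside business hours is configured there and is not shown here.

```yaml
global:
  resolve_timeout: 3m

route:
  group_by: ['alertname', 'cluster', 'service', 'url']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 3h
  receiver: 'pagerduty_channel'
  # No child route with active_time_intervals: nothing is time-gated in
  # Alertmanager, so a resolved notification goes out whenever the alert resolves.

receivers:
  - name: "pagerduty_channel"
    pagerduty_configs:
      - routing_key: "aBeautifulAndColorfulKey"
        send_resolved: true
```

The trade-off is that firing notifications also reach PagerDuty outside business hours, so the quiet-hours handling has to exist on the PagerDuty side. The only_in_business_hours label can stay on the alert rule; with no route matching on it, Alertmanager treats it as an ordinary label.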