terraform-datadog-platform
Targeted notifications
Describe the Bug
Generally you would put your notification channels inside conditional variables, so your message might be:
An alert
{{#is_warning}}
It is warning @slacknotify
{{/is_warning}}
{{#is_alert}}
It is alerting @pagerduty
{{/is_alert}}
In this way you would only get paged if something was alerting. Currently, if you use alert_tags, it looks like the notification goes out for any condition the monitor hits. You can override the message, so this is okay, but it would be nice to be able to do this more automatically.
Expected Behavior
- Some way of configuring the message for each conditional variable: https://docs.datadoghq.com/monitors/notifications/?tab=is_alert#conditional-variables
What we have is basically a Terraform implementation of the config format used by astro. https://github.com/FairwindsOps/astro
If you have some pseudocode for how you would like to express it, that might make it easier to discuss a potential solution.
I think it might be complicated to template all conditionals, especially some of the more complicated stuff like at the bottom of https://docs.datadoghq.com/monitors/notifications, e.g. matching a host name:
{{{{is_match "host.name" "<HOST_NAME>"}}}}
{{ .matched }} the host name
{{{{/is_match}}}}
For us, at least, we keep it pretty simple, and each alert just notifies on warn and alert as per above. Perhaps you could create a default message by adding the following to the object:
message_description
warning_notification
alert_notification
Then the message would either be message (assuming it is not blank) or be generated like this:
${var.message_description}
{{#is_warning}}
${var.warning_notification}
{{/is_warning}}
{{#is_alert}}
${var.alert_notification}
{{/is_alert}}
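A minimal sketch of how that fallback could be expressed in Terraform, assuming the three proposed fields are exposed as string variables. The variable declarations, their defaults, and the local names are hypothetical and are not part of the module today:

```hcl
# Hypothetical inputs mirroring the fields proposed above.
variable "message" {
  type    = string
  default = ""
}

variable "message_description" {
  type    = string
  default = "An alert"
}

variable "warning_notification" {
  type    = string
  default = "It is warning @slacknotify"
}

variable "alert_notification" {
  type    = string
  default = "It is alerting @pagerduty"
}

locals {
  # Build the default message from the per-condition notifications.
  generated_message = join("\n", [
    var.message_description,
    "{{#is_warning}}",
    var.warning_notification,
    "{{/is_warning}}",
    "{{#is_alert}}",
    var.alert_notification,
    "{{/is_alert}}",
  ])

  # An explicit message wins; otherwise fall back to the generated one.
  message = var.message != "" ? var.message : local.generated_message
}

output "message" {
  value = local.message
}
```

The double curly braces need no escaping here, because Terraform only treats `${` and `%{` as template sequences inside strings.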
However, at the moment, since I'm not using YAML, I'm just going to merge a default object:
default_datadog_monitors_obj = {
  name                = ""
  type                = ""
  message             = <<-EOT
    {{#is_warning}} @slack {{/is_warning}}
    {{#is_alert}} @pagerduty {{/is_alert}}
    {{#is_recovery}} @pagerduty {{/is_recovery}}
  EOT
  query               = ""
  escalation_message  = ""
  tags                = []
  notify_no_data      = false
  notify_audit        = false
  require_full_window = false
  enable_logs_sample  = false
  force_delete        = true
  include_tags        = true
  locked              = false
  renotify_interval   = 0
  timeout_h           = 60
  evaluation_delay    = 0
  new_host_delay      = 300
  no_data_timeframe   = 10
  threshold_windows   = {}
  thresholds          = {}
}
Adding a monitor
datadog_monitors = {
  test-monitor = merge(
    local.default_datadog_monitors_obj,
    {
      name           = "test monitor"
      type           = "service check"
      query          = "'datadog.agent.up'.over('*').last(3).count_by_status()"
      notify_no_data = true
      thresholds = {
        ok       = 3
        critical = 3
        warning  = 3
      }
    }
  )
}
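One thing to keep in mind with this pattern is that Terraform's merge() is shallow: later arguments override earlier ones key by key, and nested values such as thresholds are replaced wholesale rather than merged. A small sketch with trimmed-down, hypothetical defaults (not the full object above) illustrates this:

```hcl
locals {
  # Hypothetical, trimmed-down defaults for illustration only.
  defaults = {
    message        = "{{#is_alert}} @pagerduty {{/is_alert}}"
    notify_no_data = false
    thresholds     = { warning = 1, critical = 2 }
  }

  # Per-monitor overrides; keys not listed here keep their default value.
  overrides = {
    notify_no_data = true
    thresholds     = { critical = 3 }
  }

  # Shallow merge: the whole "thresholds" value comes from overrides,
  # so the default "warning" threshold is dropped, not inherited.
  merged = merge(local.defaults, local.overrides)
}

output "merged" {
  value = local.merged
}
```

Here the merged object keeps the default message, sets notify_no_data to true, and ends up with thresholds containing only critical = 3. Starting from empty thresholds, as in the default object above, sidesteps this, since each monitor supplies its full set.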
If we did separate it for each, it would look something like this:
module "datadog_monitors" {
  source  = "cloudposse/platform/datadog//modules/monitors"
  version = "0.24.1"
  # ...
  # Existing would be repurposed for {{#is_alert}}
  alert_tags           = var.alert_tags
  alert_tags_separator = var.alert_tags_separator
  # For specific notifications
  no_data_tags          = var.no_data_tags
  warning_tags          = var.warning_tags
  recovery_tags         = var.recovery_tags
  warning_recovery_tags = var.warning_recovery_tags
  alert_recovery_tags   = var.alert_recovery_tags
  alert_to_warning_tags = var.alert_to_warning_tags
  no_data_recovery_tags = var.no_data_recovery_tags
}
or perhaps:
module "datadog_monitors" {
  source  = "cloudposse/platform/datadog//modules/monitors"
  version = "0.24.1"
  # ...
  alert_tags = {
    is_alert            = lookup(var.alert_tags, "is_alert", null)
    is_no_data          = lookup(var.alert_tags, "is_no_data", null)
    is_warning          = lookup(var.alert_tags, "is_warning", null)
    is_recovery         = lookup(var.alert_tags, "is_recovery", null)
    is_warning_recovery = lookup(var.alert_tags, "is_warning_recovery", null)
    is_alert_recovery   = lookup(var.alert_tags, "is_alert_recovery", null)
    is_alert_to_warning = lookup(var.alert_tags, "is_alert_to_warning", null)
    is_no_data_recovery = lookup(var.alert_tags, "is_no_data_recovery", null)
  }
}
For now, it may be easier to just overwrite the message argument like you did above.
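If the map-based form were pursued, one way a module could turn such a map into conditional blocks is sketched below. This is only an illustration under assumptions: the variable type (a map of handle lists keyed by conditional variable name), the message_description variable, and the local names are hypothetical and do not reflect the module's current interface.

```hcl
# Hypothetical input: conditional variable name => notification handles.
variable "alert_tags" {
  type = map(list(string))
  default = {
    is_warning  = ["@slack"]
    is_alert    = ["@pagerduty"]
    is_recovery = ["@pagerduty"]
  }
}

# Hypothetical free-text description placed before the conditional blocks.
variable "message_description" {
  type    = string
  default = "An alert"
}

locals {
  # One "{{#cond}} handles {{/cond}}" line per non-empty entry.
  notification_blocks = [
    for condition, handles in var.alert_tags :
    format("{{#%s}} %s {{/%s}}", condition, join(" ", handles), condition)
    if length(handles) > 0
  ]

  message = join("\n", concat([var.message_description], local.notification_blocks))
}

output "message" {
  value = local.message
}
```

Terraform iterates maps in lexical key order, so the blocks appear sorted by condition name rather than in the order they were written. That should not matter for Datadog, since each block is evaluated independently.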
I believe that the message override, with alert_tags left empty, is probably the best path forward. While that is maybe less pretty, it follows DD's pattern and is simple enough to add, either the way you've demo'd or in YAML with something like:
...
message: |
  {{#is_warning}} @slack {{/is_warning}}
  {{#is_alert}} @pagerduty {{/is_alert}}
  {{#is_recovery}} @pagerduty {{/is_recovery}}
Always happy to discuss possible alternatives though!
I am closing this as wontfix since the OP seems to have decided the existing implementation is sufficient.