terraform-aws-cloudtrail-cloudwatch-alarms
Adding Anomaly Detection Functionality
Describe the Feature
CloudWatch supports anomaly detection for alarms, but this module's aws_cloudwatch_metric_alarm resource does not yet expose it. The requested feature is an option to add anomaly detection to a metric alarm.
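At the provider level, an anomaly-detection alarm is written as two metric_query blocks (the metric itself and an ANOMALY_DETECTION_BAND expression over it) plus threshold_metric_id, which names the band expression in place of a static threshold. The following is a minimal standalone sketch; the resource name, alarm name and instance ID are placeholders, not values from this module.

```hcl
# Standalone anomaly-detection alarm using only the AWS provider.
# The resource name, alarm name and instance ID are placeholders.
resource "aws_cloudwatch_metric_alarm" "cpu_anomaly" {
  alarm_name          = "cpu-anomaly"
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 2
  alarm_description   = "CPU utilization above the expected band"

  # Compare against the band produced by the "e1" query below instead of a
  # static threshold; threshold and threshold_metric_id are mutually exclusive.
  threshold_metric_id = "e1"

  # Expected band around the anomaly detection model's prediction.
  metric_query {
    id          = "e1"
    expression  = "ANOMALY_DETECTION_BAND(m1)"
    label       = "CPUUtilization (Expected)"
    return_data = true
  }

  # The metric being evaluated against the band.
  metric_query {
    id          = "m1"
    return_data = true

    metric {
      metric_name = "CPUUtilization"
      namespace   = "AWS/EC2"
      period      = 300
      stat        = "Average"

      dimensions = {
        InstanceId = "i-xxxxxxxxxxxxx"
      }
    }
  }
}
```

Because the single-metric arguments (metric_name, namespace, period, statistic) conflict with metric_query blocks, a module that supports both alarm styles has to leave those arguments unset for anomaly alarms.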
Expected Behavior
The Clouddrove cloudwatch-alarms module already has this functionality; its usage is as follows:
Basic Example
module "alarm" {
  source                    = "clouddrove/cloudwatch-alarms/aws"
  version                   = "1.0.1"
  name                      = "alarm"
  environment               = "test"
  label_order               = ["name", "environment"]
  alarm_name                = "cpu-alarm"
  comparison_operator       = "LessThanThreshold"
  evaluation_periods        = 2
  metric_name               = "CPUUtilization"
  namespace                 = "AWS/EC2"
  period                    = "60"
  statistic                 = "Average"
  threshold                 = "40"
  alarm_description         = "This metric monitors ec2 cpu utilization"
  alarm_actions             = ["arn:aws:sns:eu-west-1:xxxxxxxxxxx:test"]
  actions_enabled           = true
  insufficient_data_actions = []
  ok_actions                = []
  dimensions = {
    InstanceId = "i-xxxxxxxxxxxxx"
  }
}
Anomaly Example
module "alarm" {
  source              = "clouddrove/cloudwatch-alarms/aws"
  version             = "1.0.1"
  name                = "alarm"
  environment         = "test"
  label_order         = ["name", "environment"]
  alarm_name          = "cpu-alarm"
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 2
  threshold_metric_id = "e1"
  query_expressions = [{
    id          = "e1"
    expression  = "ANOMALY_DETECTION_BAND(m1)"
    label       = "CPUUtilization (Expected)"
    return_data = "true"
  }]
  query_metrics = [{
    id          = "m1"
    return_data = "true"
    metric_name = "CPUUtilization"
    namespace   = "AWS/EC2"
    period      = "120"
    stat        = "Average"
    unit        = "Percent"
    dimensions = {
      InstanceId = module.ec2.instance_id[0]
    }
  }]
  alarm_description         = "This metric monitors ec2 cpu utilization"
  alarm_actions             = []
  actions_enabled           = true
  insufficient_data_actions = []
  ok_actions                = []
}
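ANOMALY_DETECTION_BAND also accepts an optional second argument, the band width in standard deviations (2 when omitted). A wider band means fewer datapoints fall outside it, which is one lever for the alert volume described under Use Case. A sketch of the query_expressions argument from the example above with a wider band; the value 3 is only an illustration, not a recommendation:

```hcl
# Wider band for a noisy metric: 3 standard deviations instead of the
# default 2, so fewer datapoints breach the upper threshold.
query_expressions = [{
  id          = "e1"
  expression  = "ANOMALY_DETECTION_BAND(m1, 3)"
  label       = "CPUUtilization (Expected)"
  return_data = "true"
}]
```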
Expression Example
module "alarm" {
  source              = "clouddrove/cloudwatch-alarms/aws"
  version             = "1.0.1"
  name                = "alarm"
  environment         = "test"
  label_order         = ["name", "environment"]
  expression_enabled  = true
  alarm_name          = "cpu-alarm"
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 2
  threshold           = 40
  query_expressions = [{
    id          = "e1"
    expression  = "ANOMALY_DETECTION_BAND(m1)"
    label       = "CPUUtilization (Expected)"
    return_data = "true"
  }]
  query_metrics = [{
    id          = "m1"
    return_data = "true"
    metric_name = "CPUUtilization"
    namespace   = "AWS/EC2"
    period      = "120"
    stat        = "Average"
    unit        = "Percent"
    dimensions = {
      InstanceId = module.ec2.instance_id[0]
    }
  }]
  alarm_description         = "This metric monitors ec2 cpu utilization"
  alarm_actions             = []
  actions_enabled           = true
  insufficient_data_actions = []
  ok_actions                = []
}
Use Case
We want to add anomaly detection to our CIS benchmark alarms because we are often flooded with email alerts about insignificant issues and end up not reading any of the hundreds of alert emails we receive.
Describe Ideal Solution
I am not super familiar with Terraform, but I would expect the solution in alarms.tf to look something like this:
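locals {
  # Hypothetical helper: an alarm is treated as an anomaly-detection alarm
  # when its metric definition sets threshold_metric_id.
  anomaly_alarm = { for k, v in var.metrics : k => try(v.threshold_metric_id, null) != null }
}

resource "aws_cloudwatch_metric_alarm" "default" {
  for_each = local.enabled ? var.metrics : {}

  alarm_name          = each.value.alarm_name
  comparison_operator = each.value.alarm_comparison_operator
  evaluation_periods  = each.value.alarm_evaluation_periods
  treat_missing_data  = each.value.alarm_treat_missing_data
  alarm_description   = each.value.alarm_description
  alarm_actions       = local.endpoints
  tags                = module.this.tags

  # Static-threshold alarms use the single-metric arguments. These conflict
  # with metric_query blocks, so they are set to null for anomaly alarms.
  metric_name = local.anomaly_alarm[each.key] ? null : each.value.metric_name
  namespace   = local.anomaly_alarm[each.key] ? null : each.value.metric_namespace
  period      = local.anomaly_alarm[each.key] ? null : each.value.alarm_period
  statistic   = local.anomaly_alarm[each.key] ? null : each.value.alarm_statistic
  threshold   = local.anomaly_alarm[each.key] ? null : each.value.alarm_threshold

  # Anomaly alarms compare the metric against the band returned by the
  # expression whose id is threshold_metric_id (e.g. "e1").
  threshold_metric_id = try(each.value.threshold_metric_id, null)

  # Expression queries, e.g. ANOMALY_DETECTION_BAND(m1), defined per alarm
  # so that each alarm gets its own band.
  dynamic "metric_query" {
    for_each = try(each.value.query_expressions, [])
    content {
      id          = metric_query.value.id
      expression  = metric_query.value.expression
      label       = metric_query.value.label
      return_data = metric_query.value.return_data
    }
  }

  # The underlying metric(s) referenced by the expressions, e.g. m1.
  dynamic "metric_query" {
    for_each = try(each.value.query_metrics, [])
    content {
      id          = metric_query.value.id
      return_data = metric_query.value.return_data
      metric {
        metric_name = metric_query.value.metric_name
        namespace   = metric_query.value.namespace
        period      = metric_query.value.period
        stat        = metric_query.value.stat
        unit        = metric_query.value.unit
        dimensions  = metric_query.value.dimensions
      }
    }
  }
}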
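If the query attributes are defined per metric, so that each alarm gets its own ANOMALY_DETECTION_BAND, var.metrics has to carry them. When a variable is declared with a strict object type, Terraform discards attributes that the type does not list, so the new attributes would need to be added to it. A hypothetical sketch using optional() attributes (Terraform 1.3 or later); the attribute names mirror the proposal above and are not part of the module today:

```hcl
# Hypothetical declaration: only the attributes the alarm resource reads are
# shown; any attributes the module already declares would stay as they are.
variable "metrics" {
  type = map(object({
    metric_name               = string
    metric_namespace          = string
    alarm_name                = string
    alarm_comparison_operator = string
    alarm_evaluation_periods  = number
    alarm_period              = number
    alarm_statistic           = string
    alarm_treat_missing_data  = string
    alarm_threshold           = number
    alarm_description         = string

    # New, optional attributes; left unset for static-threshold alarms.
    threshold_metric_id = optional(string)
    query_expressions = optional(list(object({
      id          = string
      expression  = string
      label       = string
      return_data = bool
    })), [])
    query_metrics = optional(list(object({
      id          = string
      return_data = bool
      metric_name = string
      namespace   = string
      period      = number
      stat        = string
      unit        = optional(string)
      dimensions  = optional(map(string), {})
    })), [])
  }))
}
```

With these defaults, existing static-threshold entries keep working unchanged, while an anomaly entry sets threshold_metric_id to the id of its ANOMALY_DETECTION_BAND expression and supplies the two query lists. In this sketch alarm_threshold stays required even though it is ignored for anomaly alarms; declaring it as optional(number) would be a further design choice.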
Alternatives Considered
We have considered adjusting each metric's static threshold value, but we have multiple accounts that require different thresholds, and these thresholds are expected to change over time. Anomaly detection would let us catch anomalies dynamically in each environment without maintaining those thresholds by hand.
Additional Context
We wanted to use CloudWatch alarm anomaly detection for CIS benchmark alerts across several accounts and noticed that we cannot add it in our IaC, because this module does not expose the variables it needs. Clouddrove's module has the functionality we are after, but we prefer Cloud Posse's module because of its listed security standard compliance and its good documentation. We hope this can be added in the future, because anomaly detection is a useful feature for CloudWatch monitoring!