AwsWatchman
Configure EvaluationPeriods on queue alarms
There can be transient spikes in the "AgeOfOldestMessage" metric, for example when an upstream dependency has a momentary fault. For many queues, this should not cause an alarm.
If users can set "EvaluationPeriods" to 2, then:
- this problem goes away, but
- it will take longer to get a genuine alarm (2 periods is 10 minutes)
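
The trade-off above can be illustrated with a rough sketch. This is a simplified model, not AwsWatchman code: it assumes a 5-minute period (implied by "2 periods is 10 minutes") and that the alarm fires only when every datapoint in the last "EvaluationPeriods" periods breaches the threshold. The function names, sample values and the threshold of 600 are hypothetical.

```python
from typing import List, Sequence

# Assumption: 5-minute periods, inferred from "2 periods is 10 minutes".
PERIOD_MINUTES = 5


def alarm_states(datapoints: Sequence[float], threshold: float,
                 evaluation_periods: int) -> List[bool]:
    """For each period, report whether the alarm would be firing.

    Simplified model: the alarm fires when the most recent
    `evaluation_periods` datapoints are all above the threshold.
    """
    states = []
    for i in range(len(datapoints)):
        window = datapoints[max(0, i - evaluation_periods + 1): i + 1]
        firing = (len(window) == evaluation_periods
                  and all(value > threshold for value in window))
        states.append(firing)
    return states


def minutes_to_alarm(evaluation_periods: int,
                     period_minutes: int = PERIOD_MINUTES) -> int:
    """Minimum time a genuine breach must last before the alarm fires."""
    return evaluation_periods * period_minutes


# Hypothetical AgeOfOldestMessage samples (seconds), one per period.
transient_spike = [10, 900, 12, 11]      # one bad period, then recovery
sustained_fault = [10, 900, 950, 980]    # the queue really is stuck

print(alarm_states(transient_spike, threshold=600, evaluation_periods=1))
print(alarm_states(transient_spike, threshold=600, evaluation_periods=2))
print(alarm_states(sustained_fault, threshold=600, evaluation_periods=2))
print(minutes_to_alarm(2))
```

With one evaluation period the transient spike fires the alarm; with two it does not, while the sustained fault still alarms one period later than it otherwise would.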
Should "EvaluationPeriods" be increased for all queues at once, or for one queue at a time? And for both queue alarm types together, or for each individually?
We might soon be able to use percentile statistics, e.g. the p95.
You can already do this for the newer alarm types, so I suggest we leave this as it is and move SQS over to them as well.