Unidentifiable alarm names when using Manage Cluster of Sequential ASGs

Open t0mpson opened this issue 11 years ago • 3 comments

When using Manage Cluster of Sequential ASGs, Asgard creates a nicely named autoscaling group with a sequential identifier, and does the same for the scaling policies.

However, the alarm names are not maintained, and new alarms are called "alarm-$id", for example "alarm-160".

This leads to problems identifying which alarm relates to which autoscaling group when looking at cloudwatch alarms section. (Not to mention SNS notifications).

For example, let's say we have an autoscaling group:

  • name: example-autoscaling-group-000
  • scale up policy: example-scale-up-000
  • scale down policy: example-scale-down-000
  • scale up alarm name: example-scale-up-alarm-000
  • scale down alarm name: example-scale-down-alarm-000

After using Manage Cluster of Sequential ASGs, here's the resulting next sequential ASG and related policy and alarm names:

  • name: example-autoscaling-group-001
  • scale up policy: example-scale-up-001
  • scale down policy: example-scale-down-001
  • scale up alarm name: alarm-201
  • scale down alarm name: alarm-202

The scale up/down alarm names should also follow the same sequential pattern:

  • scale up alarm name: example-scale-up-alarm-001
  • scale down alarm name: example-scale-down-alarm-001

t0mpson, Nov 10 '13 17:11

The last assertion about what the alarm names should be is incorrect. The system is specifically designed not to do that. The names are deliberate. If your alarm has a dimension of type 'AutoScalingGroupName', then it will be used in the name instead of 'alarm'. All alarms and scaling policies are appended with a sequence number to ensure uniqueness. Keep in mind that you can have more than just two policies for an alarm (scale up and down), and you can have multiple alarms per scaling policy. Are your particular alarms missing an 'AutoScalingGroupName' dimension? What metric are they using? (Perhaps it is one that does not use AutoScalingGroupName dimensions.) You should see something like 'example-scale-up-alarm-001-201' if an AutoScalingGroupName dimension is available.
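The naming rule described above can be sketched roughly as follows. This is only an illustration of the behavior as stated in this comment, not Asgard's actual code: the function name `alarm_name` and the assumption that the dimension's value simply replaces the 'alarm' prefix are hypothetical.

```python
def alarm_name(dimensions, sequence_id):
    """Sketch of the described rule (hypothetical, not Asgard's implementation).

    If the alarm has an AutoScalingGroupName dimension, its value is used as
    the prefix; otherwise the literal 'alarm' is used. A sequence number is
    always appended to keep names unique.
    """
    prefix = dimensions.get("AutoScalingGroupName", "alarm")
    return f"{prefix}-{sequence_id}"


# An alarm on an ELB metric has no AutoScalingGroupName dimension.
print(alarm_name({"LoadBalancerName": "example-elb"}, 201))

# An alarm on an EC2 metric scoped to the ASG carries the dimension.
print(alarm_name({"AutoScalingGroupName": "example-autoscaling-group-001"}, 201))
```

Under this assumption, an alarm that only has a LoadBalancerName dimension falls back to the generic 'alarm' prefix, which matches the alarm-NNN names reported in this issue.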

Rather than making assumptions about expected alarm names, how about we talk about the user experience complaints? "This leads to problems identifying which alarm relates to which autoscaling group when looking at cloudwatch alarms section. (Not to mention SNS notifications)."

On the list of alarms, the values of the dimensions are displayed in the table. Most alarms will have an AutoScalingGroupName there. I'm wondering if your alarms were created in a way that they do not have this dimension. I can't make a similar guess as to what the SNS notification issue is.

Alarms are very far removed from an ASG. A related SNS topic is even further. And the references work the opposite way from the one you want to walk. It is easy to walk this way: ASG -> Scaling Policy -> Alarm -> Topic

But conceptually difficult to go the other way. The Topic has no idea that it is used in an alarm, that is referenced by a scaling policy for a particular ASG. We could make a new screen that took all this info from our caches and displayed this graph in a different way. But we need to know more about the use case.
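To make the "other way" concrete, here is a minimal sketch of inverting the forward references so that a topic can be traced back to its ASGs. The three dictionaries stand in for cached data and are purely hypothetical (including the topic name `example-topic`); they are not Asgard's cache structures.

```python
from collections import defaultdict

# Hypothetical cached forward references: ASG -> policies -> alarms -> topics.
asg_policies = {
    "example-autoscaling-group-001": ["example-scale-up-001", "example-scale-down-001"],
}
policy_alarms = {
    "example-scale-up-001": ["alarm-201"],
    "example-scale-down-001": ["alarm-202"],
}
alarm_topics = {
    "alarm-201": ["example-topic"],
    "alarm-202": ["example-topic"],
}


def invert(forward):
    """Turn a source -> [targets] mapping into a target -> {sources} mapping."""
    reverse = defaultdict(set)
    for source, targets in forward.items():
        for target in targets:
            reverse[target].add(source)
    return reverse


def asgs_for_topic(topic):
    """Walk Topic -> Alarm -> Scaling Policy -> ASG using inverted mappings."""
    alarms_by_topic = invert(alarm_topics)
    policies_by_alarm = invert(policy_alarms)
    asgs_by_policy = invert(asg_policies)
    asgs = set()
    for alarm in alarms_by_topic.get(topic, ()):
        for policy in policies_by_alarm.get(alarm, ()):
            asgs |= asgs_by_policy.get(policy, set())
    return asgs


print(asgs_for_topic("example-topic"))
```

The reverse walk needs every forward mapping to be loaded first, which is why it only works from a full cache rather than from the topic alone.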

claymccoy, Nov 10 '13 19:11

You are right. I'll try to explain the user experience / requirements.

The alarm I use is on the load balancer connected to the ASG. It has the dimension "LoadBalancerName" and I scale up/down according to latency.

As ASGs & load balancers can be tightly integrated, I would imagine that a lot of people are using this dimension in order to ensure proper service: basically ensuring that latency is under a certain required threshold, and scaling up in case the latency is higher than required. Does that make sense?

I understand that walking back from Topic -> Alarm -> Scaling Policy -> ASG is conceptually difficult. But I think in this case only the other way around is needed (looking at the current ASG -> Scaling Policy -> Alarm -> Topic and sequentially incrementing each one).

t0mpson, Nov 11 '13 11:11

Has anyone ever found a workaround for this? In my case, I'm trying to scale ASGs based on ELB latency (exact same scenario as @t0mpson) and SQS size.

If you don't add the AutoScalingGroupName dimension, the alarm will be propagated correctly to the next ASG but will not be named correctly, hence the alarm-NNN format. If you do add the AutoScalingGroupName dimension to the alarm, it will be named and propagated correctly but will not return any data. As far as I can tell, AutoScalingGroupName only applies to AWS/EC2. I tried this for an SQS alarm as well as an ELB alarm. I was also unable to find a way to do this outside of using the CLI tools.

In any event, I found the Create New Scaling Policy UI very confusing as you can specify all the built-in CloudWatch metrics as the alarm but you are unable to specify dimensions. For example, for the AWS/SQS - ApproximateNumberOfMessagesVisible metric you need to specify the QueueName dimension for it to be meaningful.
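For reference, a rough sketch of what a meaningful SQS alarm definition looks like when built outside the UI. The keys are the standard CloudWatch PutMetricAlarm request fields; the helper name `sqs_alarm_params`, the queue name, the threshold, the period, and the placeholder policy ARN are all hypothetical values for illustration. The sketch only builds the request and does not call AWS.

```python
def sqs_alarm_params(alarm_name, queue_name, threshold, policy_arn):
    """Build a PutMetricAlarm request for an SQS queue-depth alarm.

    All argument values are supplied by the caller; nothing here is taken
    from Asgard. The QueueName dimension is what makes the metric meaningful.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 300,  # seconds; hypothetical choice
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],
    }


params = sqs_alarm_params(
    alarm_name="example-scale-up-alarm-001",
    queue_name="example-queue",              # hypothetical queue
    threshold=100,                           # hypothetical threshold
    policy_arn="<scaling-policy-arn>",       # placeholder, not a real ARN
)
print(params["AlarmName"], params["Dimensions"])
```

A dictionary like this could be passed to a CloudWatch client (for example boto3's `put_metric_alarm(**params)`, assuming that tooling is in use). Note that this only controls the name of an alarm created by hand; it does not change how names are generated when the next sequential ASG is created.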

For now, I'll just live with having a bunch of alarms named alarm-NNN.

Edit: Just realized that if I click into the alarm created on the Create New Scaling Policy page, I can indeed enter the QueueName dimension for SQS. Derp.

richid, Oct 21 '14 20:10