cloudwatch_exporter

ALB target group tag selections failing due to resource id

Open muvster opened this issue 4 years ago • 9 comments

Hi. I fail to get any matching metrics with this configuration (running 0.8.0):

    metrics:
    - aws_namespace: AWS/ApplicationELB
      aws_metric_name: HealthyHostCount
      aws_tag_select:
        tag_selections:
          "ingress.k8s.aws/cluster": [myCluster]
          "kubernetes.io/namespace": [myNamespace]
        resource_type_selection: "elasticloadbalancing:targetgroup"
        resource_id_dimension: TargetGroup
      aws_dimensions: [TargetGroup,LoadBalancer]
      aws_statistics: [Minimum]

It works if I instead filter by LoadBalancer:

        resource_type_selection: "elasticloadbalancing:loadbalancer"
        resource_id_dimension: LoadBalancer

From reading the source, it looks to me like the reason for this is that the resource IDs that are extracted from the ARNs don't match the dimension values returned by listMetrics() in the target group case.

The ResourceARNs returned by aws resourcegroupstaggingapi get-resources ... look like this for load balancers and target groups, respectively (lightly obfuscated):

"ResourceARN": "arn:aws:elasticloadbalancing:eu-north-1:1234567890:loadbalancer/app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy"
"ResourceARN": "arn:aws:elasticloadbalancing:eu-north-1:1234567890:targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy"

As far as I can tell, this would result in the following extracted resource IDs:

app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy
ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy

However, aws cloudwatch list-metrics ... returns these dimensions:

        {
            "Namespace": "AWS/ApplicationELB",
            "MetricName": "HealthyHostCount",
            "Dimensions": [
                {
                    "Name": "TargetGroup",
                    "Value": "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy"
                },
                {
                    "Name": "LoadBalancer",
                    "Value": "app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy"
                }
            ]
        }

So the value for the LoadBalancer dimension matches the extracted ID, but the one for TargetGroup doesn't (because of the targetgroup/ prefix).
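The mismatch can be reproduced with a minimal Python sketch. It assumes the extraction simply drops everything up to and including the first slash of the ARN's resource part, which is what the IDs listed above imply; `extract_resource_id` is a hypothetical stand-in, not the exporter's actual Java code.

```python
def extract_resource_id(arn: str) -> str:
    """Return the ARN's resource field with its leading resource type removed."""
    resource = arn.split(":")[-1]              # e.g. "targetgroup/<name>/<id>"
    if "/" in resource:
        resource = resource.split("/", 1)[1]   # drop the text before the first slash
    return resource


# ARNs as returned by the tagging API (obfuscated values from this issue)
LB_ARN = ("arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
          "loadbalancer/app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy")
TG_ARN = ("arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
          "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy")

# Dimension values as returned by CloudWatch
LB_DIMENSION = "app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy"
TG_DIMENSION = "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy"

print(extract_resource_id(LB_ARN) == LB_DIMENSION)  # True
print(extract_resource_id(TG_ARN) == TG_DIMENSION)  # False
```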

I can't say that I understand why the dimension values look like they do. Perhaps there's some CloudWatch bug there. But on the other hand it doesn't seem clear to me that the current resource ID extraction will work for all ARN flavours given the IMO less than super-clear documentation at https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html.

Could it be an option to just pass around the full ARNs in CloudWatchCollector.java and consider a metric a match as long as there's an ARN that ends with the dimension value?
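A rough Python sketch of that suffix check, for illustration only (the exporter itself is Java). The function name `dimension_matches_arn` is hypothetical, and the requirement that the match start right after a `:` or `/` is an extra assumption added here to avoid matching in the middle of a path segment.

```python
from typing import Iterable


def dimension_matches_arn(dimension_value: str, arns: Iterable[str]) -> bool:
    """True if any tagged ARN ends with the dimension value at a ':' or '/' boundary."""
    suffixes = (":" + dimension_value, "/" + dimension_value)
    return any(arn.endswith(suffixes) for arn in arns)


# ARNs selected by tag (obfuscated values from this issue)
TAGGED_ARNS = [
    "arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
    "loadbalancer/app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy",
    "arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
    "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy",
]

# Both dimension values from the list-metrics output are suffixes of a tagged ARN
print(dimension_matches_arn(
    "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy", TAGGED_ARNS))
print(dimension_matches_arn(
    "app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy", TAGGED_ARNS))
```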

muvster avatar Apr 24 '20 06:04 muvster

@louisfelix do you know why these are differing? Offhand I'd presume a bug on the AWS side.

brian-brazil avatar Apr 24 '20 10:04 brian-brazil

@brian-brazil no I don't know. I'd also presume an AWS bug here, this is unexpected to me.

louisfelix avatar Apr 25 '20 00:04 louisfelix

Thanks for your attention. I didn't see this before posting, but it looks like this is actually the expected naming. From https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html:

[Image: alb_monitoring]

muvster avatar Apr 25 '20 08:04 muvster

So an AWS design flaw then.

brian-brazil avatar Apr 25 '20 08:04 brian-brazil

Does seem a bit arbitrary, yes.

In this particular case it would work to (1) assume that the dimension value will be a suffix of the ARN instead of extracting the resource ID, as mentioned above. But for full generality I suppose something like (2) a user-provided regex-based mapping from ARN to dimension value would be needed. Alternatively, (3) special cases in the code akin to the one for "AWS/DynamoDB".
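Option (2) could look roughly like the following Python sketch. The mapping `ARN_TO_DIMENSION`, its keys, and the patterns are all hypothetical; no such setting exists in the exporter configuration shown in this issue. The first capture group of each pattern is taken as the dimension value.

```python
import re
from typing import Optional

# Hypothetical user-provided mapping: resource type -> regex whose first
# capture group yields the CloudWatch dimension value for a given ARN.
ARN_TO_DIMENSION = {
    "elasticloadbalancing:targetgroup": re.compile(r":(targetgroup/.+)$"),
    "elasticloadbalancing:loadbalancer": re.compile(r":loadbalancer/(.+)$"),
}


def dimension_value(resource_type: str, arn: str) -> Optional[str]:
    """Apply the configured regex to the ARN; None if it does not match."""
    match = ARN_TO_DIMENSION[resource_type].search(arn)
    return match.group(1) if match else None


tg_arn = ("arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
          "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy")
lb_arn = ("arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
          "loadbalancer/app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy")

print(dimension_value("elasticloadbalancing:targetgroup", tg_arn))
print(dimension_value("elasticloadbalancing:loadbalancer", lb_arn))
```

With these patterns the target group keeps its `targetgroup/` prefix while the load balancer drops `loadbalancer/`, matching the two dimension values in the list-metrics output above.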

muvster avatar Apr 25 '20 11:04 muvster

So has anyone opened a ticket with AWS about it? Is there a planned workaround in cloudwatch-exporter?

eranreshef avatar May 12 '20 08:05 eranreshef

MSK seems to have a similar issue with a mismatching ARN ID. As seen in https://docs.aws.amazon.com/msk/latest/developerguide/msk-create-cluster.html, a cluster is given an ARN like "arn:aws:kafka:us-east-1:123456789012:cluster/CustomConfigExampleCluster/abcd1234-abcd-dcba-4321-a1b2abcd9f9f-2". The random characters appended to the ARN mean it does not match the "Cluster Name" resource dimension (which would be CustomConfigExampleCluster in this example).

It would be good to have an extended syntax for tag selections, so people can work around such problems on their own. @muvster's 2nd suggestion would seem to solve the original issue, my example and #273.
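To illustrate with the MSK ARN above, here is a small Python sketch. The pattern `MSK_CLUSTER_NAME` is a hypothetical example of what a user might supply under such a syntax; it is not an existing exporter option.

```python
import re

MSK_ARN = ("arn:aws:kafka:us-east-1:123456789012:"
           "cluster/CustomConfigExampleCluster/abcd1234-abcd-dcba-4321-a1b2abcd9f9f-2")
CLUSTER_NAME_DIMENSION = "CustomConfigExampleCluster"

# A plain suffix comparison cannot work: the name sits in the middle of the ARN.
print(MSK_ARN.endswith(CLUSTER_NAME_DIMENSION))  # False

# Hypothetical user-provided pattern: capture the segment between
# "cluster/" and the next slash.
MSK_CLUSTER_NAME = re.compile(r":cluster/([^/]+)/")

match = MSK_CLUSTER_NAME.search(MSK_ARN)
extracted = match.group(1) if match else None
print(extracted == CLUSTER_NAME_DIMENSION)  # True
```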

tokheim avatar Aug 10 '20 23:08 tokheim

So sometimes the extraneous data is a suffix, sometimes it's a prefix?

brian-brazil avatar Aug 11 '20 10:08 brian-brazil

I also encountered the MSK issue last summer and experienced a similar issue today with billing, which caused the inconsistent label names message. In the latter case the exporter was yace.

It would be nice to drop/rewrite before the metric is converted to a Prometheus metric.

josephreynolds avatar Aug 26 '20 10:08 josephreynolds