cloudwatch_exporter
ALB target group tag selections failing due to resource id
Hi. I fail to get any matching metrics with this configuration (running 0.8.0):
metrics:
- aws_namespace: AWS/ApplicationELB
  aws_metric_name: HealthyHostCount
  aws_tag_select:
    tag_selections:
      "ingress.k8s.aws/cluster": [myCluster]
      "kubernetes.io/namespace": [myNamespace]
    resource_type_selection: "elasticloadbalancing:targetgroup"
    resource_id_dimension: TargetGroup
  aws_dimensions: [TargetGroup,LoadBalancer]
  aws_statistics: [Minimum]
It works if I instead filter by LoadBalancer:
resource_type_selection: "elasticloadbalancing:loadbalancer"
resource_id_dimension: LoadBalancer
From reading the source, it looks to me like the reason for this is that the resource IDs extracted from the ARNs don't match the dimension values returned by listMetrics() in the target group case.
The ResourceARNs returned by aws resourcegroupstaggingapi get-resources ... look like this for load balancers and target groups, respectively (lightly obfuscated):
"ResourceARN": "arn:aws:elasticloadbalancing:eu-north-1:1234567890:loadbalancer/app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy"
"ResourceARN": "arn:aws:elasticloadbalancing:eu-north-1:1234567890:targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy"
As far as I can tell, this would result in the following extracted resource IDs:
app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy
ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy
However, aws list-metrics ... returns these dimensions:
{
  "Namespace": "AWS/ApplicationELB",
  "MetricName": "HealthyHostCount",
  "Dimensions": [
    {
      "Name": "TargetGroup",
      "Value": "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy"
    },
    {
      "Name": "LoadBalancer",
      "Value": "app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy"
    }
  ]
}
So the value for the LoadBalancer dimension matches the extracted ID, but the one for TargetGroup doesn't (because of the targetgroup/ prefix).
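To make the mismatch concrete, here is a minimal Python sketch of the comparison. It is not the exporter's Java code: the `extract_resource_id` helper is hypothetical and simply assumes the resource ID is whatever follows the first "/" in the last colon-separated ARN field, which is consistent with the extracted IDs listed above.

```python
def extract_resource_id(arn: str) -> str:
    """Return the part of the ARN's resource field after the first '/'.

    Assumption: this mirrors the extraction described in this thread,
    not necessarily the exporter's exact logic.
    """
    resource = arn.split(":")[-1]  # e.g. "targetgroup/<name>/<id>"
    if "/" in resource:
        resource = resource.split("/", 1)[1]  # drop the resource-type prefix
    return resource


lb_arn = ("arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
          "loadbalancer/app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy")
tg_arn = ("arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
          "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy")

# Dimension values as reported by list-metrics in the example above.
lb_dimension = "app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy"
tg_dimension = "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy"

print(extract_resource_id(lb_arn) == lb_dimension)  # True
print(extract_resource_id(tg_arn) == tg_dimension)  # False
```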
I can't say that I understand why the dimension values look like they do. Perhaps there's some CloudWatch bug there. But on the other hand it doesn't seem clear to me that the current resource ID extraction will work for all ARN flavours given the IMO less than super-clear documentation at https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html.
Could it be an option to just pass around the full ARNs in CloudWatchCollector.java and consider a metric a match as long as there's an ARN that ends with the dimension value?
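A rough Python sketch of that suffix-matching idea, for illustration only (the exporter itself is Java, and `matches_any_arn` is a hypothetical name). It additionally assumes the dimension value must start right after a ":" or "/" in the ARN, so that a value cannot match the tail of an unrelated, longer name.

```python
from typing import Iterable


def matches_any_arn(dimension_value: str, arns: Iterable[str]) -> bool:
    """True if some ARN ends with the dimension value at a ':' or '/' boundary."""
    return any(
        arn.endswith(":" + dimension_value) or arn.endswith("/" + dimension_value)
        for arn in arns
    )


# ARNs as returned by the tagging API in the example above.
tagged_arns = [
    "arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
    "loadbalancer/app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy",
    "arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
    "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy",
]

# Both dimension values from list-metrics now match.
print(matches_any_arn("app/ce9e96ed-somelbname-6b08/3a16156ee3244yyy", tagged_arns))
print(matches_any_arn("targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy", tagged_arns))
```

This only covers cases where the dimension value is a true suffix of the ARN.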
@louisfelix do you know why these are differing? Offhand I'd presume a bug on the AWS side.
@brian-brazil no I don't know. I'd also presume an AWS bug here, this is unexpected to me.
Thanks for your attention. I didn't see this before posting, but it looks like this is actually the expected naming. From https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html:

So an AWS design flaw then.
Does seem a bit arbitrary, yes.
In this particular case it would work to (1) assume that the dimension value will be a suffix of the ARN instead of extracting the resource ID, as mentioned above. But for full generality I suppose something like (2) a user-provided regex-based mapping from ARN to dimension value would be needed. Alternatively, (3) special cases in the code akin to the one for "AWS/DynamoDB".
So has anyone opened a ticket to AWS about it? Is there a planned workaround in cloudwatch-exporter?
MSK seems to have a similar issue with a mismatching ARN ID. As seen in https://docs.aws.amazon.com/msk/latest/developerguide/msk-create-cluster.html, a cluster is given an ARN like "arn:aws:kafka:us-east-1:123456789012:cluster/CustomConfigExampleCluster/abcd1234-abcd-dcba-4321-a1b2abcd9f9f-2". The random characters appended to the ARN mean it does not match the "Cluster Name" resource dimension (which would be CustomConfigExampleCluster in this example).
It would be good to have an extended syntax for tag selections, so people can work around such problems on their own. @muvster's 2nd suggestion would seem to solve the original issue, my example and #273.
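As a sketch of what such a regex-based mapping might look like, here is an illustrative Python example. Nothing in it is an existing exporter option: the `ARN_TO_DIMENSION` table, its keys, and `dimension_value` are hypothetical. Each rule captures the part of the ARN that is expected to equal the CloudWatch dimension value.

```python
import re
from typing import Optional

# Hypothetical user-provided rules: one capture group yields the dimension value.
ARN_TO_DIMENSION = {
    # Keep the "targetgroup/" prefix, as the TargetGroup dimension does.
    "elasticloadbalancing:targetgroup": re.compile(
        r"^arn:[^:]*:elasticloadbalancing:[^:]*:[^:]*:(targetgroup/.+)$"
    ),
    # Keep only the cluster name, dropping the trailing generated ID.
    "kafka:cluster": re.compile(
        r"^arn:[^:]*:kafka:[^:]*:[^:]*:cluster/([^/]+)/[^/]+$"
    ),
}


def dimension_value(resource_type: str, arn: str) -> Optional[str]:
    """Map an ARN to the expected dimension value, or None if no rule matches."""
    match = ARN_TO_DIMENSION[resource_type].match(arn)
    return match.group(1) if match else None


tg_arn = ("arn:aws:elasticloadbalancing:eu-north-1:1234567890:"
          "targetgroup/ce9e96ed-cbe56901daf6f7a1xxx/37def13b297a7yyy")
msk_arn = ("arn:aws:kafka:us-east-1:123456789012:"
           "cluster/CustomConfigExampleCluster/abcd1234-abcd-dcba-4321-a1b2abcd9f9f-2")

print(dimension_value("elasticloadbalancing:targetgroup", tg_arn))
print(dimension_value("kafka:cluster", msk_arn))
```

With rules like these, the target group ARN maps to the prefixed value and the MSK ARN maps to the bare cluster name, so both the prefix case and the suffix case are handled by configuration rather than by special cases in code.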
So sometimes the extraneous data is a suffix, sometimes it's a prefix?
I also encountered the MSK issue last summer, and today I ran into a similar issue with billing metrics, which caused the "inconsistent label names" message. In the latter case the exporter was yace.
It would be nice to be able to drop/rewrite before the metric is converted to a Prometheus metric.