cloudwatch_exporter
Dimensions problem with AWS/ApplicationELB:HTTPCode_Target_5XX_Count
Version: prom/cloudwatch-exporter:v0.14.3
Config:
region: eu-west-1
metrics:
- aws_namespace: AWS/ApplicationELB
  aws_metric_name: HTTPCode_Target_4XX_Count
  aws_dimensions: [AvailabilityZone, LoadBalancer]
  aws_statistics: [Sum]
- aws_namespace: AWS/ApplicationELB
  aws_metric_name: HTTPCode_Target_5XX_Count
  aws_dimensions: [AvailabilityZone, LoadBalancer]
  aws_statistics: [Sum]
Problem: every combination of the dimensions AvailabilityZone, LoadBalancer, and TargetGroup (each one alone, every pair, and all three together) gives the following log output, and the metric never appears in Prometheus.
Jul 06, 2022 1:26:57 PM org.eclipse.jetty.server.Server doStart
INFO: jetty-11.0.9; built: 2022-03-30T17:44:47.085Z; git: 243a48a658a183130a8c8de353178d154ca04f04; jvm 17.0.3+7
Jul 06, 2022 1:26:57 PM org.eclipse.jetty.server.handler.ContextHandler doStart
INFO: Started o.e.j.s.ServletContextHandler@14c01636{/,null,AVAILABLE}
Jul 06, 2022 1:26:57 PM org.eclipse.jetty.server.AbstractConnector doStart
INFO: Started ServerConnector@3a3e78f{HTTP/1.1, (http/1.1)}{0.0.0.0:9106}
Jul 06, 2022 1:26:57 PM org.eclipse.jetty.server.Server doStart
INFO: Started Server@7b5a12ae{STARTING}[11.0.9,sto=0] @630ms
Jul 06, 2022 1:27:06 PM io.prometheus.cloudwatch.CloudWatchCollector listDimensions
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_4XX_Count due to dimensions mismatch
Jul 06, 2022 1:27:06 PM io.prometheus.cloudwatch.CloudWatchCollector listDimensions
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_5XX_Count due to dimensions mismatch
Jul 06, 2022 1:27:25 PM io.prometheus.cloudwatch.CloudWatchCollector listDimensions
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_4XX_Count due to dimensions mismatch
Jul 06, 2022 1:27:25 PM io.prometheus.cloudwatch.CloudWatchCollector listDimensions
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_5XX_Count due to dimensions mismatch
Any pointers appreciated
Hi @StochasticPirate ,
Please see this comment as a pointer - https://github.com/prometheus/cloudwatch_exporter/issues/432#issuecomment-1149572350
In general, it seems like you want to run the following:
aws cloudwatch list-metrics --namespace AWS/ApplicationELB --metric-name HTTPCode_Target_4XX_Count
# and after...
# aws cloudwatch list-metrics --namespace AWS/ApplicationELB --metric-name HTTPCode_Target_5XX_Count
and see which dimension combinations interest you.
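For reference, the output looks roughly like this (the load balancer name and ID below are made up, just to show the shape of the dimension sets):

{
    "Metrics": [
        {
            "Namespace": "AWS/ApplicationELB",
            "MetricName": "HTTPCode_Target_4XX_Count",
            "Dimensions": [
                { "Name": "LoadBalancer", "Value": "app/my-alb/0123456789abcdef" }
            ]
        },
        {
            "Namespace": "AWS/ApplicationELB",
            "MetricName": "HTTPCode_Target_4XX_Count",
            "Dimensions": [
                { "Name": "AvailabilityZone", "Value": "eu-west-1a" },
                { "Name": "LoadBalancer", "Value": "app/my-alb/0123456789abcdef" }
            ]
        }
    ]
}

Each entry's Dimensions list is one combination; your aws_dimensions needs to match one of these sets.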
Yes, thanks, that returns 3 combinations: [LB, AZ], [TG, LB, AZ], [TG, LB].
All of these combinations return the "ignoring metric due to dimensions mismatch" warning.
Interesting... Sharing the relevant code here:
do {
  requestBuilder.nextToken(nextToken);
  ListMetricsResponse response = cloudWatchClient.listMetrics(requestBuilder.build());
  cloudwatchRequests.labels("listMetrics", rule.awsNamespace).inc();
  for (Metric metric : response.metrics()) {
    if (metric.dimensions().size() != dimensionFilters.size()) {
      // AWS returns all the metrics with dimensions beyond the ones we ask for,
      // so filter them out.
      continue;
    }
    if (useMetric(rule, tagBasedResourceIds, metric)) {
      dimensions.add(metric.dimensions());
    }
  }
  nextToken = response.nextToken();
} while (nextToken != null);

if (dimensions.isEmpty()) {
  LOGGER.warning(
      String.format(
          "(listDimensions) ignoring metric %s:%s due to dimensions mismatch",
          rule.awsNamespace, rule.awsMetricName));
}
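For context, dimensionFilters in the snippet above is built from the rule's configured dimensions; this is a rough sketch of the idea rather than the exporter's exact source, so treat the details as an approximation:

import java.util.ArrayList;
import java.util.List;
import software.amazon.awssdk.services.cloudwatch.model.DimensionFilter;

// Sketch: one DimensionFilter per name in the rule's aws_dimensions.
// The size check above then means "keep only metrics whose dimension set
// has exactly as many entries as aws_dimensions".
List<String> awsDimensions = List.of("AvailabilityZone", "LoadBalancer"); // from the YAML rule
List<DimensionFilter> dimensionFilters = new ArrayList<>();
for (String dimensionName : awsDimensions) {
  dimensionFilters.add(DimensionFilter.builder().name(dimensionName).build());
}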
Several things could cause this problem:
- response.metrics() returns an empty list of metrics (maybe you got the region wrong? do you see any other metrics using the tool? maybe something is off with credentials?)
- metric.dimensions().size() != dimensionFilters.size() - maybe you have some hidden characters in your YAML that make the list sizes differ? (see the quick check below)
- useMetric(rule, tagBasedResourceIds, metric) always returns false, for all returned metrics. Since you didn't specify awsDimensionSelect / awsTagSelect / awsDimensionSelectRegex, I think this option is less likely.
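A quick way to rule out hidden characters (this assumes GNU cat's show-all flag; on BSD/macOS use cat -vet, and adjust the file name to wherever your config lives):

# Non-printing characters (tabs, CRs, non-breaking spaces) show up as ^I, ^M, M-...
cat -A config.yml | grep -n aws_dimensions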
Tested it locally with this configuration:
region: us-east-2
use_get_metric_data: true
metrics:
- aws_namespace: AWS/ApplicationELB
aws_metric_name: HTTPCode_Target_4XX_Count
aws_dimensions: [AvailabilityZone, LoadBalancer]
aws_statistics: [Sum]
Seems to be working fine.
Thanks for the response. I tried your exact config file (changing only the region to eu-west-1) and still get the following log:
Jul 06, 2022 5:31:47 PM org.eclipse.jetty.server.Server doStart
INFO: jetty-11.0.9; built: 2022-03-30T17:44:47.085Z; git: 243a48a658a183130a8c8de353178d154ca04f04; jvm 17.0.3+7
Jul 06, 2022 5:31:47 PM org.eclipse.jetty.server.handler.ContextHandler doStart
INFO: Started o.e.j.s.ServletContextHandler@56a4479a{/,null,AVAILABLE}
Jul 06, 2022 5:31:47 PM org.eclipse.jetty.server.AbstractConnector doStart
INFO: Started ServerConnector@4dd02341{HTTP/1.1, (http/1.1)}{0.0.0.0:9106}
Jul 06, 2022 5:31:47 PM org.eclipse.jetty.server.Server doStart
INFO: Started Server@60975100{STARTING}[11.0.9,sto=0] @618ms
Jul 06, 2022 5:32:05 PM io.prometheus.cloudwatch.CloudWatchCollector listDimensions
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_4XX_Count due to dimensions mismatch
Jul 06, 2022 5:32:24 PM io.prometheus.cloudwatch.CloudWatchCollector listDimensions
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_4XX_Count due to dimensions mismatch
When you say "working fine" do you mean that you don't see the warning in your logs?
Thanks.
When you say "working fine" do you mean that you don't see the warning in your logs?
Yes. No warnings / errors.
Do you have a way to debug the code locally, or to double-check that you're running against the same account you used when you ran the aws cloudwatch CLI?
Definitely running against the same account.
I can try to create a new version with a little more debug information to understand where exactly the problem is. But it may take a few days... Alternatively, do you have a way to run the application in debug mode?
Thanks. I was going to have a look today to see whether there is a way to get more debugging output. I was also going to look at permissions, although the application is able to collect some other metrics OK. It seems to be the [2|3|4|5]XX metrics that it struggles with. If you could produce a version with some extra debugging that'd be great, thank you 👍
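For reference, the permissions I'm checking against (my understanding from the exporter's README, so treat this as a checklist rather than gospel):

cloudwatch:ListMetrics
cloudwatch:GetMetricStatistics
cloudwatch:GetMetricData
tag:GetResources   # only needed when aws_tag_select is used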
I am running into a similar kind of issue here, but for the AWS/Transfer service. What I observed is that until I push a file to the SFTP server, I keep getting these warnings for all the metrics under the AWS/Transfer namespace; as soon as I push something to the Transfer service and CloudWatch has some metric data to show, the warnings disappear.
- aws_namespace: AWS/Transfer
aws_metric_name: BytesIn
aws_dimensions: [ServerId]
aws_tag_select:
tag_selections:
"tl:prefix": ["dev"]
resource_type_selection: "transfer:server"
resource_id_dimension: ServerId
aws_statistics: [Sum]
use_get_metric_data: true
period_seconds: 360
- aws_namespace: AWS/Transfer
aws_metric_name: BytesOut
aws_dimensions: [ServerId]
aws_tag_select:
tag_selections:
"tl:prefix": ["dev"]
resource_type_selection: "transfer:server"
resource_id_dimension: ServerId
aws_statistics: [Sum]
use_get_metric_data: true
period_seconds: 540
- aws_namespace: AWS/Transfer
aws_metric_name: FilesIn
aws_dimensions: [ServerId]
aws_tag_select:
tag_selections:
"tl:prefix": ["dev"]
resource_type_selection: "transfer:server"
resource_id_dimension: ServerId
aws_statistics: [Sum]
use_get_metric_data: true
period_seconds: 420
- aws_namespace: AWS/Transfer
aws_metric_name: FilesOut
aws_dimensions: [ServerId]
aws_tag_select:
tag_selections:
"tl:prefix": ["dev"]
resource_type_selection: "transfer:server"
resource_id_dimension: ServerId
aws_statistics: [Sum]
use_get_metric_data: true
period_seconds: 480
- aws_namespace: AWS/Transfer
aws_metric_name: OnUploadExecutionsStarted
aws_dimensions: [ServerId]
aws_tag_select:
tag_selections:
"tl:prefix": ["dev"]
resource_type_selection: "transfer:server"
resource_id_dimension: ServerId
aws_statistics: [Sum]
use_get_metric_data: true
period_seconds: 600
- aws_namespace: AWS/Transfer
aws_metric_name: OnUploadExecutionsFailed
aws_dimensions: [ServerId]
aws_tag_select:
tag_selections:
"tl:prefix": ["dev"]
resource_type_selection: "transfer:server"
resource_id_dimension: ServerId
aws_statistics: [Sum]
use_get_metric_data: true
period_seconds: 660
- aws_namespace: AWS/Transfer
aws_metric_name: OnUploadExecutionsSuccess
aws_dimensions: [ServerId]
aws_tag_select:
tag_selections:
"tl:prefix": ["dev"]
resource_type_selection: "transfer:server"
resource_id_dimension: ServerId
aws_statistics: [Sum]
use_get_metric_data: true
period_seconds: 720
But even with this, if any of these metrics has no data to show, we still get the warning for that particular metric. For example, in my case, if there are no errors in the Transfer service then there is no data in the OnUploadExecutionsFailed metric, and because of this I keep getting these warnings:
2022-07-13T09:37:32+05:30 Jul 13, 2022 4:07:32 AM io.prometheus.cloudwatch.CloudWatchCollector listDimensions
2022-07-13T09:37:32+05:30 WARNING: (listDimensions) ignoring metric AWS/Transfer:OnUploadExecutionsFailed due to dimensions mismatch
@or-shachar Any suggestion how to resolve this?
Interesting... I guess that's exactly the case I was curious about when I added the warning feature.
In my experience missing dimensions == some kind of configuration error that needs to be reported to the logs.
But IIUC there are some cases where the metric is not reported continuously and then the metric would be missing by design.
We can maybe add a flag to suppress those warnings for certain metrics. WDYT? @matthiasr
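To make the idea concrete, it could be a per-metric switch along these lines (the key name here is made up, purely to illustrate the proposal; it does not exist yet):

metrics:
- aws_namespace: AWS/Transfer
  aws_metric_name: OnUploadExecutionsFailed
  aws_dimensions: [ServerId]
  aws_statistics: [Sum]
  warn_on_missing_dimensions: false   # hypothetical: suppress the listDimensions warning for this metric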
We are also hitting this "issue". Our YAML has over 3500 lines, and we have hundreds of AWS accounts, each with a unique combination of services. In each one we are getting different warnings (probably because different services are in use).
@or-shachar Also, is there a way to stop getting these warnings:
WARNING: CloudWatch scrape failed
software.amazon.awssdk.services.cloudwatch.model.CloudWatchException: Rate exceeded (Service: CloudWatch, Status Code: 400, Request ID: aaa68484-bf4e-4fca-8d82-93d53b9134e2)
I have set a different period_seconds for each of the metrics but am still getting this warning. Any suggestions?
I think we will have to release a new version with the warning feature turned off until we can figure this out...
I'll try to get a PR ready for this during the weekend
I think we will have to release a new version with the warning feature turned off until we can figure this out...
I'll try to get a PR ready for this during the weekend
@or-shachar will this resolve the below warnings as well:
WARNING: CloudWatch scrape failed
software.amazon.awssdk.services.cloudwatch.model.CloudWatchException: Rate exceeded (Service: CloudWatch, Status Code: 400, Request ID: aaa68484-bf4e-4fca-8d82-93d53b9134e2)
Or could you please suggest anything that might help resolve this?
@avizvaRumit This is something else, related to CloudWatch service quotas.
It's probably unrelated to this issue. I suggest looking at the official CloudWatch docs and starting a different thread for it. Please describe there how many metrics you are scraping, if you can, and whether you use the use_get_metric_data option 🙏
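For what it's worth, if you're not already setting it globally, use_get_metric_data switches the exporter to the GetMetricData API, which batches requests and usually helps with throttling. An illustrative snippet just to show the placement of the global key (adapt the rule to your own config):

use_get_metric_data: true   # global: applies to every metric rule below
metrics:
- aws_namespace: AWS/Transfer
  aws_metric_name: BytesIn
  aws_dimensions: [ServerId]
  aws_statistics: [Sum]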
Sorry to bug you about this. Are there any plans to move forward with this?
Up! Have the same problem with the AWS/ApplicationELB:HTTPCode_Target_5XX_Count metric.
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_5XX_Count due to dimensions mismatch
Other metrics like HTTPCode_Target_4XX_Count, HTTPCode_Target_3XX_Count... work fine.
Hi, my sincere apologies... been sucked into other stuff. I hope I'll get to finalize the work on it this weekend 🙏
Up! Have the same problem with the AWS/ApplicationELB:HTTPCode_Target_5XX_Count metric.
WARNING: (listDimensions) ignoring metric AWS/ApplicationELB:HTTPCode_Target_5XX_Count due to dimensions mismatch
Other metrics like HTTPCode_Target_4XX_Count, HTTPCode_Target_3XX_Count... work fine.
The same issue also occurs for the AWS/ApplicationELB:HTTPCode_ELB_5XX_Count metric.
Yeah, same issue here. I just assumed it was because there were no 5XX metrics to collect. But, I guess it seems to be more than that.
Yeah, same issue here. I just assumed it was because there were no 5XX metrics to collect. But, I guess it seems to be more than that.
So, I dug into this a bit more this weekend, and I think this is just a log message question, as discussed earlier in the thread. Once I was able to generate 5XX errors so that the metric actually had data, it made its way to Prometheus with no problems. For reference, I'm using OpenJDK 11 and cloudwatch_exporter-0.15.0-jar-with-dependencies.jar.
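In case it helps anyone else, the quick check I used was roughly this (9106 is the exporter's port from the logs above; I grep loosely because the exact exported metric name depends on the exporter's snake-case conversion):

# after forcing a few 5XX responses from a target behind the ALB
curl -s http://localhost:9106/metrics | grep -i httpcode_target_5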
I've completed this PR, which turns this warning message off by default. Our original assumption was that ListMetrics would always return some metrics, but there are cases where that isn't true, and in those cases the warning is redundant.
As of 0.15.1, this warning is now configurable.
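For anyone landing here later: I believe the setting is a global boolean along the lines of the snippet below, but the key name here is from memory, so please double-check the README/CHANGELOG for 0.15.1 before relying on it.

region: eu-west-1
warn_on_empty_list_dimensions: true   # assumed key name: re-enables the listDimensions warning
metrics:
- aws_namespace: AWS/ApplicationELB
  aws_metric_name: HTTPCode_Target_5XX_Count
  aws_dimensions: [AvailabilityZone, LoadBalancer]
  aws_statistics: [Sum]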