CloudWatch AmazonCloudWatchClient->executePutMetricData silently fails when there's an SDK version conflict
I was debugging an issue where my metrics weren't being uploaded to CloudWatch (TL;DR: I wasn't using the BOM to keep my SDK module versions compatible). It would have been really useful if the SDK had logged something or thrown an exception telling me it could not upload the metric data.
Describe the Feature
Help developers who didn't know to use aws-java-sdk-bom spot their mistake faster, by being more verbose whenever the related unchecked exceptions are thrown.
Is your Feature Request related to a problem?
If a service is set up to publish metrics and, after deployment, the metrics aren't there, I think most developers would first look at their infrastructure and assume the metrics cannot be sent. In reality, the request cannot even be created. A log message saying so could save hours of debugging.
Proposed Solution
In AmazonCloudWatchClient->executePutMetricData we see the following code:
try {
    request = (new PutMetricDataRequestMarshaller()).marshall((PutMetricDataRequest)super.beforeMarshalling(putMetricDataRequest));
    request.setAWSRequestMetrics(awsRequestMetrics);
    request.addHandlerContext(HandlerContextKey.CLIENT_ENDPOINT, this.endpoint);
    [...]
    request.addHandlerContext(HandlerContextKey.ADVANCED_CONFIG, this.advancedConfig);
} finally {
    awsRequestMetrics.endEvent(Field.RequestMarshallTime);
}
In my case, an exception was thrown on this line:
    request.addHandlerContext(HandlerContextKey.CLIENT_ENDPOINT, this.endpoint);
and it was swallowed somewhere up the chain, without even a log entry saying there was an issue. I would catch that exception and log something so developers know that:
- The metrics data failed to get uploaded
- There is an AWS SDK version conflict
Of course, that's just one place where such an exception is swallowed. I don't know how many there are in reality. And I'm not sure this can be fixed globally.
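As a rough sketch of the idea (not the SDK's actual code), the marshalling step could be wrapped so that the failure is logged before it propagates. The class `MarshallingGuard` and the method `marshalOrReport` are hypothetical names invented for this illustration. The issue does not say which throwable was raised, so the sketch assumes it was a `LinkageError` subclass such as `NoSuchFieldError` or `NoSuchMethodError`, which is the usual symptom of mixed library versions and which a plain `catch (Exception e)` would not see.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of the AWS SDK.
public class MarshallingGuard {
    private static final Logger LOG = Logger.getLogger(MarshallingGuard.class.getName());

    /**
     * Runs the given marshalling step. If it fails with a LinkageError
     * (assumed here to be the symptom of a version conflict), logs a hint
     * and rethrows the same error so callers still see the failure.
     */
    public static <T> T marshalOrReport(String operation, Supplier<T> marshalStep) {
        try {
            return marshalStep.get();
        } catch (LinkageError e) {
            LOG.log(Level.SEVERE,
                    operation + " failed before the request was sent. "
                    + "This may indicate mismatched AWS SDK module versions; "
                    + "consider managing them with aws-java-sdk-bom.",
                    e);
            throw e;
        }
    }

    public static void main(String[] args) {
        // Simulate the failure described above with a stand-in error.
        try {
            marshalOrReport("PutMetricData", () -> {
                throw new NoSuchFieldError("CLIENT_ENDPOINT");
            });
        } catch (NoSuchFieldError e) {
            System.out.println("Error was logged and rethrown: " + e.getMessage());
        }
    }
}
```

Rethrowing keeps the existing behavior for callers. The only change is that the failure leaves a trace even if something further up the chain discards it.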
Additional Context
I was wrongly trying to trace the issue to a permissions problem: I thought my Fargate instances weren't allowed to push metrics to CloudWatch. Then I tried to verify that the service was even attempting to publish metrics, which led me to add some breakpoints in the metrics uploader and see that an exception was thrown before the request was even sent.
Your Environment
- AWS Java SDK version used:
  - Before fixing the issue:
    - SQS: 1.11.929
    - S3: 1.11.946
    - micrometer-registry-cloudwatch: 1.7.0
- JDK version used: 11
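
The mismatch in the versions above (1.11.929 vs 1.11.946) is the kind of thing a small sanity check could flag early. The following is only an illustrative sketch: `SdkVersionCheck` and `distinctVersions` are hypothetical names, the versions are passed in by hand instead of being read from the classpath, and the artifact IDs `aws-java-sdk-sqs` and `aws-java-sdk-s3` are assumed to be the modules meant by "SQS" and "S3".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the AWS SDK.
public class SdkVersionCheck {

    /** Returns the distinct version strings found among the given modules. */
    public static Set<String> distinctVersions(Map<String, String> moduleVersions) {
        return new TreeSet<>(moduleVersions.values());
    }

    public static void main(String[] args) {
        // Versions taken from the environment listed in this issue.
        Map<String, String> modules = new LinkedHashMap<>();
        modules.put("aws-java-sdk-sqs", "1.11.929");
        modules.put("aws-java-sdk-s3", "1.11.946");

        Set<String> versions = distinctVersions(modules);
        if (versions.size() > 1) {
            // More than one version means the modules are not aligned.
            System.err.println("Mixed AWS SDK versions: " + versions);
        }
    }
}
```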