aws-otel-collector
High usage of AWS CloudWatch Metrics after enabling the default AWS OTel Collector
After enabling the AWS OTel Collector on 4 EC2 Linux Tomcat web servers, all with the default configuration, the AWS CloudWatch bill for "metric-month" reached $250 in 3 days, with more than 800,000 metrics.
Why are there so many metrics being sent to CloudWatch? How can we lower this number?
All of the configuration is the default. AWS X-Ray sampling is at its default (1 req/s, 5%).
Environment: default AWS OTel Collector configuration (https://aws-otel-collector.s3.amazonaws.com/amazon_linux/amd64/latest/aws-otel-collector.rpm), AWS EC2 Amazon Linux 2 AMI.
Hi @ggallotti. Can you help me debug your problem further?
- What installation method did you use to install the collector on your instance? Was it this one? https://aws-otel.github.io/docs/setup/ec2#install-aws-distro-for-opentelemetry-collector-on-ec2-instance-using-cloudformation (This is relevant for finding out which defaults are being used in your case.)
- Are you using Java auto-instrumentation with your Tomcat application?
- Can you provide more details about the metrics you are seeing? Are they related to Tomcat or to your application?
Finally, you can change the collector configuration to emit only specific metrics by using the metric_declarations property of the awsemfexporter (this exporter is what turns the metrics collected from your application into CloudWatch metrics).
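For illustration, a sketch of what restricting the exported metrics could look like in the collector config (the namespace, dimensions, and metric name below are only examples taken from the EMF data further down in this thread; adjust them to what you actually need):

exporters:
  awsemfexporter:
    namespace: com.namespace/AppName
    metric_declarations:
      # Only export http.server.duration, and only with this one dimension set
      - dimensions: [[http.method, http.route, http.status_code]]
        metric_name_selectors:
          - "^http.server.duration$"

Metrics (and dimension sets) not matched by a declaration are then not exported, which directly reduces the number of distinct CloudWatch metrics.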
If you have enterprise support, can you please submit a ticket?
Hi @rapphil, and thanks for the response. Sure, I will open a ticket. In the meantime, I'm adding the missing information below. I will also try changing the awsemfexporter configuration, thanks for the tip.
The web application is accessed by thousands of users. Maybe the number of metrics is proportional to the rate limit in the X-Ray sampling configuration; I don't know whether that rate limit applies to metrics.
Collector installation (inside the same EC2 instance where Tomcat is running):
wget https://aws-otel-collector.s3.amazonaws.com/amazon_linux/amd64/latest/aws-otel-collector.rpm
rpm -Uvh ./aws-otel-collector.rpm
/opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -a start
Instrumentation: yes, Java Tomcat auto-instrumentation (everything by default):
wget https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar -O /opt/aws-opentelemetry-agent.jar
export CATALINA_OPTS="$CATALINA_OPTS -javaagent:/opt/aws-opentelemetry-agent.jar"
The AWS CloudWatch log group (/metrics/ServiceNamespace/Name) contains thousands of entries like this:
{
"OTelLib": "io.opentelemetry.tomcat-7.0",
"_aws": {
"CloudWatchMetrics": [
{
"Namespace": "com.namespace/AppName",
"Dimensions": [
[
"http.status_code",
"OTelLib",
"http.flavor",
"http.host",
"http.method",
"http.route",
"http.scheme"
],
[
"OTelLib"
],
[
"OTelLib",
"http.flavor"
],
[
"OTelLib",
"http.host"
],
[
"OTelLib",
"http.method"
],
[
"OTelLib",
"http.route"
],
[
"OTelLib",
"http.scheme"
],
[
"OTelLib",
"http.status_code"
]
],
"Metrics": [
{
"Name": "http.server.duration",
"Unit": "Milliseconds"
}
]
}
],
"Timestamp": 1665493994020
},
"http.flavor": "1.1",
"http.host": "hostname.com",
"http.method": "GET",
"http.route": "/v1/rest/FXTournamentOverview_Level_Detail",
"http.scheme": "http",
"http.server.duration": {
"Max": 1.483377,
"Min": 0.770689,
"Count": 54,
"Sum": 53.263889
},
"http.status_code": "304"
}
Based on your comment, have you considered reducing the number of dimensions, in case you are not using all of them? Keep in mind that every unique combination of dimension values counts as a new metric: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Dimension
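To make the cost mechanics concrete with purely illustrative numbers: in the EMF record above, the widest dimension set includes http.host, http.method, http.route, and http.status_code, so every distinct combination of those values becomes a separate CloudWatch metric. If one server handles, say, 50 routes × 4 methods × 5 status codes, that single dimension set alone produces on the order of 1,000 metrics, before counting the additional single-dimension rollup sets and the other EC2 instances.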
We have been hit by this as well. We had been using the aws-otel-collector as a sidecar primarily for X-Ray tracing; however, we suddenly started seeing metrics being exported and stored in CloudWatch Metrics as well (it seemed as if the default configuration of the aws-otel-collector container had changed).
This resulted in quite a spike in CloudWatch costs.
We now have to update a bunch of services and override the default configuration of the aws-otel-collector container. Any info on how to disable the awsemfexporter? Is it possible through some environment variable on the sidecar container?
Our sidecars are enabled similarly to this: https://github.com/aws-observability/aws-otel-collector/blob/21abee22f1e1c892eb6733f19b4ae66a6ca34055/examples/eks/aws-cloudwatch/otel-sidecar.yaml#L40
[UPDATE]
After further investigation, it seems the root cause may be that newer versions of https://github.com/aws-observability/aws-otel-java-instrumentation added instrumentation support for quite a few more metrics. I checked the history of the default config file (https://github.com/aws-observability/aws-otel-collector/blob/main/config.yaml) and there doesn't seem to have been any recent change that enabled the awsemfexporter; it appears it has been enabled for quite some time.
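For anyone who wants to stop exporting metrics from the sidecar, one approach is to override the default configuration with one that defines only a traces pipeline, so the awsemfexporter is never used. A minimal sketch, assuming you can supply a custom config to the container (for example via a mounted file or the collector's AOT_CONFIG_CONTENT environment variable):

# Traces-only collector config: no metrics pipeline, so nothing is sent to CloudWatch Metrics
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  awsxray:
    endpoint: 0.0.0.0:2000
    transport: udp
processors:
  batch:
exporters:
  awsxray:
service:
  pipelines:
    traces:
      receivers: [otlp, awsxray]
      processors: [batch]
      exporters: [awsxray]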
To have fine-grained control over the instrumentation that is enabled with the Java agent, you can use the following system properties (see the example below this list):
- otel.instrumentation.common.default-enabled - controls all auto-instrumentation. If set to false, all instrumentation is disabled by default and you have to enable each instrumentation explicitly.
- otel.instrumentation.[name].enabled - controls whether the instrumentation named [name] is enabled or disabled. If not defined, the default value comes from otel.instrumentation.common.default-enabled.
Therefore, to have maximum control over the data generated through the agent, you can use -Dotel.instrumentation.common.default-enabled=false and then explicitly enable each instrumentation of interest.
The list of instrumentation names is available here: https://opentelemetry.io/docs/instrumentation/java/automatic/agent-config/#suppressing-specific-agent-instrumentation
In case that list is not exhaustive, you can check all the available instrumentations here: https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation and look up the name of each instrumentation.
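For example, on the Tomcat setup described earlier in this thread, an opt-in configuration could look like the following sketch (the tomcat and servlet instrumentation names are taken from the upstream suppression list; verify they match what your application actually needs):

# Disable all auto-instrumentation, then re-enable only the HTTP server pieces
export CATALINA_OPTS="$CATALINA_OPTS \
  -javaagent:/opt/aws-opentelemetry-agent.jar \
  -Dotel.instrumentation.common.default-enabled=false \
  -Dotel.instrumentation.tomcat.enabled=true \
  -Dotel.instrumentation.servlet.enabled=true"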
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been marked as stale for 30 days with no activity.