
High usage of AWS CloudWatch Metrics after enabling default AWS OTel Collector


After enabling the AWS OTel Collector on 4 EC2 Linux Tomcat web servers, all with default configuration, the AWS CloudWatch "metric-month" charge came to $250 in 3 days, with more than 800,000 metrics.


Why are there so many metrics being sent to CloudWatch? How can we lower this number?

All configuration is at defaults. AWS X-Ray sampling is at the default (1 req/s, 5%).

Environment: default AWS OTel Collector configuration (https://aws-otel-collector.s3.amazonaws.com/amazon_linux/amd64/latest/aws-otel-collector.rpm), AWS EC2 Amazon Linux 2 AMI.

ggallotti avatar Oct 11 '22 13:10 ggallotti

Hi @ggallotti. Can you help me further debug your problem?

  • What installation method did you use to install the collector on your instance? Was it this one? https://aws-otel.github.io/docs/setup/ec2#install-aws-distro-for-opentelemetry-collector-on-ec2-instance-using-cloudformation (This is relevant for finding out which defaults are being used in your case.)

  • Are you using Java Auto instrumentation with your Tomcat Application?

  • Can you provide more details about the metrics that you are seeing? Are they related to Tomcat or to your application?

Finally, you can change the collector configuration to export only specific metrics using the metric_declarations property of the awsemfexporter (this is the component that sends the metrics collected from your application to CloudWatch).
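
For illustration, a minimal sketch of such a declaration (the exporter key is awsemf in the collector config file; the metric selector and dimension set below are examples, not a recommendation):

exporters:
  awsemf:
    # Only metrics matching a declaration below are exported;
    # everything else is dropped by the exporter.
    metric_declarations:
      - dimensions: [[OTelLib, http.method, http.status_code]]
        metric_name_selectors:
          - "^http.server.duration$"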

rapphil avatar Oct 11 '22 15:10 rapphil

If you have enterprise support, can you please submit a ticket?

rapphil avatar Oct 11 '22 15:10 rapphil

Hi @rapphil, and thanks for the response. Sure, I will open a ticket. Just in case, I'm filling in the missing information below. I will try changing the awsemfexporter configuration, thanks for the tip.

The web applications are accessed by thousands of users. Maybe the number of metrics is proportional to the rate limit in the X-Ray sampling configuration; I don't know whether that rate limit applies to metrics.

Collector installation (on the same EC2 instance where Tomcat is running):

# Download, install, and start the latest collector RPM
wget https://aws-otel-collector.s3.amazonaws.com/amazon_linux/amd64/latest/aws-otel-collector.rpm
rpm -Uvh ./aws-otel-collector.rpm
/opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -a start

Instrumentation: yes, Java Tomcat auto-instrumentation (everything at defaults).

# Download the latest ADOT Java agent and attach it to Tomcat
wget https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar -O /opt/aws-opentelemetry-agent.jar
export CATALINA_OPTS="$CATALINA_OPTS -javaagent:/opt/aws-opentelemetry-agent.jar"

AWS CloudWatch Logs (/metrics/ServiceNamespace/Name):

There are thousands of entries like this:

{
    "OTelLib": "io.opentelemetry.tomcat-7.0",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "com.namespace/AppName",
                "Dimensions": [
                    [
                        "http.status_code",
                        "OTelLib",
                        "http.flavor",
                        "http.host",
                        "http.method",
                        "http.route",
                        "http.scheme"
                    ],
                    [
                        "OTelLib"
                    ],
                    [
                        "OTelLib",
                        "http.flavor"
                    ],
                    [
                        "OTelLib",
                        "http.host"
                    ],
                    [
                        "OTelLib",
                        "http.method"
                    ],
                    [
                        "OTelLib",
                        "http.route"
                    ],
                    [
                        "OTelLib",
                        "http.scheme"
                    ],
                    [
                        "OTelLib",
                        "http.status_code"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "http.server.duration",
                        "Unit": "Milliseconds"
                    }
                ]
            }
        ],
        "Timestamp": 1665493994020
    },
    "http.flavor": "1.1",
    "http.host": "hostname.com",
    "http.method": "GET",
    "http.route": "/v1/rest/FXTournamentOverview_Level_Detail",
    "http.scheme": "http",
    "http.server.duration": {
        "Max": 1.483377,
        "Min": 0.770689,
        "Count": 54,
        "Sum": 53.263889
    },
    "http.status_code": "304"
}

ggallotti avatar Oct 12 '22 14:10 ggallotti

Based on your comment, have you considered reducing the number of dimensions, in case you are not using all of them? Keep in mind that every combination of dimension values counts as a new metric; for example, the [OTelLib, http.route] set alone produces one metric per distinct route, and the full seven-dimension set produces one metric per unique combination of route, method, status code, host, flavor, and scheme. See: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Dimension
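
Most of the dimension sets in the EMF document above come from the exporter's dimension rollup, so that is another lever. A sketch of the two relevant awsemf settings (option names per the awsemfexporter README; the values shown are illustrative):

exporters:
  awsemf:
    # The default, ZeroAndSingleDimensionRollup, emits the full dimension set
    # plus [OTelLib] and every [OTelLib, <single dimension>] pair -- the eight
    # sets visible in the EMF document above.
    dimension_rollup_option: NoDimensionRollup
    metric_declarations:
      - dimensions: [[OTelLib, http.method]]   # keep a single low-cardinality set
        metric_name_selectors: [".*"]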

rapphil avatar Oct 12 '22 23:10 rapphil

We have been hit by this as well. We have been using aws-otel-collector as a sidecar primarily for X-Ray tracing; however, we suddenly started seeing metrics being exported and stored in CloudWatch Metrics as well (it seems the default configuration of the aws-otel-collector container changed).

This resulted in quite a spike in CloudWatch costs.

We now have to update a bunch of services and override the default configuration of the aws-otel-collector container. Any info on how to disable the awsemfexporter? Is it possible through some env variable on the sidecar container? (See the config sketch below.)

Our sidecars are enabled similarly to this: https://github.com/aws-observability/aws-otel-collector/blob/21abee22f1e1c892eb6733f19b4ae66a6ca34055/examples/eks/aws-cloudwatch/otel-sidecar.yaml#L40
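
For anyone else hitting this: a traces-only override that leaves out the awsemf exporter entirely would look roughly like this (a sketch assuming the stock OTLP receiver; it can be supplied via the --config flag, a mounted ConfigMap, or, per the collector README, the AOT_CONFIG_CONTENT environment variable):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  awsxray:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    # No metrics pipeline is defined, so nothing is sent to CloudWatch Metrics.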

[UPDATE]

After further investigation, it seems the root cause may be that newer versions of https://github.com/aws-observability/aws-otel-java-instrumentation added instrumentation support for quite a few more metrics. I checked the history of the default config file (https://github.com/aws-observability/aws-otel-collector/blob/main/config.yaml) and there does not seem to have been any recent change enabling the awsemfexporter; it appears to have been enabled for quite some time.

ertanden avatar Nov 29 '22 13:11 ertanden

To have fine-grained control over the instrumentation that is enabled with the Java agent, you can use the following system properties:

  • otel.instrumentation.common.default-enabled - Controls all auto-instrumentation. If set to false, all instrumentation is disabled by default and you have to enable each instrumentation explicitly.

  • otel.instrumentation.[name].enabled - Controls whether the instrumentation named name is enabled. If not set, the default value comes from otel.instrumentation.common.default-enabled.

Therefore, to have maximum control over the data generated through the agent, you can set -Dotel.instrumentation.common.default-enabled=false and then enable each instrumentation of interest explicitly, as in the example below.
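
For the Tomcat setup earlier in this thread, that would look something like this (a sketch; tomcat and servlet are example names taken from the suppression list linked below):

# Disable all auto-instrumentation, then opt back in selectively.
export CATALINA_OPTS="$CATALINA_OPTS \
  -javaagent:/opt/aws-opentelemetry-agent.jar \
  -Dotel.instrumentation.common.default-enabled=false \
  -Dotel.instrumentation.tomcat.enabled=true \
  -Dotel.instrumentation.servlet.enabled=true"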

The list of names for the instrumentations is available here: https://opentelemetry.io/docs/instrumentation/java/automatic/agent-config/#suppressing-specific-agent-instrumentation

In case the list is not exhaustive, you can check all the available instrumentations here: https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation and look up the instrumentation's name there.

rapphil avatar Dec 09 '22 20:12 rapphil

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Feb 12 '23 20:02 github-actions[bot]

This issue was closed because it has been marked as stale for 30 days with no activity.

github-actions[bot] avatar Mar 19 '23 20:03 github-actions[bot]