opentelemetry-lambda

OTEL Python does not always flush metrics to awsemf

Open sarwaan001 opened this issue 2 years ago • 2 comments

Describe the bug OTEL Python Layer does not always flush metrics at the end of lambda invocation.

Steps to reproduce

  1. Deploy a lambda with the following python code: handler.py
"""Sample Lambda for testing"""
from opentelemetry.metrics import get_meter
from opentelemetry import trace

trace.get_tracer_provider()
tracer = trace.get_tracer(__name__)

meter = get_meter(__name__)

counter = meter.create_counter(name="invocation_counter", description="A counter metric", unit="invocations")


def lambda_handler(event, _):
    """Sample Lambda for testing"""
    counter.add(1)
    return {"status_code": 200}

config.yaml

# config.yaml in the root directory
# Set the environment variable 'OPENTELEMETRY_COLLECTOR_CONFIG_FILE' to '/var/task/config.yaml'

receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  logging:
    verbosity: detailed
  awsxray:
  awsemf:
    namespace: ${env:OTEL_NAMESPACE}
    dimension_rollup_option: 1
    resource_to_telemetry_conversion:
      enabled: false
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [logging,awsemf]

Ensure that the following configuration for the lambda is set:

  • Environment:
      • AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-instrument
      • OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/config.yaml
      • OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT: 900
      • OTEL_NAMESPACE: SampleNamespace
      • OTEL_PROPAGATORS: xray
      • OTEL_PYTHON_ID_GENERATOR: xray
  • Runtime - 3.9
  • Architecture - x86_64
  • handler: handler.lambda_handler
  • layers: arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-18-0:1
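
The settings above could be applied with boto3 roughly as follows. This is a sketch rather than part of the original report: `build_settings` and `apply_settings` are hypothetical helper names, and the architecture is left out because it is not a parameter of `update_function_configuration`. One thing worth checking while setting these values: as far as I can tell, the Python instrumentation reads `OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT` in milliseconds, so 900 would mean under a second rather than 900 seconds.

```python
"""Sketch: apply the Lambda settings listed above with boto3."""

LAYER_ARN = (
    "arn:aws:lambda:us-east-1:901920570463:"
    "layer:aws-otel-python-amd64-ver-1-18-0:1"
)


def build_settings(function_name):
    """Return keyword arguments for update_function_configuration."""
    return {
        "FunctionName": function_name,
        "Runtime": "python3.9",
        "Handler": "handler.lambda_handler",
        "Layers": [LAYER_ARN],
        # Note: Environment replaces the function's whole variable set.
        "Environment": {
            "Variables": {
                "AWS_LAMBDA_EXEC_WRAPPER": "/opt/otel-instrument",
                "OPENTELEMETRY_COLLECTOR_CONFIG_FILE": "/var/task/config.yaml",
                "OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT": "900",
                "OTEL_NAMESPACE": "SampleNamespace",
                "OTEL_PROPAGATORS": "xray",
                "OTEL_PYTHON_ID_GENERATOR": "xray",
            }
        },
    }


def apply_settings(lambda_client, function_name):
    """Send the settings; lambda_client is expected to be boto3.client('lambda')."""
    return lambda_client.update_function_configuration(
        **build_settings(function_name)
    )
```

Passing the client in keeps the helper easy to exercise without AWS credentials; in practice it would be called as `apply_settings(boto3.client('lambda'), "<insert lambda arn>")`.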

Ensure the Lambda has the following permissions:

  • xray:PutTelemetryRecords
  • xray:PutTraceSegments
  • cloudwatch:GetMetricData
  • cloudwatch:GetMetricStatistics
  • cloudwatch:GetMetricStream
  • cloudwatch:PutMetricData
  • cloudwatch:PutMetricStream
  • cloudwatch:StartMetricStreams
  • logs:CreateLogGroup
  • logs:CreateLogStream
  • logs:PutLogEvents
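
For reference, the permissions above can be written as a single IAM policy document. This is a minimal sketch: the `"Resource": "*"` scope is an assumption made for brevity, and `build_policy` is a hypothetical helper, not something from the report.

```python
"""Sketch: the permission list above as an IAM policy document."""
import json

ACTIONS = [
    "xray:PutTelemetryRecords",
    "xray:PutTraceSegments",
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:GetMetricStream",
    "cloudwatch:PutMetricData",
    "cloudwatch:PutMetricStream",
    "cloudwatch:StartMetricStreams",
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
]


def build_policy(actions=ACTIONS):
    """Return a policy dict that allows the given actions on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped down for this repro
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be attached to the Lambda's execution role.
    print(json.dumps(build_policy(), indent=2))
```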
  2. Obtain the Lambda ARN.
  3. Ensure that you are logged in to the AWS CLI.
  4. Create the following pytest, replacing the placeholder ARN with that of the Lambda that was just created: test.py
"""
Invokes the Lambda 100 times and expects the counter to sum to 100.
"""
import json
import time
from datetime import datetime, timezone

import boto3


def test_sample_lambda():
    lambda_arn = "<insert lambda arn>"

    lambda_client = boto3.client('lambda')
    event = json.dumps({})

    # Use timezone-aware UTC: botocore assumes UTC for naive datetimes, so a
    # naive local time would shift the query window on non-UTC machines.
    start_time = datetime.now(timezone.utc)

    for _ in range(100):
        # Asynchronous invocation: a 202 only confirms the event was queued.
        response = lambda_client.invoke(
            FunctionName=lambda_arn,
            InvocationType='Event',
            LogType='None',
            Payload=event
        )
        assert response['StatusCode'] == 202

    # Wait 2 minutes for metrics to propagate + wait for last lambda
    time.sleep(2*60 + 2)

    cloudwatch_client = boto3.client('cloudwatch')

    metric_data = cloudwatch_client.get_metric_data(
        MetricDataQueries=[
            {
                'Id': 'integration_test',
                'MetricStat': {
                    'Metric': {
                        'Namespace': "SampleNamespace",
                        'MetricName': "invocation_counter",
                        'Dimensions': [{'Name': 'OTelLib', 'Value': 'handler'}]
                    },
                    'Period': 300,
                    'Stat': "Sum",
                }
            }
        ],
        StartTime=start_time,
        EndTime=datetime.now(timezone.utc),
    )

    # Sum every returned datapoint; each invocation adds 1 to the counter.
    otel_values = sum(metric_data['MetricDataResults'][0]['Values'])

    assert otel_values == 100
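
The fixed two-minute sleep in the test can wait too long or not long enough. One alternative is to poll CloudWatch until the expected sum appears or a deadline passes. The helper below is a sketch under that assumption; `poll_until`, its parameters, and the injected `fetch` callable are illustrative names, not part of the original test.

```python
"""Sketch: poll for a metric value instead of sleeping a fixed time."""
import time


def poll_until(fetch, expected, timeout_s=300.0, interval_s=15.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns `expected` or timeout_s elapses.

    Returns the last value seen, so the caller can assert on it and get a
    meaningful failure message when the metric never reaches the target.
    """
    deadline = clock() + timeout_s
    value = fetch()
    while value != expected and clock() < deadline:
        sleep(interval_s)
        value = fetch()
    return value
```

In the test, `fetch` would wrap the `get_metric_data` call (recomputing `EndTime` on each call) and return the summed values, so the final assertion becomes `assert poll_until(fetch, 100) == 100`.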

Ensure you have boto3 and pytest installed.

  5. Run pytest.

What did you expect to see? The counter should sum to 100 in CloudWatch, and the pytest should pass.

What did you see instead? Fewer than 100 values reach CloudWatch. Occasionally, on warm Lambdas, all 100 arrive and the test passes.

What version of collector/language SDK version did you use? arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-18-0:1

What language layer did you use? Python

Additional context I believe that sometimes the lambda layer does not flush emf metrics before the lambda freezes.
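
One way to test that hypothesis is to force a metrics flush before the handler returns and see whether the count becomes reliable. The sketch below is a diagnostic built on assumptions, not the layer's own mechanism: `flush_if_supported` and `flush_metrics_after` are hypothetical helpers, and they rely only on the SDK `MeterProvider` exposing `force_flush(timeout_millis)`. It covers the hop from the SDK to the collector extension; whether the collector then exports to awsemf before the freeze is a separate question.

```python
"""Sketch: force a metrics flush after every handler invocation."""
import functools
import logging

logger = logging.getLogger(__name__)


def flush_if_supported(provider, timeout_millis=5_000):
    """Try provider.force_flush().

    Returns False when the method is missing, raises, or returns False;
    returns True otherwise.
    """
    force_flush = getattr(provider, "force_flush", None)
    if not callable(force_flush):
        # API-level no-op or proxy providers may not expose force_flush.
        return False
    try:
        return force_flush(timeout_millis) is not False
    except Exception:  # a failed flush should not fail the invocation
        logger.exception("metrics force_flush failed")
        return False


def flush_metrics_after(get_provider, timeout_millis=5_000):
    """Decorator: flush the provider from get_provider() after each call."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                return handler(event, context)
            finally:
                # Runs whether the handler returned or raised.
                flush_if_supported(get_provider(), timeout_millis)
        return wrapper
    return decorate
```

In handler.py this would be used by importing `get_meter_provider` from `opentelemetry.metrics` and placing `@flush_metrics_after(get_meter_provider)` above `lambda_handler`. The provider is looked up on each call because the global provider may only be configured after the module is imported.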

sarwaan001 Aug 15 '23 21:08

I do not see anything going to awsemf at all. I am able to see logs when using the logging exporter with the same code.

stevemao Feb 03 '24 06:02