opentelemetry-lambda
OTEL Python does not always flush metrics to awsemf
Describe the bug
The OTEL Python layer does not always flush metrics at the end of a Lambda invocation.
Steps to reproduce
- Deploy a Lambda with the following Python code:
handler.py
"""Sample Lambda for testing"""
from opentelemetry.metrics import get_meter
from opentelemetry import trace

trace.get_tracer_provider()
tracer = trace.get_tracer(__name__)
meter = get_meter(__name__)
counter = meter.create_counter(name="invocation_counter", description="A counter metric", unit="invocations")


def lambda_handler(event, _):
    """Sample Lambda for testing"""
    counter.add(1)
    return {"status_code": 200}
config.yaml
# config.yaml in the root directory
# Set the environment variable 'OPENTELEMETRY_COLLECTOR_CONFIG_FILE' to '/var/task/config.yaml'
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  logging:
    verbosity: detailed
  awsxray:
  awsemf:
    namespace: ${env:OTEL_NAMESPACE}
    dimension_rollup_option: 1
    resource_to_telemetry_conversion:
      enabled: false
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [logging, awsemf]
Ensure that the following configuration for the Lambda is set:
- Environment
  - AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-instrument
  - OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/config.yaml
  - OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT: 900
  - OTEL_NAMESPACE: SampleNamespace
  - OTEL_PROPAGATORS: xray
  - OTEL_PYTHON_ID_GENERATOR: xray
- Runtime - 3.9
- Architecture - x86_64
- handler: handler.lambda_handler
- layers: arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-18-0:1
Ensure the Lambda has the following permissions:
- xray:PutTelemetryRecords
- xray:PutTraceSegments
- cloudwatch:GetMetricData
- cloudwatch:GetMetricStatistics
- cloudwatch:GetMetricStream
- cloudwatch:PutMetricData
- cloudwatch:PutMetricStream
- cloudwatch:StartMetricStreams
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
- Obtain the Lambda ARN
- Ensure that you are logged in to the AWS CLI
- Create the following pytest test, replacing the placeholder Lambda ARN with the ARN of the Lambda that was just created:
test.py
"""
Tests the following Lambda by invoking the lambda 100 times and expecting the counter to return 100.
"""
import boto3
import json
from datetime import datetime
import time


def test_sample_lambda():
    lambda_arn = "<insert lambda arn>"
    lambda_client = boto3.client('lambda')
    event = json.dumps({})
    start_time = datetime.now()
    for i in range(100):
        response = lambda_client.invoke(
            FunctionName=lambda_arn,
            InvocationType='Event',
            LogType='None',
            Payload=event
        )
        assert response['StatusCode'] == 202
    # Wait 2 minutes for metrics to propagate + wait for last lambda
    time.sleep(2*60 + 2)
    cloudwatch_client = boto3.client('cloudwatch')
    metric_data = cloudwatch_client.get_metric_data(
        MetricDataQueries = [
            {
                'Id': 'integration_test',
                'MetricStat': {
                    'Metric': {
                        'Namespace': "SampleNamespace",
                        'MetricName': "invocation_counter",
                        'Dimensions': [{'Name': 'OTelLib', 'Value': 'handler'}]
                    },
                    'Period': 300,
                    'Stat': "Sum",
                }
            }
        ],
        StartTime=start_time,
        EndTime=datetime.now(),
    )
    otel_values = sum(metric_data['MetricDataResults'][0]['Values'])
    assert otel_values == 100
- Ensure you have boto3 installed
- Run pytest
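The test above waits a fixed two minutes before querying CloudWatch, so a low count could mean either dropped or merely delayed metrics. A minimal sketch of a polling helper that could help tell the two apart is shown here. `wait_for_sum` and its parameters are hypothetical names, not part of boto3 or the original test, and `fetch_sum` is assumed to be any zero-argument callable that returns the current metric sum.

```python
import time


def wait_for_sum(fetch_sum, expected, timeout_s=300.0, interval_s=15.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_sum() until it reaches `expected` or the timeout expires.

    Returns the last observed sum so the caller can assert on it.
    All names here are illustrative, not taken from the original test.
    """
    deadline = clock() + timeout_s
    latest = fetch_sum()
    while latest < expected and clock() < deadline:
        sleep(interval_s)
        latest = fetch_sum()
    return latest
```

In test.py, `fetch_sum` could wrap the existing `get_metric_data` call and return the summed `Values`. If the count still stays below 100 after a generous timeout, the shortfall is more likely lost data than propagation delay.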
What did you expect to see?
There should be 100 values in CloudWatch, and pytest should pass.
What did you see instead?
Fewer than 100 values are sent to CloudWatch. On warm Lambdas the count sometimes reaches 100 and the test passes.
What version of collector/language SDK version did you use?
arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-18-0:1
What language layer did you use?
Python
Additional context
I believe that the Lambda layer sometimes does not flush EMF metrics before the Lambda freezes.
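One way to probe this hypothesis is to force a flush explicitly at the end of every invocation and see whether the count changes. The following is only a diagnostic sketch, not the layer's own mechanism: `flush_after` is a hypothetical decorator, and it assumes the configured provider exposes a `force_flush(timeout_millis)` method, as the OpenTelemetry Python SDK `MeterProvider` does. Providers without that method are skipped.

```python
import functools


def flush_after(get_provider, timeout_millis=5000):
    """Hypothetical decorator: flush the provider after each invocation.

    get_provider is a zero-argument callable returning the active provider,
    for example opentelemetry.metrics.get_meter_provider.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                return handler(event, context)
            finally:
                provider = get_provider()
                # API-only (no-op or proxy) providers may lack force_flush.
                force_flush = getattr(provider, "force_flush", None)
                if callable(force_flush):
                    force_flush(timeout_millis)
        return wrapper
    return decorator
```

In handler.py this could be applied as `@flush_after(get_meter_provider)`, with `get_meter_provider` imported from `opentelemetry.metrics`. Note that this only pushes data from the SDK to the collector extension; whether the collector's awsemf exporter then sends it on before the freeze is a separate question.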
I do not see anything going to awsemf at all. I am able to see logs when using logging exporter with the same code.
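To check whether anything reaches awsemf at all, independently of CloudWatch metric aggregation, the raw EMF log events can be inspected. A minimal sketch, assuming the log messages have already been fetched as JSON strings and follow the CloudWatch Embedded Metric Format (a top-level `_aws.CloudWatchMetrics` directive plus the metric value as a top-level key). `sum_emf_metric` is a hypothetical helper, and it only handles plain numeric values.

```python
import json


def sum_emf_metric(messages, metric_name):
    """Sum a metric's values across EMF-formatted log messages.

    Messages that are not JSON objects, or that do not declare
    metric_name in their EMF directive, are ignored.
    """
    total = 0.0
    for message in messages:
        try:
            record = json.loads(message)
        except (TypeError, ValueError):
            continue
        if not isinstance(record, dict):
            continue
        aws = record.get("_aws")
        directives = aws.get("CloudWatchMetrics", []) if isinstance(aws, dict) else []
        declared = any(
            metric.get("Name") == metric_name
            for directive in directives
            for metric in directive.get("Metrics", [])
        )
        value = record.get(metric_name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if declared and is_number:
            total += value
    return total
```

If this sum over the test window is also below 100, the data is being lost before the exporter writes its log events rather than during CloudWatch aggregation.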