OpenTelemetry Lambda Layers in Container Image do not export spans from instrumentation to X-Ray/CloudWatch
Describe the bug
I am trying to use the OpenTelemetry Lambda layers in a Lambda Container image.
When I invoke the Lambda function in AWS, the function executes successfully; however, I do not see any of the spans from the instrumentation of the Lambda function. It appears that the Collector extension is not exporting span data to X-Ray or any other backend.
Steps to reproduce
I have created a repository to reproduce the issue at https://github.com/gotgenes/lambda-opentelemetry-docker.
The repository includes the following:
- A Node.js Lambda function implemented in TypeScript.
- A Dockerfile to build the Lambda container image from the Node.js v22 Lambda base image with the OpenTelemetry Lambda layers.
- A Docker Compose file to run the Lambda container locally with Docker, along with an otel-tui sidecar container to view the telemetry.
- A CDK app to create the ECR repository and deploy the Lambda function.
The steps to reproduce the issue are as follows:
- Clone the repository.
- Set the environment variables:
  export AWS_PROFILE=$YOUR_PROFILE
  export COMPOSE_BAKE=true
- Log in to AWS CLI:
  npm run login
- Create the ECR repository:
  npm run deploy-ecr
- Build and push the Docker image to ECR, and deploy the Lambda function:
  npm run build-publish-deploy
- Invoke the Lambda function, using the Function URL output during the CDK deployment:
  curl -XPOST -i --json '{}' $LAMBDA_FUNCTION_URL
- Observe that the trace for the Lambda function in CloudWatch has no spans for the fetch call made in the Lambda function.
Please see the repository's README for detailed instructions.
What did you expect to see?
I expected to see the spans from the OpenTelemetry instrumentation of the Lambda function in the trace in CloudWatch.
Specifically, I expected to see a span named GET from the fetch call made in the Lambda function, which would indicate that the OpenTelemetry Collector extension is correctly exporting spans from the auto-instrumentation to X-Ray. Such a span would look like the following example from otel-tui:
What did you see instead?
The only spans I see in the trace in CloudWatch are the ones from the Lambda function initialization, invocation, and overhead. None of these seem to contain data internal to the Lambda function itself; they appear to be part of the Lambda runtime machinery:
What version of collector/language SDK version did you use?
- Collector extension layer version: v0.15.0
- Node.js layer version: v0.14.0
What language layer did you use?
JavaScript/Node.js (implemented in TypeScript)
Additional context
While I appreciate this project providing extension layers for the collector and language SDKs that can be accessed for ZIP file distributions, it seems to me that AWS wants to push users towards using container images for Lambda functions. Therefore, I think it would be beneficial to support the OpenTelemetry Lambda layers in container images as well. That might look like also providing base images with the OpenTelemetry Lambda layers included, or at least documenting how to use the layers in a custom Dockerfile. The purpose of my repository is to demonstrate how to do that, but it currently does not work as expected.
Can you share the collector config that is deployed in the lambda? The repository has a config, but it points to the local otel-tui endpoint. You need an exporter pointing to the X-Ray OTLP endpoint using the sigv4 extension, and an IAM policy that allows writing to X-Ray.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-OTLPEndpoint.html
Alternatively, use the X-Ray exporter. This exporter is not part of the collector from this repository, so you would have to use the ADOT collector: https://github.com/aws-observability/aws-otel-lambda https://aws-otel.github.io/docs/getting-started/lambda
I would expect to see "Registering OpenTelemetry" logs at the beginning, but they weren't there. So I suspect that the OTEL Lambda Node.js layer is not activated at all. As far as I know, AWS_LAMBDA_EXEC_WRAPPER is supported in the AWS Lambda base Docker images, but I'm not 100% sure.
- Can you try setting the OTEL_LOG_LEVEL=DEBUG env var and share the CloudWatch logs?
- Does this issue (missing spans) happen only locally, only in the AWS Lambda environment, or in both?
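A minimal sketch of a check for the two environment variables involved in this diagnosis. The value /opt/otel-handler for AWS_LAMBDA_EXEC_WRAPPER is an assumption based on the Node.js layer's documented setup, and the helper name missingOtelEnv is hypothetical; nothing here is part of the layer itself.

```typescript
// Sketch only: reports which OpenTelemetry-related variables look wrong.
interface EnvCheck {
  name: string;
  expected?: string; // when omitted, any non-empty value is accepted
}

// Assumption: the Node.js layer is activated through this wrapper path.
const CHECKS: EnvCheck[] = [
  { name: "AWS_LAMBDA_EXEC_WRAPPER", expected: "/opt/otel-handler" },
  { name: "OTEL_LOG_LEVEL" }, // any value, e.g. DEBUG
];

// Hypothetical helper: returns one message per unset or unexpected variable.
function missingOtelEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const check of CHECKS) {
    const actual = env[check.name];
    if (actual === undefined || actual === "") {
      problems.push(`${check.name} is not set`);
    } else if (check.expected !== undefined && actual !== check.expected) {
      problems.push(`${check.name} is "${actual}", expected "${check.expected}"`);
    }
  }
  return problems;
}

// Example: the wrapper is set but the log level is not.
console.log(missingOtelEnv({ AWS_LAMBDA_EXEC_WRAPPER: "/opt/otel-handler" }));
```

Inside a handler, the same function could be called with the process environment and the result logged once at cold start.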
Replying to @RaphaelManke:
Can you share the collector config that is deployed in the lambda?
I'm not providing any config with the Lambda in AWS, so I assume it uses the config.yaml in the collector here.
The repository has a config, but it points to the local otel-tui endpoint.
Correct, the one in the lambda-opentelemetry-docker repository is only used for trying to get the collector extension to work locally (see #1850).
You need an exporter pointing to the X-Ray OTLP endpoint using the sigv4 extension, and an IAM policy that allows writing to X-Ray.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-OTLPEndpoint.html
Alternatively, use the X-Ray exporter. This exporter is not part of the collector from this repository, so you would have to use the ADOT collector: https://github.com/aws-observability/aws-otel-lambda https://aws-otel.github.io/docs/getting-started/lambda
I think this sheds light on a critical error in my understanding. I assumed that the collector extension, by default, has and uses an exporter for CloudWatch or X-Ray. This seems not to be the case. I think it's very helpful of you to point out that a key difference of the ADOT collector extension is that it should have the X-Ray exporter.
Am I correct that the sigv4 extension ships with the OpenTelemetry Lambda's collector extension? I think this might be it:
https://github.com/open-telemetry/opentelemetry-lambda/blob/9030b49002d51a5ecf31d3897ec724753710faec/collector/lambdacomponents/extension/sigv4auth.go
So I would want to create a custom configuration file, ensure that the sigv4 extension is listed in the service.extensions, and have an otlp exporter pointed at https://xray.<AWS Region>.amazonaws.com/v1/traces, and then point to the custom config with the OPENTELEMETRY_COLLECTOR_CONFIG_URI environment variable?
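A minimal sketch of what such a configuration might contain, written as a TypeScript object and serialized to JSON (which a YAML parser also accepts). The component names sigv4auth, otlp, and otlphttp and their fields follow the upstream collector components; whether each one is compiled into a given version of the collector extension layer is an assumption to verify. The helper buildCollectorConfig and the us-east-1 region are hypothetical and used only for illustration.

```typescript
// Sketch only: a collector configuration that exports traces to the
// X-Ray OTLP endpoint with SigV4-signed requests.
interface CollectorConfig {
  extensions: Record<string, unknown>;
  receivers: Record<string, unknown>;
  exporters: Record<string, unknown>;
  service: {
    extensions: string[];
    pipelines: Record<string, { receivers: string[]; exporters: string[] }>;
  };
}

// Hypothetical helper, not part of any library.
function buildCollectorConfig(region: string): CollectorConfig {
  return {
    extensions: {
      // Signs outgoing requests with the function's IAM credentials.
      sigv4auth: { region, service: "xray" },
    },
    receivers: {
      // The language layer sends spans to the local collector over OTLP.
      otlp: {
        protocols: {
          grpc: { endpoint: "localhost:4317" },
          http: { endpoint: "localhost:4318" },
        },
      },
    },
    exporters: {
      otlphttp: {
        traces_endpoint: `https://xray.${region}.amazonaws.com/v1/traces`,
        auth: { authenticator: "sigv4auth" },
      },
    },
    service: {
      // The extension must be listed here to be started at all.
      extensions: ["sigv4auth"],
      pipelines: {
        traces: { receivers: ["otlp"], exporters: ["otlphttp"] },
      },
    },
  };
}

// JSON is valid YAML, so this text can be saved as the collector config file.
const configText: string = JSON.stringify(buildCollectorConfig("us-east-1"), null, 2);
console.log(configText);
```

The generated file could then be copied into the image (for example to /var/task/collector.yaml, an assumed path) and referenced with OPENTELEMETRY_COLLECTOR_CONFIG_URI=/var/task/collector.yaml. The function's execution role would also need permission to write to X-Ray.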
I also found I had to enable CloudWatch Transaction Search. I ended up using the AWS Console because I don't yet know how to enable this programmatically through CDK (if that's even possible).
After those steps, I now see three distinct traces for each invocation:
~None of these traces have the Initialization, Invocation, or Overhead spans shown in the original post's "What did you see instead?"~ [Update: This is incorrect, the first trace (shown at the bottom) has these three spans.] My best guess is this is failed context propagation, but I'm very unsure. Is this related to #1742?
Also, the top two traces look nearly indistinguishable. I don't yet understand this duplication.
I also set OTEL_LOG_LEVEL to DEBUG via a Lambda environment variable, as suggested by @serkan-ozal, and invoked the lambda. I have attached the CloudWatch logs of the invocation.