
fix(nodejs): make aws-lambda and aws-sdk instrumentations respect OTEL_NODE_DISABLED_INSTRUMENTATIONS

Open DivMode opened this issue 2 months ago • 6 comments

Problem

DynamoDB Streams and some other AWS event sources (for example, Kinesis Streams) have no metadata carrier for trace propagation. Unlike SQS (message attributes) or API Gateway (HTTP headers), a DynamoDB Streams event contains only the changed record data.

The Trace Propagation Issue

When using OpenTelemetry with DynamoDB Streams:

  1. Producer writes to DynamoDB with the trace context stored in the record:

    {
      "data": "data",
      "_otel": {
        "traceparent": "00-7d7836d35ab7da657d35c2b5f889c3be-de0931d210e214b3-01"
      }
    }
    
  2. Consumer Lambda needs to extract and restore this context

  3. Problem: AwsLambdaInstrumentation creates an automatic wrapper span with a NEW trace ID before the handler code runs:

    Trace A: 7d7836d35ab7da657d35c2b5f889c3be (original fetch)
    Trace B: 49792b11678269a91638c0717450480b (NEW trace - disconnected!)
    
  4. Result: Distributed trace is broken - cannot correlate stream processing with original operation
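
To make the disconnect concrete, here is a minimal sketch that parses a traceparent value in the W3C Trace Context version 00 layout (version, trace ID, parent span ID, flags) and checks whether two values belong to the same trace. The names parseTraceparent and sameTrace are hypothetical helpers for illustration and are not part of this PR or of any OpenTelemetry package.

```typescript
// Fields of a W3C Trace Context `traceparent` value (version 00 layout).
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
const TRACEPARENT_RE =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper: returns undefined for anything that is not well formed.
function parseTraceparent(value: string): TraceParent | undefined {
  const match = TRACEPARENT_RE.exec(value.trim());
  if (!match) return undefined;
  const [, version, traceId, parentId, flags] = match;
  // Version "ff" and all-zero IDs are invalid per the specification.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return undefined;
  }
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Hypothetical helper: two values continue the same trace only if both parse
// and their trace IDs match.
function sameTrace(a: string, b: string): boolean {
  const left = parseTraceparent(a);
  const right = parseTraceparent(b);
  return left !== undefined && right !== undefined && left.traceId === right.traceId;
}
```

With the IDs shown above, the stored traceparent carries trace ID 7d7836d35ab7da657d35c2b5f889c3be, while the wrapper span starts under 49792b11678269a91638c0717450480b, so sameTrace would report false for any pair of values built from those two IDs.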

Solution

This PR extends the functionality added in the base implementation to make aws-lambda and aws-sdk instrumentations respect OTEL_NODE_DISABLED_INSTRUMENTATIONS.

Previously, PR #1653 added the ability to disable instrumentations via OTEL_NODE_DISABLED_INSTRUMENTATIONS, but the AWS Lambda and AWS SDK instrumentations were always loaded unconditionally in the createInstrumentations() function. This meant users couldn't disable these instrumentations even when setting the environment variable.

Usage:

export OTEL_NODE_DISABLED_INSTRUMENTATIONS=aws-lambda

Effect: Disables automatic Lambda wrapper span, allowing handler to manually restore trace context from event data before creating spans.
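
As an illustration of the conditional loading described above, here is a minimal sketch of how a comma-separated list could be parsed and used to skip instrumentations. It is not the layer's actual code: parseInstrumentationList, loadInstrumentations, and the factory map are hypothetical names, and the real createInstrumentations() may be structured differently.

```typescript
// Stand-in for a real instrumentation instance; only the name matters here.
interface InstrumentationLike {
  name: string;
}

type InstrumentationFactory = () => InstrumentationLike;

// Hypothetical helper: turns "aws-lambda, aws-sdk" into a set of names,
// ignoring blanks and surrounding whitespace.
function parseInstrumentationList(raw: string | undefined): Set<string> {
  const names = (raw ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return new Set(names);
}

// Hypothetical helper: constructs only the instrumentations that are not
// disabled, so a disabled one is never instantiated at all.
function loadInstrumentations(
  factories: Record<string, InstrumentationFactory>,
  disabled: Set<string>
): InstrumentationLike[] {
  return Object.keys(factories)
    .filter((name) => !disabled.has(name))
    .map((name) => factories[name]());
}
```

In a Lambda environment the raw value would come from process.env.OTEL_NODE_DISABLED_INSTRUMENTATIONS. Because the filter runs before construction, the disabled wrapper never patches the handler.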

Use Case: DynamoDB Streams Trace Propagation

Producer (stores trace context):

await dynamodb.put({
  Item: {
    data: "data",
    _otel: {
      traceparent: currentTraceContext.traceparent
    }
  }
});
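
The snippet above reads currentTraceContext.traceparent without showing where it comes from. One possibility, sketched here under assumptions, is to serialize the active span's context into the version 00 traceparent layout. SpanContextLike mirrors the traceId, spanId, and traceFlags fields of OpenTelemetry's SpanContext; toTraceparent and withTraceContext are hypothetical helpers. In a real producer, propagation.inject from @opentelemetry/api would normally fill the carrier through the configured propagator instead.

```typescript
// Mirrors the traceId / spanId / traceFlags fields of OpenTelemetry's SpanContext.
interface SpanContextLike {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  traceFlags: number; // bit 0 = sampled
}

// Hypothetical helper: serializes a span context as a version 00 traceparent.
function toTraceparent(ctx: SpanContextLike): string {
  const hex = (ctx.traceFlags & 0xff).toString(16);
  const flags = hex.length < 2 ? "0" + hex : hex;
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

// Hypothetical helper: returns a copy of the item with the `_otel` attribute
// used in this PR's examples, leaving the original item untouched.
function withTraceContext<T extends object>(
  item: T,
  ctx: SpanContextLike
): T & { _otel: { traceparent: string } } {
  return { ...item, _otel: { traceparent: toTraceparent(ctx) } };
}
```

Keeping the trace context under a single `_otel` key means consumers can strip or ignore it without touching the business attributes.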

Consumer (restores trace context with disabled AwsLambdaInstrumentation):

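// Lambda layer: OTEL_NODE_DISABLED_INSTRUMENTATIONS=aws-lambda
// No automatic wrapper span created!
import { context, propagation, ROOT_CONTEXT } from "@opentelemetry/api";
import { unmarshall } from "@aws-sdk/util-dynamodb";
import type { DynamoDBStreamEvent } from "aws-lambda";

// OTELSpan (used below) is not defined in this snippet; it is presumably a
// project-local helper that runs a callback inside a new span.

export const handler = async (event: DynamoDBStreamEvent) => {
  // Only the first record of the batch is handled here
  const record = event.Records[0];

  // NewImage is absent on REMOVE events; _otel is absent on untraced writes
  const newImage = record.dynamodb?.NewImage;
  const otelContext = newImage ? unmarshall(newImage)._otel : undefined;

  // Manually extract trace context (an empty carrier yields a fresh trace)
  const carrier = otelContext?.traceparent
    ? { traceparent: otelContext.traceparent }
    : {};
  const extractedContext = propagation.extract(ROOT_CONTEXT, carrier);

  // Execute in extracted context (continues ORIGINAL trace!)
  await context.with(extractedContext, async () => {
    await OTELSpan.withSpan("process stream record", async () => {
      // All operations here inherit original trace ID
      await processRecord(record);
    });
  });
};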

Result: Full trace continuity from producer → DynamoDB → stream → consumer
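
The consumer above reads only event.Records[0], but a stream batch can carry several records, each written under a different trace. The following is a minimal sketch of grouping a batch by stored traceparent so that each group can be processed under its own extracted context. It reads the raw DynamoDB attribute-value JSON ({ M: ... }, { S: ... }) directly instead of calling unmarshall; the types and the names traceparentOf and groupByTraceparent are hypothetical and only model the fields used here.

```typescript
// Minimal model of the DynamoDB attribute-value JSON found in stream records.
interface AttributeValueLike {
  S?: string;
  M?: Record<string, AttributeValueLike | undefined>;
}

// Minimal model of a stream record; REMOVE events carry no NewImage.
interface StreamRecordLike {
  dynamodb?: { NewImage?: Record<string, AttributeValueLike | undefined> };
}

// Hypothetical helper: reads NewImage._otel.traceparent, if present.
function traceparentOf(record: StreamRecordLike): string | undefined {
  return record.dynamodb?.NewImage?._otel?.M?.traceparent?.S;
}

// Hypothetical helper: groups records by traceparent and keeps records
// without one separately.
function groupByTraceparent<T extends StreamRecordLike>(
  records: T[]
): { traced: Map<string, T[]>; untraced: T[] } {
  const traced = new Map<string, T[]>();
  const untraced: T[] = [];
  for (const record of records) {
    const traceparent = traceparentOf(record);
    if (traceparent === undefined) {
      untraced.push(record);
      continue;
    }
    const group = traced.get(traceparent);
    if (group) {
      group.push(record);
    } else {
      traced.set(traceparent, [record]);
    }
  }
  return { traced, untraced };
}
```

Each key of traced can then be passed as the carrier's traceparent to propagation.extract, as in the handler above, while untraced records start a fresh trace.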

Event Sources Affected

This enables manual trace propagation for AWS event sources without metadata carriers:

Event Source       Metadata Carrier        Auto-Extracted   Needs Manual Propagation
API Gateway        ✅ HTTP headers          ✅ YES            ❌ NO
SQS                ✅ Message attributes    ✅ YES            ❌ NO
SNS                ✅ Message attributes    ✅ YES            ❌ NO
EventBridge        ✅ Event metadata        ✅ YES            ❌ NO
DynamoDB Streams   None                    ❌ NO             YES (must store in record data)
Kinesis Streams    None                    ❌ NO             YES (must store in payload)

Industry Pattern

This approach aligns with industry best practices for DynamoDB Streams trace propagation:

  • Datadog (2024): "Distributed Tracing with Amazon DynamoDB" - Stores trace metadata in DynamoDB attributes, extracts in consumer
  • SigNoz (2024): "OpenTelemetry Context Propagation" - Manual inject/extract for message queues without auto-propagation
  • Better Stack (2023): "Implementing Distributed Tracing with OpenTelemetry" - Producer stores traceparent, consumer extracts and applies

Testing

  • [x] Verified aws-lambda and aws-sdk added to defaultInstrumentationList
  • [x] Verified conditional loading based on getActiveInstumentations()
  • [x] Tested in development environment with OTEL_NODE_DISABLED_INSTRUMENTATIONS=aws-lambda
  • [x] Confirmed trace propagation works for DynamoDB Streams in production

Benefits

  1. ✅ Enables distributed tracing across DynamoDB Streams (previously impossible)
  2. ✅ Provides control over when Lambda instrumentation runs (user choice)
  3. ✅ Maintains all other auto-instrumentations (AWS SDK, HTTP, net/TLS)
  4. ✅ No performance overhead (conditional loading, not runtime checks)
  5. ✅ Follows OpenTelemetry context propagation best practices
  6. ✅ Implements feature requested in issue #1803

DivMode avatar Nov 03 '25 06:11 DivMode

Hi @DivMode,

Could you please fix the lint issues?

serkan-ozal avatar Nov 06 '25 15:11 serkan-ozal

Hi @DivMode, in general this looks good to me. Thank you for your effort. I notice you've also made some changes to the MeterProvider config, but didn't mention the reasoning behind that change.

wpessers avatar Nov 20 '25 09:11 wpessers

@serkan-ozal we also need to consider that this is a breaking change. Current nodejs layer users who already set OTEL_NODE_ENABLED_INSTRUMENTATIONS most likely do not list aws-sdk and aws-lambda there, because the current default behaviour, described in https://github.com/open-telemetry/opentelemetry-lambda/blob/main/nodejs/README.md?plain=1#L10-L11, states that these will always be loaded. This change will force those users to add aws-sdk and/or aws-lambda if they need that instrumentation.

wpessers avatar Nov 20 '25 09:11 wpessers

To me, the real problem here is the customization of the trace context propagation. So, instead of allowing users to disable AWS Lambda handler instrumentation, introducing (and/or documenting) a way to customize trace context extraction might be a better way.

serkan-ozal avatar Nov 28 '25 18:11 serkan-ozal

@DivMode I think @wpessers makes a good point. This is a breaking change, and I think it is better to keep the AWS Lambda and SDK instrumentations active even when they are not explicitly listed in the OTEL_NODE_ENABLED_INSTRUMENTATIONS config. They should only be disabled explicitly via the OTEL_NODE_DISABLED_INSTRUMENTATIONS config.
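
A minimal sketch of those semantics, for illustration only (it is not code from the PR or the layer, and resolveActiveInstrumentations and ALWAYS_ON are hypothetical names): the enabled list narrows the regular instrumentations, the two AWS instrumentations stay on regardless, and only the disabled list can switch anything off.

```typescript
// Instrumentations that stay active unless explicitly disabled.
const ALWAYS_ON = ["aws-lambda", "aws-sdk"];

// Splits a comma-separated env value into trimmed, non-empty names.
function parseList(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Hypothetical resolver for the behaviour described in this comment.
function resolveActiveInstrumentations(
  available: string[],
  enabledRaw: string | undefined,
  disabledRaw: string | undefined
): Set<string> {
  const enabled = new Set(parseList(enabledRaw));
  // An empty or missing enabled list means "everything available".
  const selected =
    enabled.size > 0 ? available.filter((name) => enabled.has(name)) : available;
  const active = new Set(selected);
  // The AWS instrumentations stay on even if the enabled list omits them.
  ALWAYS_ON.forEach((name) => active.add(name));
  // Only an explicit disable removes an instrumentation.
  parseList(disabledRaw).forEach((name) => active.delete(name));
  return active;
}
```

Under this sketch, a user who sets only OTEL_NODE_ENABLED_INSTRUMENTATIONS=http keeps aws-lambda and aws-sdk, so existing deployments are unaffected, while OTEL_NODE_DISABLED_INSTRUMENTATIONS=aws-lambda still removes the wrapper span.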

serkan-ozal avatar Nov 28 '25 18:11 serkan-ozal

Good point on customizing the propagation, @serkan-ozal. I think it should be possible by customizing the lambda instrumentation's eventContextExtractor: https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/packages/instrumentation-aws-lambda#aws-lambda-instrumentation-options. In the nodejs layer we offer the globals, which allow you to configure some things yourself: https://github.com/open-telemetry/opentelemetry-lambda/blob/main/nodejs/packages/layer/src/wrapper.ts#L73-L104

I haven't played around with that but I believe if you use some sort of preload script that provides this function in the global scope and then use NODE_OPTIONS to run this before anything else, you can probably achieve what is needed.

wpessers avatar Nov 28 '25 18:11 wpessers