dd-trace-java

Add aws sns instrumentation for AWS lambda

Open joeyzhao2018 opened this issue 10 months ago • 2 comments

What Does This Do

Add AWS SNS instrumentation for AWS Lambda. Currently the trace is not connected, even though the basic SDK instrumentation adds the X-Amzn-Trace-Id header to the AWS request.

Implementation

  • Followed the existing pattern of the extra AWS SQS instrumentation needed for Data Streams Monitoring, i.e. use an ExecutionInterceptor and add the trace context to the MessageAttributes of each message.
  • Inject a _datadog message attribute to pass on the trace context. The message attribute uses a binary value, so the implementation first converts the trace context into a JSON string and then converts that string into a ByteBuffer.
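The JSON-then-ByteBuffer conversion described above can be sketched roughly as follows. This is a minimal illustration using only the JDK, not the code from this PR: the class name `DatadogAttributeSketch`, its helper methods, and the header keys used in the usage note are assumptions chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of dd-trace-java: shows one way to turn
// trace-context headers into a binary payload for a "_datadog" attribute.
final class DatadogAttributeSketch {
    // Attribute name taken from the PR description.
    static final String ATTRIBUTE_NAME = "_datadog";

    // Serialize the headers as a flat JSON object of string values.
    static String toJson(Map<String, String> headers) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Minimal escaping: backslashes and double quotes only (assumed to be
    // sufficient for header-like keys and values).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Encode the JSON string as UTF-8 bytes wrapped in a ByteBuffer,
    // the form a binary message attribute value expects.
    static ByteBuffer toBinaryValue(Map<String, String> headers) {
        return ByteBuffer.wrap(toJson(headers).getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, a map holding `x-datadog-trace-id` and `x-datadog-parent-id` entries (illustrative keys) would be passed to `toBinaryValue`, and the resulting buffer would then be attached to the outgoing request as a Binary-typed message attribute named `_datadog`. The SDK-specific wiring inside the interceptor is omitted here.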

Motivation

  • High-priority customer ticket: APMS-11602
  • Feature parity. The same "messageAttribute-propagated-tracecontext" instrumentation already exists in dd-trace-py, dd-trace-js and dd-trace-dotnet.
  • dd-trace-java itself also honors the tracecontext propagated via messageAttributes.

Testing

  • Added unit tests, which use LocalStack to emulate the SNS service.
  • End-to-end integration tests will be added to the Serverless team's integration test suite.
  • Manual tests:
    • sdk v1 single message (Screenshot 2024-05-20 at 11 06 22 AM)
    • sdk v2 single message (Screenshot 2024-05-20 at 2 54 21 PM)
    • sdk v1 batch messages (Screenshot 2024-05-20 at 3 14 21 PM)
    • sdk v2 batch messages (Screenshot 2024-05-20 at 3 14 00 PM)

Some Considerations

  • Why does SQS work without this special injection?
    • AWS automatically provides AWSTraceHeader in the SQS message's attributes. (Note: these are NOT the MessageAttributes. Also note that this is not a result of ReceiveMessage calls.) (Screenshot 2024-05-02 at 12 14 53 AM)
    • By the way, we added java => SQS => AWS Lambda support only quite recently (python PR, nodejs PR).
  • Does SNS have configs or parameters, similar to the getAttributeNames of ReceiveMessage requests, that we can use in the SNS case?
    • 😮 Actually, the X-Ray trace id is propagated in the SNS case. I have a PR on the consumer side (nodejs), https://github.com/DataDog/datadog-lambda-js/pull/538, which reads the X-Ray trace id from the env to get the trace id. However, the parent id is not correct: it is likely a "would-be-created" X-Ray span id, so the parenting would be wrong. Therefore, this PR is still needed. This PR also works without changing any consumers (as already seen in the screenshots above).
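To make the trace id / parent id distinction above concrete, here is a small sketch of splitting an X-Ray trace header of the form `Root=...;Parent=...;Sampled=...` into its fields. It is an illustration only, not code from either PR: the class name `XRayHeaderSketch` is hypothetical, and how a consumer maps these fields onto Datadog ids is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits an X-Ray trace header into key/value fields.
final class XRayHeaderSketch {
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (header == null) {
            return fields;
        }
        // Fields are separated by ';' and each field is a KEY=VALUE pair.
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return fields;
    }
}
```

In a Lambda consumer the input would typically come from the `_X_AMZN_TRACE_ID` environment variable. The `Root` field carries the trace id, while the `Parent` field is the value that, per the discussion above, does not identify the producer span.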

Jira ticket: https://datadoghq.atlassian.net/browse/SVLS-4780

joeyzhao2018 • Apr 12 '24 20:04