dd-trace-java Support distributed trace propagation across SNS/SQS fanout

Automatic propagation of X-Amzn-Trace-Id does not (always?) seem to work across SNS/SQS fanout. This appears to be a long-standing issue; see https://stackoverflow.com/questions/56377118/aws-x-ray-with-sqs-fanout-pattern

Setting other headers or the trace in the body may be a workaround but I am not sure how to handle spans on the receiver. I guess each gets a new one.

We have a fairly simple setup with Terraform-managed SNS/SQS bridging, and both consumers and producers run on the JVM (Java or Kotlin).

Feb 24 '22 11:02 hjizettle

More background for those interested: https://github.com/DataDog/dd-trace-java/issues/1823

Feb 24 '22 11:02 hjizettle

Summarizing the current status from the discussion at the end of #1823

There's a limitation in AWS: when it performs fanout from SNS to SQS, it doesn't propagate the X-Amzn-Trace-Id header. The workaround currently suggested by AWS is to propagate the header manually in the message. This can be done either by replacing the AWS-managed fanout with a custom Lambda that also propagates the header, or by adding the header as a message attribute before sending the message. We're looking at ways to work around this limitation.
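
The second option (carrying the header as a message attribute) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tracer's implementation: plain `Map`s stand in for the SNS/SQS message attributes, and the class name `TraceAttributePropagation` and the attribute name `trace_header` are hypothetical. With the real AWS SDK, the same value would be set as a string message attribute on the publish request and read back from the received message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: carries an X-Ray style trace header in message attributes. */
final class TraceAttributePropagation {

  // Assumed attribute name; any name agreed between producer and consumer works.
  static final String TRACE_ATTRIBUTE = "trace_header";

  private TraceAttributePropagation() {}

  /** Producer side: returns a copy of the attributes with the trace header added. */
  static Map<String, String> inject(Map<String, String> attributes, String traceHeader) {
    Map<String, String> copy = new HashMap<>(attributes);
    // Only add the attribute when there is an active trace to propagate.
    if (traceHeader != null && !traceHeader.isBlank()) {
      copy.put(TRACE_ATTRIBUTE, traceHeader);
    }
    return copy;
  }

  /** Consumer side: reads the trace header back, if the producer set one. */
  static Optional<String> extract(Map<String, String> attributes) {
    return Optional.ofNullable(attributes.get(TRACE_ATTRIBUTE));
  }
}
```

A producer would call `inject(...)` just before publishing, and a consumer would call `extract(...)` on each received message and start its span as a child of the extracted context. Note that SNS forwards message attributes as SQS message attributes only when raw message delivery is enabled on the subscription; otherwise they arrive inside the JSON body of the SQS message.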

Feb 28 '22 09:02 mcculls

I noticed #4238 in the release notes of 1.1.0; does it contribute to this?

Dec 01 '22 11:12 JeanFred

Hi @JeanFred, there are a couple of fixes in 1.1.0 that improve propagation when using v2 of the AWS SDK. Further improvements are expected in the next release or so; they address automatically associating receive requests when the trace details are only available in the message and not in the response.

Dec 01 '22 11:12 mcculls

Hi @mcculls, thanks for the answer! I’ll stay tuned for next release then. :)

Dec 01 '22 11:12 JeanFred

It was quite difficult to work around the lack of automatic trace propagation to SQS (see https://github.com/DataDog/dd-trace-java/issues/1823#issuecomment-1273590204). Here's my approach, based on https://docs.datadoghq.com/tracing/trace_collection/open_standards/java/#inject-and-extract-context-for-distributed-tracing.

var extractedSpanContext = GlobalTracer.get()
  .extract(Builtin.TEXT_MAP, new SqsMessageTraceExtractAdapter(message));
var span = GlobalTracer.get()
  .buildSpan(operationName)
  .asChildOf(extractedSpanContext)
  .start();
try (var scope = GlobalTracer.get().activateSpan(span)) {
  operation.accept(message);
} finally {
  span.finish();
}

And the SqsMessageTraceExtractAdapter:

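import io.opentracing.propagation.TextMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
import software.amazon.awssdk.services.sqs.model.Message;

public class SqsMessageTraceExtractAdapter implements TextMap {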

  private static final String CUSTOM_DATADOG_AWS_TRACE_HEADER = "AWSTraceHeader";
  private static final String AWS_XRAY_TRACE_HEADER = "X-Amzn-Trace-Id";
  private final Message message;

  public SqsMessageTraceExtractAdapter(Message message) {
    this.message = message;
  }

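  @Override
  public Iterator<Entry<String, String>> iterator() {
    var awsTraceHeader = message.attributesAsStrings().get(CUSTOM_DATADOG_AWS_TRACE_HEADER);
    // Map.of rejects null values, so expose no entries when the attribute is absent.
    if (awsTraceHeader == null) {
      return Map.<String, String>of().entrySet().iterator();
    }
    return Map.of(AWS_XRAY_TRACE_HEADER, awsTraceHeader).entrySet().iterator();
  }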

  @Override
  public void put(String key, String value) {
    throw new UnsupportedOperationException(
        "This class should be used only with Tracer.extract()!");
  }
}
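
For reference, the value that the adapter hands to the tracer is an X-Ray trace header of the form `Root=...;Parent=...;Sampled=...`. The sketch below splits such a value into its key/value fields, which can help when logging or debugging what actually arrived on the message. The class name `XRayHeaderParser` is hypothetical and not part of dd-trace-java; the tracer does its own parsing during `extract()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical debugging helper: splits an X-Amzn-Trace-Id value into its fields. */
final class XRayHeaderParser {

  private XRayHeaderParser() {}

  /** Returns the fields in header order; malformed segments are skipped. */
  static Map<String, String> parse(String header) {
    Map<String, String> fields = new LinkedHashMap<>();
    if (header == null) {
      return fields;
    }
    for (String part : header.split(";")) {
      int eq = part.indexOf('=');
      if (eq <= 0) {
        continue; // no key or no '=' separator in this segment
      }
      fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
    }
    return fields;
  }
}
```

Calling `XRayHeaderParser.parse(message.attributesAsStrings().get("AWSTraceHeader"))` in the consumer gives a quick view of whether the `Root` and `Parent` identifiers survived the hop.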

Dec 12 '22 17:12 apptio-msobala

Hi, we've released some SQS improvements in version 1.9.0 of the Java tracer.

PR #4730 describes the updated behaviour along with example traces, as well as which switches to use if you want to go back to the old behaviour.

Feb 23 '23 17:02 mcculls

Closing as fixed in 1.9.0 with further improvements made in 1.13.0

May 15 '23 10:05 mcculls
