dd-trace-java

Support distributed trace propagation across SNS/SQS fanout

Open hjizettle opened this issue 3 years ago • 2 comments

It seems that automatic propagation of X-Amzn-Trace-Id does not (always?) work across SNS/SQS fanout, and this appears to be a long-standing issue; see https://stackoverflow.com/questions/56377118/aws-x-ray-with-sqs-fanout-pattern

Setting other headers or putting the trace in the body may be a workaround, but I am not sure how to handle spans on the receiver. I guess each receiver gets a new one.

We have quite a simple setup with Terraform-managed SNS/SQS bridging, and both consumers and producers are on the JVM (Java or Kotlin).

Feb 24 '22 11:02 hjizettle

More background for those interested: https://github.com/DataDog/dd-trace-java/issues/1823

Feb 24 '22 11:02 hjizettle

Summarizing the current status from the discussion at the end of #1823:

There's a limitation in AWS: when it performs fanout from SNS to SQS, it doesn't propagate the X-Amzn-Trace-Id header. The current workaround suggested by AWS is to propagate the header manually in the message. This can be done either by replacing the AWS fanout with a custom lambda that also propagates the header, or by adding the header as a message attribute before sending the message. We're looking at ways to work around this limitation.

Feb 28 '22 09:02 mcculls

I noticed #4238 in the release notes of 1.1.0; does it contribute to this?

Dec 01 '22 11:12 JeanFred

Hi @JeanFred, there are a couple of fixes in 1.1.0 that help improve propagation when using v2 of the AWS SDK. Further improvements are expected in the next release or so; these address automatically associating receive requests when the trace details are only available in the message and not in the response.

Dec 01 '22 11:12 mcculls

Hi @mcculls, thanks for the answer! I'll stay tuned for the next release then. :)

Dec 01 '22 11:12 JeanFred

It was quite difficult to work around the lack of automatic trace propagation to SQS (see https://github.com/DataDog/dd-trace-java/issues/1823#issuecomment-1273590204). Here's my approach, based on https://docs.datadoghq.com/tracing/trace_collection/open_standards/java/#inject-and-extract-context-for-distributed-tracing.

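// Requires: io.opentracing.util.GlobalTracer and io.opentracing.propagation.Format.Builtin
// `message`, `operationName` and `operation` come from the surrounding consumer code.

// Rebuild the upstream span context from the trace header carried by the SQS message
var extractedSpanContext = GlobalTracer.get()
  .extract(Builtin.TEXT_MAP, new SqsMessageTraceExtractAdapter(message));
// Start the consumer span as a child of the extracted context
var span = GlobalTracer.get()
  .buildSpan(operationName)
  .asChildOf(extractedSpanContext)
  .start();
// Keep the span active while the message is processed, then always finish it
try (var scope = GlobalTracer.get().activateSpan(span)) {
  operation.accept(message);
} finally {
  span.finish();
}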

And the SqsMessageTraceExtractAdapter:

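import io.opentracing.propagation.TextMap;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
import software.amazon.awssdk.services.sqs.model.Message;

// Read-only carrier that exposes the message's AWSTraceHeader attribute as X-Amzn-Trace-Id
public class SqsMessageTraceExtractAdapter implements TextMap {

  private static final String CUSTOM_DATADOG_AWS_TRACE_HEADER = "AWSTraceHeader";
  private static final String AWS_XRAY_TRACE_HEADER = "X-Amzn-Trace-Id";
  private final Message message;

  public SqsMessageTraceExtractAdapter(Message message) {
    this.message = message;
  }

  @Override
  public Iterator<Entry<String, String>> iterator() {
    var awsTraceHeader = message.attributesAsStrings().get(CUSTOM_DATADOG_AWS_TRACE_HEADER);
    if (awsTraceHeader == null) {
      // Map.of rejects null values; without the attribute there is nothing to extract
      return Collections.emptyIterator();
    }
    return Map.of(AWS_XRAY_TRACE_HEADER, awsTraceHeader).entrySet().iterator();
  }

  @Override
  public void put(String key, String value) {
    throw new UnsupportedOperationException(
        "This class should be used only with Tracer.extract()!");
  }
}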
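
When debugging, it can help to look at what the AWSTraceHeader value actually contains before handing it to the tracer. The following is only a minimal sketch, assuming the value follows the X-Ray header layout of semicolon-separated key=value pairs (for example Root=...;Parent=...;Sampled=...). XRayHeaderParser is a hypothetical helper written for illustration; it is not part of dd-trace-java or the AWS SDK, and it is not how the tracer itself parses the header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical debugging helper; not part of dd-trace-java or the AWS SDK. */
public final class XRayHeaderParser {

  private XRayHeaderParser() {}

  /**
   * Splits a header such as "Root=...;Parent=...;Sampled=1" into key/value pairs.
   * Returns an empty map for null or blank input so callers can skip extraction.
   */
  public static Map<String, String> parse(String header) {
    Map<String, String> fields = new LinkedHashMap<>();
    if (header == null || header.trim().isEmpty()) {
      return fields;
    }
    for (String part : header.split(";")) {
      int eq = part.indexOf('=');
      if (eq <= 0) {
        continue; // skip malformed segments that have no key
      }
      fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
    }
    return fields;
  }
}
```

Logging the parsed Root and Parent fields on the consumer side is one way to check whether the attribute survived the SNS to SQS hop at all.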

Dec 12 '22 17:12 apptio-msobala

Hi, we've released some SQS improvements in version 1.9.0 of the Java tracer.

PR #4730 describes the updated behaviour along with example traces, and lists the switches to use if you want to go back to the old behaviour.

Feb 23 '23 17:02 mcculls

Closing as fixed in 1.9.0, with further improvements made in 1.13.0.

May 15 '23 10:05 mcculls