dd-trace-java
Support distributed trace propagation across SNS/SQS fanout
It seems the automatic propagation of X-Amzn-Trace-Id does not (always?) work, and this appears to be a long-standing issue; see https://stackoverflow.com/questions/56377118/aws-x-ray-with-sqs-fanout-pattern
Setting other headers or the trace in the body may be a workaround but I am not sure how to handle spans on the receiver. I guess each gets a new one.
We have quite a simple setup with Terraform-managed SNS/SQS bridging, and both consumers and producers are on the JVM (Java or Kotlin).
More background for those interested: https://github.com/DataDog/dd-trace-java/issues/1823
Summarizing the current status from the discussion at the end of #1823:
There's a limitation in AWS when it performs fanout from SNS to SQS: it doesn't propagate the X-Amzn-Trace-Id header. The current workaround suggested by AWS is to manually propagate the header in the message. This can be done by replacing the fanout done by AWS with a custom lambda that also propagates the header, or by adding the header as a message attribute before sending the message. We're looking at ways to work around this limitation.
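For anyone propagating the header by hand, it may help to see what the value looks like. An X-Amzn-Trace-Id value is a semicolon-separated list of key=value fields, for example Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1. Below is a minimal stdlib-only sketch for reading and writing such a value before copying it into a message attribute. The class and method names (XRayTraceHeader, parse, format) are made up for illustration and are not part of dd-trace-java or the AWS SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (hypothetical name); not part of dd-trace-java or the AWS SDK.
final class XRayTraceHeader {
    private XRayTraceHeader() {}

    // Splits "Root=...;Parent=...;Sampled=1" into its fields, keeping their order.
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (header == null || header.isBlank()) {
            return fields; // nothing to propagate
        }
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // skip segments without a key
            }
            fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
        }
        return fields;
    }

    // Builds a header value from its three common fields.
    static String format(String root, String parent, boolean sampled) {
        return "Root=" + root + ";Parent=" + parent + ";Sampled=" + (sampled ? "1" : "0");
    }
}
```

The string returned by format could then be set as a String message attribute on the producer side and read back on the consumer side; which attribute name to use is up to the application.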
I noticed #4238 in the release notes of 1.1.0; does it contribute to this?
Hi @JeanFred, there are a couple of fixes in 1.1.0 that help improve propagation when using v2 of the AWS SDK. Further improvements are expected in the next release or so; these address automatically associating receive requests when the trace details are only available in the message and not in the response.
Hi @mcculls, thanks for the answer! I’ll stay tuned for next release then. :)
It was quite difficult to work around the problem with non-automatic trace propagation to SQS (see https://github.com/DataDog/dd-trace-java/issues/1823#issuecomment-1273590204). Here's my approach, based on https://docs.datadoghq.com/tracing/trace_collection/open_standards/java/#inject-and-extract-context-for-distributed-tracing.
    var extractedSpanContext = GlobalTracer.get()
        .extract(Builtin.TEXT_MAP, new SqsMessageTraceExtractAdapter(message));
    var span = GlobalTracer.get()
        .buildSpan(operationName)
        .asChildOf(extractedSpanContext)
        .start();
    try (var scope = GlobalTracer.get().activateSpan(span)) {
        operation.accept(message);
    } finally {
        span.finish();
    }
And the SqsMessageTraceExtractAdapter:
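    import io.opentracing.propagation.TextMap;
    import java.util.Collections;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Map.Entry;
    import software.amazon.awssdk.services.sqs.model.Message;

    public class SqsMessageTraceExtractAdapter implements TextMap {
        // Attribute key under which the trace header is read from the SQS message.
        private static final String CUSTOM_DATADOG_AWS_TRACE_HEADER = "AWSTraceHeader";
        // Header name the tracer expects when extracting the context.
        private static final String AWS_XRAY_TRACE_HEADER = "X-Amzn-Trace-Id";
        private final Message message;

        public SqsMessageTraceExtractAdapter(Message message) {
            this.message = message;
        }

        @Override
        public Iterator<Entry<String, String>> iterator() {
            var awsTraceHeader = message.attributesAsStrings().get(CUSTOM_DATADOG_AWS_TRACE_HEADER);
            if (awsTraceHeader == null) {
                // Map.of rejects null values, so expose an empty carrier when the attribute is absent.
                return Collections.emptyIterator();
            }
            return Map.of(AWS_XRAY_TRACE_HEADER, awsTraceHeader).entrySet().iterator();
        }

        @Override
        public void put(String key, String value) {
            throw new UnsupportedOperationException(
                "This class should be used only with Tracer.extract()!");
        }
    }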
Hi, we've released some SQS improvements in version 1.9.0 of the Java tracer.
PR #4730 describes the updated behaviour along with example traces, as well as which switches to use if you want to go back to the old behaviour.
Closing as fixed in 1.9.0, with further improvements made in 1.13.0.