contrib/aws/aws-sdk-go: tricky to pass trace context through to Lambda functions
Background
I have a system that I am trying to instrument using DataDog APM. The two main components are a long-lived Docker container (running on AWS ECS Fargate) and an Lambda function invoked by that container. Specifically, it is invoked using the direct Lambda Invoke API, not via an intermediate AWS API Gateway.
Problem
Out of the box, APM traces aren't propagated from the app in the Docker container to the Lambda function. A span is created, just not propagated:
https://github.com/DataDog/dd-trace-go/blob/d4e383244f56b8d51ae87d7e114d377170027653/contrib/aws/aws-sdk-go/aws/aws.go#L54-L73
Attempted workaround
I have attempted to work around this with the following code, but it's not ideal:
sess = ddaws.WrapSession(sess)
sess.Handlers.Build.PushFront(func(request *request.Request) {
input, ok := request.Params.(*lambda.InvokeInput)
if !ok {
return
}
span, ok := tracer.SpanFromContext(request.Context())
if !ok {
return
}
mss := map[string]string{}
err := tracer.Inject(span.Context(), tracer.TextMapCarrier(mss))
if err != nil {
return
}
b, _ := json.Marshal(map[string]interface{}{"custom": mss})
input.ClientContext = aws.String(base64.StdEncoding.EncodeToString(b))
})
Specifically, the enqueuer span (my Lambda function) really should be a child span of the aws.lambda span, rather than a sibling.
Proposed solution
As a user of this package, I can't push my handler into the Send stack - because by that point the request has already been signed and my alteration of the payload would invalidate the signature. If this line was changed to s.Handlers.Build.PushFrontNamed then my code would work.
This would also be equally applicable to other handlers where passing span context through would be useful (e.g. off the top of my head: Step Functions, CodeBuild, EventBridge, so on)
https://github.com/DataDog/dd-trace-go/blob/d4e383244f56b8d51ae87d7e114d377170027653/contrib/aws/aws-sdk-go/aws/aws.go#L43
Thanks for the proposal @aidansteele. I'll leave it to team @DataDog/tracing-go to comment, but feel free to ping me if this gets stuck for some reason.
@aidansteele I've recorded your request, we will try to get back to you with a solution promptly 🙌
I've been running into this same problem while trying to propagate. It would be nice to move the span initialization to the build phase as @aidansteele suggests, so the custom downstream extracted span could be a child aws call.
My current workaround, to avoid the dangling aws.lambda operation, is to just remove this span from being created for requests of this operation type (and others service/operations I want to propagate to). I added a custom handler to the front of the Build handlers, with a function similar to:
const (
ddSendHandlerName = "gopkg.in/DataDog/dd-trace-go.v1/contrib/aws/aws-sdk-go/aws/handlers.Send"
ddCompleteHandlerName = "gopkg.in/DataDog/dd-trace-go.v1/contrib/aws/aws-sdk-go/aws/handlers.Complete"
)
// RemoveDatadogHandlersForDuplicateReqs checks if the request
// is of type we do not want duplicate spans for since we
// trace through manually.
func RemoveDatadogHandlersForDuplicateReqs(req *request.Request) {
duplicateOperations := map[string]bool{"Invoke": true, "SendMessage": true, "PublishMessage": true}
duplicateServices := map[string]bool{"lambda": true, "sqs": true, "sns":true}
_, servicePresent := duplicateServices[req.ClientInfo.ServiceName]
_, operationPresent := duplicateOperations[req.Operation.Name]
if servicePresent && operationPresent {
zap.L().Debug("removing duplicate aws span in favor of manually traced request",
zap.String("aws.service", req.ClientInfo.ServiceName),
zap.String("aws.operation", req.Operation.Name))
req.Handlers.Send.RemoveByName(ddSendHandlerName)
req.Handlers.Complete.RemoveByName(ddCompleteHandlerName)
}
}
I also considered using req.Handlers.Send/Complete.SwapByName to swap these with a no-op handler, since I was worried of unknowingly breaking requests by removing them.
It would be nice if the Datadog handler names were not hardcoded so I could reference them directly from contrib, because it feels like I'm setting myself up for an error down the line if they change. I am happy to open a PR to expose them if this seems acceptable.
Closed due to merged PR and no follow-up questions. If there are related problems, please open a separate issue.