dd-trace-go icon indicating copy to clipboard operation
dd-trace-go copied to clipboard

contrib/aws/aws-sdk-go: tricky to pass trace context through to Lambda functions

Open aidansteele opened this issue 3 years ago • 2 comments

Background

I have a system that I am trying to instrument using DataDog APM. The two main components are a long-lived Docker container (running on AWS ECS Fargate) and an Lambda function invoked by that container. Specifically, it is invoked using the direct Lambda Invoke API, not via an intermediate AWS API Gateway.

Problem

Out of the box, APM traces aren't propagated from the app in the Docker container to the Lambda function. A span is created, just not propagated:

https://github.com/DataDog/dd-trace-go/blob/d4e383244f56b8d51ae87d7e114d377170027653/contrib/aws/aws-sdk-go/aws/aws.go#L54-L73

Attempted workaround

I have attempted to work around this with the following code, but it's not ideal:

sess = ddaws.WrapSession(sess)
sess.Handlers.Build.PushFront(func(request *request.Request) {
	input, ok := request.Params.(*lambda.InvokeInput)
	if !ok {
		return
	}

	span, ok := tracer.SpanFromContext(request.Context())
	if !ok {
		return
	}

	mss := map[string]string{}
	err := tracer.Inject(span.Context(), tracer.TextMapCarrier(mss))
	if err != nil {
		return
	}

	b, _ := json.Marshal(map[string]interface{}{"custom": mss})
	input.ClientContext = aws.String(base64.StdEncoding.EncodeToString(b))
})
CleanShot 2022-02-15 at 13 32 47@2x

Specifically, the enqueuer span (my Lambda function) really should be a child span of the aws.lambda span, rather than a sibling.

Proposed solution

As a user of this package, I can't push my handler into the Send stack - because by that point the request has already been signed and my alteration of the payload would invalidate the signature. If this line was changed to s.Handlers.Build.PushFrontNamed then my code would work.

This would also be equally applicable to other handlers where passing span context through would be useful (e.g. off the top of my head: Step Functions, CodeBuild, EventBridge, so on)

https://github.com/DataDog/dd-trace-go/blob/d4e383244f56b8d51ae87d7e114d377170027653/contrib/aws/aws-sdk-go/aws/aws.go#L43

aidansteele avatar Feb 15 '22 02:02 aidansteele

Thanks for the proposal @aidansteele. I'll leave it to team @DataDog/tracing-go to comment, but feel free to ping me if this gets stuck for some reason.

felixge avatar Feb 16 '22 13:02 felixge

@aidansteele I've recorded your request, we will try to get back to you with a solution promptly 🙌

dianashevchenko avatar Mar 28 '22 13:03 dianashevchenko

I've been running into this same problem while trying to propagate. It would be nice to move the span initialization to the build phase as @aidansteele suggests, so the custom downstream extracted span could be a child aws call.

My current workaround, to avoid the dangling aws.lambda operation, is to just remove this span from being created for requests of this operation type (and others service/operations I want to propagate to). I added a custom handler to the front of the Build handlers, with a function similar to:

const (
	ddSendHandlerName                  = "gopkg.in/DataDog/dd-trace-go.v1/contrib/aws/aws-sdk-go/aws/handlers.Send"
	ddCompleteHandlerName              = "gopkg.in/DataDog/dd-trace-go.v1/contrib/aws/aws-sdk-go/aws/handlers.Complete"
)

// RemoveDatadogHandlersForDuplicateReqs checks if the request
// is of type we do not want duplicate spans for since we
// trace through manually.
func RemoveDatadogHandlersForDuplicateReqs(req *request.Request) {
	duplicateOperations := map[string]bool{"Invoke": true, "SendMessage": true, "PublishMessage": true}
	duplicateServices := map[string]bool{"lambda": true, "sqs": true, "sns":true}
	_, servicePresent := duplicateServices[req.ClientInfo.ServiceName]
	_, operationPresent := duplicateOperations[req.Operation.Name]

	if servicePresent && operationPresent {
		zap.L().Debug("removing duplicate aws span in favor of manually traced request",
			zap.String("aws.service", req.ClientInfo.ServiceName),
			zap.String("aws.operation", req.Operation.Name))

		req.Handlers.Send.RemoveByName(ddSendHandlerName)
		req.Handlers.Complete.RemoveByName(ddCompleteHandlerName)
	}
}

I also considered using req.Handlers.Send/Complete.SwapByName to swap these with a no-op handler, since I was worried of unknowingly breaking requests by removing them.

It would be nice if the Datadog handler names were not hardcoded so I could reference them directly from contrib, because it feels like I'm setting myself up for an error down the line if they change. I am happy to open a PR to expose them if this seems acceptable.

mstumpfx avatar Oct 19 '22 15:10 mstumpfx

Closed due to merged PR and no follow-up questions. If there are related problems, please open a separate issue.

zarirhamza avatar Mar 16 '23 15:03 zarirhamza