Unexpected link between spans for HTTP requests uploading files to Amazon S3
Context
We have an application that has the following flow:
- The user sends a request that creates a `submission` to `POST /submissions`
- The server creates said submission, returns a `submission_id` to the FE, and puts it into the context of the user
- The user proceeds to complete a form that handles multiple multipart form file upload inputs
- Each file upload executes a `POST /documents/upload` that sends the `submission_id` along with it
- Finally, the user clicks on a `Submit` button which executes a `PATCH /submissions/:id`
- The form completes and the user is redirected to another page
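To make the flow concrete, the upload endpoint on the BFF is shaped roughly like the sketch below. This is a heavily simplified illustration; the handler name, form field names, and the forwardDocument helper are made up for the example and are not our actual code.

```go
package main

import (
	"context"
	"io"
	"log"
	"net/http"
)

// forwardDocument is a stand-in for the call to the downstream documents
// service; the real application issues an instrumented HTTP request here.
func forwardDocument(ctx context.Context, submissionID, filename string, body io.Reader) error {
	_, err := io.Copy(io.Discard, body) // placeholder: just drain the file
	return err
}

// uploadDocumentHandler sketches POST /documents/upload: it parses the
// multipart form, extracts the submission_id sent by the FE, and forwards
// the file downstream while propagating the request context.
func uploadDocumentHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(32 << 20); err != nil { // 32 MiB in memory, rest spills to disk
		http.Error(w, "invalid multipart form", http.StatusBadRequest)
		return
	}
	submissionID := r.FormValue("submission_id") // sent by the FE with every upload
	file, header, err := r.FormFile("document")
	if err != nil {
		http.Error(w, "missing document", http.StatusBadRequest)
		return
	}
	defer file.Close()

	if err := forwardDocument(r.Context(), submissionID, header.Filename, file); err != nil {
		http.Error(w, "upload failed", http.StatusBadGateway)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

func main() {
	http.HandleFunc("/documents/upload", uploadDocumentHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```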
The app is a UI with a BFF that calls other services in turn.
Expected DataDog behavior
In this flow, my expectation is that each request sends its own trace to DataDog, e.g. one POST /documents/upload generates one trace.
Actual behavior
The traces are only sent to DataDog once the final PATCH /submissions/:id executes, and they are sent under the same trace_id, making the APM trace view extremely big and sometimes incomplete.
The problem
This behavior makes debugging very complex in some cases, since the big traces are sometimes sent incomplete, and if the user uploads more than 10 files the traces become impossible to read at a glance.
This is also very confusing for other developers who are used to the 1:1 approach of request <> trace.
I believe this is something either in my application layer that Orchestrion is hooking into, or a bug in Orchestrion itself.
It seems like the requests are being "buffered" and only sent to DataDog once the final PATCH executes. There is no explicit DataDog configuration in my application that would do this; I'm almost entirely relying on Orchestrion's default configuration.
The app follows a standard dependency injection approach with nothing that stands out.
Things I have tried / researched
- Removing all unneeded integrations
- Checked whether FE requests were getting auto-injected with tracing headers
  - The FE is not injecting the headers. I even deployed a temporary middleware to prune all DataDog headers coming from the FE (see the sketch after this list) and it made no difference
- Cleaning up spans
- Upgrading orchestrion to latest version
- I made an env variable change to the deployment to allow for partial span flushes, which helped a little bit with the issue of spans not reaching DataDog at all
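For reference, the temporary header-pruning middleware mentioned above was along these lines. This is a simplified sketch, not the exact code we deployed; the header list covers the usual Datadog and B3/W3C propagation headers.

```go
package main

import (
	"log"
	"net/http"
	"strings"
)

// pruneTracePropagationHeaders strips inbound distributed-tracing headers so
// that requests coming from the FE can never join an existing trace.
func pruneTracePropagationHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for name := range r.Header {
			lower := strings.ToLower(name)
			if strings.HasPrefix(lower, "x-datadog-") ||
				strings.HasPrefix(lower, "x-b3-") ||
				lower == "b3" || lower == "traceparent" || lower == "tracestate" {
				r.Header.Del(name)
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", pruneTracePropagationHeaders(mux)))
}
```

On the partial-flush change: to my knowledge the relevant knobs in dd-trace-go are the `DD_TRACE_PARTIAL_FLUSH_ENABLED` and `DD_TRACE_PARTIAL_FLUSH_MIN_SPANS` environment variables, which let a long-lived trace be flushed in chunks instead of waiting for the root span to finish.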
The only link I see between these requests is the `submission_id`, but I have no idea how Orchestrion is auto-hooking into that to link these distributed traces. I also get the banner `This trace is linked to other traces`, which to me is not standard in any of our services, even when they go from service A to service B.
I have attached evidence of the problem to this issue, along with a .csv of logs that further illustrates it. The logs have the same trace_id but different correlation_ids, which is expected since I create the correlation_ids per request!
Can you guys help me out? How could orchestrion be making these correlations?
Versions
orchestrion v1.5.0
github.com/DataDog/dd-trace-go/v2 v2.1.0
orchestrion.tool.go:
// This file was created by `orchestrion pin`, and is used to ensure the
// `go.mod` file contains the necessary entries to ensure repeatable builds when
// using `orchestrion`. It is also used to set up which integrations are enabled.
//go:build tools
//go:generate go run github.com/DataDog/orchestrion pin -generate
package tools
// Imports in this file determine which tracer integrations are enabled in
// orchestrion. New integrations can be automatically discovered by running
// `orchestrion pin` again. You can also manually add new imports here to
// enable additional integrations. When doing so, you can run `orchestrion pin`
// to make sure manually added integrations are valid (i.e., the imported package
// includes a valid `orchestrion.yml` file).
import (
// Ensures `orchestrion` is present in `go.mod` so that builds are repeatable.
// Do not remove.
_ "github.com/DataDog/orchestrion" // integration
_ "github.com/DataDog/dd-trace-go/contrib/aws/aws-sdk-go-v2/v2/aws" // integration
_ "github.com/DataDog/dd-trace-go/contrib/database/sql/v2" // integration
_ "github.com/DataDog/dd-trace-go/contrib/net/http/v2" // integration
_ "github.com/DataDog/dd-trace-go/contrib/redis/go-redis.v9/v2" // integration
_ "github.com/DataDog/dd-trace-go/v2/ddtrace/tracer" // integration
_ "github.com/DataDog/dd-trace-go/v2/profiler" // integration
)
Resources / Evidence
Hi @arochadaniel! Thanks for reaching out. Allow me to clarify:
Is the attached image showing the upload of 4+ files? As you are pointing out, it looks like these requests are being bundled in a single trace, but they don't have a root span. Do you see the trace for PATCH /submissions/:id too?
That `This trace is linked to other traces` banner is related to span links. The dd-trace-go integration for net/http might add them if a parent context is propagated through HTTP headers - which you already ruled out - but the one for aws/aws-sdk-go-v2 adds them too when operating against S3 or DynamoDB.
By any chance, are you uploading these files to Amazon S3 in these endpoints? This would explain the behaviour. In that case, we are not facing a bug in Orchestrion.
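For illustration, a typical upload path through the instrumented SDK looks like the snippet below (a generic example, not your application's code; bucket and key names are made up). With the aws/aws-sdk-go-v2 integration enabled, the PutObject call is where the tracer can attach span links.

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// With Orchestrion, the client built from this config is instrumented
	// automatically by the aws/aws-sdk-go-v2 integration.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Each file upload results in an instrumented S3 operation like this one.
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("example-bucket"),         // illustrative bucket
		Key:    aws.String("submissions/document-1"), // illustrative key
		Body:   strings.NewReader("file contents"),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```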
Hey @darccio I really appreciate the fast response. I just tested removing the AWS integration and it did work! All traces are now coming 1 by 1, as expected. Thank you so much!
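Concretely, what I removed was the AWS import from orchestrion.tool.go; the file now looks roughly like this (trimmed, comments shortened):

```go
//go:build tools

package tools

import (
	// Ensures `orchestrion` is present in `go.mod` so that builds are repeatable.
	// Do not remove.
	_ "github.com/DataDog/orchestrion" // integration

	// The aws/aws-sdk-go-v2 import that used to be here has been removed,
	// which disables the AWS integration (and with it the S3/DynamoDB span links).

	_ "github.com/DataDog/dd-trace-go/contrib/database/sql/v2"      // integration
	_ "github.com/DataDog/dd-trace-go/contrib/net/http/v2"          // integration
	_ "github.com/DataDog/dd-trace-go/contrib/redis/go-redis.v9/v2" // integration
	_ "github.com/DataDog/dd-trace-go/v2/ddtrace/tracer"            // integration
	_ "github.com/DataDog/dd-trace-go/v2/profiler"                  // integration
)
```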
This was a tough one to figure out; maybe we should add this behavior to the docs for this specific integration? I tried looking for it but didn't find any mention.
What I did find was the Span Links section in the big traces I had before, and as you mention, the AWS integration was creating them.
Do you have any documentation on how the AWS integration creates the span links? I'm curious to see the logic behind it, maybe there's some actual use case for it.
@arochadaniel Let me transfer this to the dd-trace-go repository and ask the team maintaining our integrations to see how to proceed.
Edit: as for how the AWS integration creates the span links, the source code will help you see how it works.
@darccio thank you very much. This was really helpful. Will keep an eye on this issue! Let me know if I can make any contribution 👍