dd-trace-js icon indicating copy to clipboard operation
dd-trace-js copied to clipboard

Spans from unrelated requests fused together

Open tlhunter opened this issue 4 years ago • 16 comments

Describe the bug

Our APM traces seem to be getting combined into the wrong requests. This issue started after I upgraded to version [email protected], it doesn't seem like it was happening with [email protected]. This is with the [email protected] framework.

Here's the queries displayed in that request. The three queries that run 4 times are each executed once at the start of the request. image

Here's the timeline view which suggests something is amiss: image

Environment

  • Operation system: Linux
  • Node version: v12.14.1
  • Tracer version: 0.30.6
  • Agent version:

tlhunter avatar Feb 09 '21 19:02 tlhunter

We changed the default scope manager in 0.30.6, so it's likely that this is the issue. Can you try with 0.30.5, or otherwise if it's easier can you try with the DD_TRACE_SCOPE=async_local_storage environment variable?

Also, does this happen on any request or only on specific requests?

rochdev avatar Feb 09 '21 19:02 rochdev

I'm going to deploy a version with 0.30.5 and will let you know. This does affect other routes as well.

tlhunter avatar Feb 09 '21 20:02 tlhunter

@rochdev it is still happening quite a bit with this version. Here's a screenshot of one that seems to contain hundreds of requests if I keep scrolling down. Each one of these pink dots represents a new request:

Screenshot_20210212_164459

tlhunter avatar Feb 13 '21 00:02 tlhunter

Note that this service was just upgraded to Node.js v14.15.4 and is still exhibiting the same behavior.

tlhunter avatar Feb 13 '21 00:02 tlhunter

Can you provide a reproduction case? Otherwise, do you know the exact version when it started happening? There were quite a bit of changes between 0.26 and 0.36 so it's a bit difficult to narrow down what change may have caused this without a reproduction.

rochdev avatar Feb 15 '21 15:02 rochdev

@tlhunter Do you still have the issue with the latest version?

rochdev avatar Apr 23 '21 15:04 rochdev

@rochdev just deployed our app with [email protected] and we're still getting the same issue:

screenshot-20210428140111

Could it be that some metadata that we're attaching is causing this issue? Values like these are present:

account_id | high cardinality hex string
admin | boolean
env | low cardinality string
mode | low cardinality string
request_id | 1619643462113:e82eab6e26ce:1:ko1v8wce:62338
version | low cardinality string

tlhunter avatar Apr 28 '21 21:04 tlhunter

Could it be that some metadata that we're attaching is causing this issue?

It's unlikely if it was working before. Are you able to pinpoint exactly in which version the issue started? Otherwise are you able to provide a reproduction?

rochdev avatar Apr 29 '21 18:04 rochdev

Sadly I don't have the bandwidth to do either. Also, I'll no longer have access to the project in about a week. I could send a private message with the output of npm ls in case a package is conflicting, like perhaps something introduces a known issue with async_hooks.

tlhunter avatar Apr 29 '21 23:04 tlhunter

I could send a private message with the output of npm ls in case a package is conflicting

Sure, that might help although we're currently handling the ones we know about, but maybe there are some that we don't.

Also, I'll no longer have access to the project in about a week.

Anyone else that could I could follow up with? Given this issue it doesn't look like Datadog is in a usable state for your team right now 😅

rochdev avatar Apr 30 '21 19:04 rochdev

@rochdev sent you a DM via Slack Datadog Public

tlhunter avatar Apr 30 '21 23:04 tlhunter

@rochdev @tlhunter has this been resolved? If so, we should close this.

Qard avatar Aug 17 '22 11:08 Qard

@Qard Sounds like this could be related to the issue we discussed offline this week about an unexpected span ending up as a parent of all request spans.

rochdev avatar Aug 17 '22 18:08 rochdev

It reads more to me like one request leaking into another, not all requests rooted from a higher level span, unless I'm misreading something.

Qard avatar Aug 18 '22 11:08 Qard

@Qard I am no longer with the company where I had encountered this issue.

Ironically, I suspect I'll soon go from reporting this issue to working on it, haha.

tlhunter avatar Aug 25 '22 18:08 tlhunter

So I've heard. 🎉

Qard avatar Aug 26 '22 05:08 Qard

No idea if this was resolved but it's certainly no longer affecting me.

tlhunter avatar Dec 18 '23 19:12 tlhunter