dd-trace-js icon indicating copy to clipboard operation
dd-trace-js copied to clipboard

All incoming HTTP requests are assigned the same trace ID

Open krosen040 opened this issue 3 years ago • 8 comments

Describe the bug After upgrading dd-trace from 0.32.2 to 1.5.1 we have a problem with an application where all incoming HTTP requests are assigned the same trace ID. It appears to be similar to the problem described in #1239, except we're using [email protected].

Screen Shot 2021-12-10 at 3 43 27 PM

By enabling debug logging, we could confirm that on this line there's always an active scope, i.e. tracer.extract is never entered (so this is never logged). This explains the behavior, but is very surprising. It seems like there's a single span that becomes the parent span of everything else.

If we use sync scopes (by setting DD_TRACE_SCOPE=sync) it fixes the problem (we can see that tracer.extract is entered), but I don't think this is a good long-term solution.

Environment

  • Operation system: centos:7
  • Node version: 12.22.7
  • Tracer version: 1.5.1
  • Agent version: 7.31.1

krosen040 avatar Dec 10 '21 06:12 krosen040

This is indeed very surprising, since we rely on a async_hooks to do the tracking which is built into Node, so I would expect this to track things properly. Before trying sync were you already configuring the scope option or were you using the default? Also, can you try upgrading Node to the latest version? If this is not possible in production, at least just trying in any environment just to see if it fixes the issue and then we can try to understand why and see our options.

If upgrading Node doesn't work and you're not setting the scope option and relying on the default, then there is definitely a bug in either the tracer, Node or your app, so the next step will be trying to figure out which one it is. This will be somewhat involved and I can provide guidance, but first I want to make sure that the problem still exists with latest Node and no scope configured.

rochdev avatar Dec 10 '21 16:12 rochdev

Before trying sync were you already configuring the scope option or were you using the default?

We were just using the default (which I believe is async_resource with the version of Node that we're using).

We'll try to update the version of Node and see if that helps.

krosen040 avatar Dec 11 '21 08:12 krosen040

@krosen040 Were you able to try updating Node, and if yes, did it resolve the issue?

rochdev avatar Jan 10 '22 21:01 rochdev

@rochdev We haven't had the time yet, but I hope we'll be able to prioritize it in the near future. I'll make sure to give an update once we've done so.

krosen040 avatar Jan 12 '22 00:01 krosen040

@rochdev We're still seeing the same behavior after updating to Node v14.18.2.

krosen040 avatar Jan 13 '22 07:01 krosen040

Is this something you are able to reproduce with a snippet you could share? For example, if you can extract one of the problematic endpoints and remove your business logic so we can reproduce the issue with a small non-sensitive project it would be a lot easier for us to find the issue.

rochdev avatar Jan 14 '22 21:01 rochdev

@krosen040 Any chance of getting a small reproduction example as asked by @rochdev ? Also, are you using any userland promise libraries or in-memory queuing libraries? And if so, which versions.

bengl avatar Apr 14 '22 18:04 bengl

@bengl Apologies for the late reply. Unfortunately I don't have a reproduction example (I'm not really a Node.js developer and don't own the application in question, I'm sort of acting as the middle-man here), but I think we might have identified the problem.

The application uses https://github.com/yahoo/dnscache which wraps the dns module (that project is archived but our company has an internal fork). We noticed that if we don't use this library everything works as expected.

The dnscache library uses https://github.com/kriskowal/asap internally. Removing the usage of asap from dnscache (essentially removing the calls to next in this file) makes it work, so it seems like the asap library is the problem here. We're using version 2.0.6.

krosen040 avatar Apr 21 '22 06:04 krosen040

I'm going to close this issue as it's pretty old and we've had a few major versions come and go since it was reported. Please reopen if a recent version of the tracer still exhibits the same problem!

tlhunter avatar Dec 18 '23 19:12 tlhunter