dd-trace-js icon indicating copy to clipboard operation
dd-trace-js copied to clipboard

Adds support for undici tracing

Open flovilmart opened this issue 4 years ago • 32 comments

What does this PR do?

This PR adds support for tracing undici, for the upcoming release.

Closes #1615

Motivation

Undici does not use the standard HTTP / HTTPS modules from node.js. This is required so we can properly trace undici requests.

Plugin Checklist

Additional Notes

flovilmart avatar Oct 05 '21 20:10 flovilmart

@Qard , any suggestion regarding the node-ext and upstream koa failures? Is my setup wrong?

flovilmart avatar Oct 13 '21 18:10 flovilmart

I'm pretty sure it's due to next.js tests trying to bundle it when it shouldn't. I believe there was a place to exclude things, but I forget where exactly. @rochdev might know how to deal with that.

Qard avatar Oct 15 '21 18:10 Qard

Got bitten by this; https://github.com/nodejs/node/blob/v16.11.1/lib/diagnostics_channel.js#L101-L103;

the channels are weak ref'd and I've spent about 2 days trying to understand why my tracing would not work while deployed and was working from the REPL / tests 😮‍💨 Will adjust the PR.

flovilmart avatar Oct 19 '21 14:10 flovilmart

any suggestion regarding the node-ext and upstream koa failures? Is my setup wrong?

For upstream koa they are also failing on master so you can ignore them. For next I think rebasing might fix it since there was a recent change to how it's tested.

the channels are weak ref'd and I've spent about 2 days trying to understand why my tracing would not work

We were also bitten by this while implementing diagnostics_channel. This is supposed to be an implementation details but is impacting common use cases enough that it should be changed. I'll open a PR to Node when I have some time.

rochdev avatar Oct 19 '21 15:10 rochdev

is impacting common use cases enough that it should be changed

glad to know I'm not alone! Will rebase and see how it goes!

flovilmart avatar Oct 19 '21 15:10 flovilmart

@rochdev should I gate here for multiple registrations? or is the framework guaranteeing a single installation of the plugin?

flovilmart avatar Oct 19 '21 16:10 flovilmart

Hi there! Any update on this one?

flovilmart avatar Nov 29 '21 18:11 flovilmart

callback. So we need a minimum of 4 events (start, end, async-end, error) and undici only implements 3.

I do not have the time to look into it right now, but adding the missing one is just a PR away. It would probably take less time and be more maintenance friendly to just do that.

mcollina avatar Dec 04 '21 08:12 mcollina

@rochdev , if I recap, in order to use diagnostics_channels, we need to implement the event handling through async storage. We cannot currently properly implement async_storage as we're missing an event from undici, (end or async-end I assume). @mcollina or @rochdev do you happen to know where this 4th event need to be emitted from? I really wish to move forwards with diags channels, as the monkey patch implementation is not trivial coming from there.

flovilmart avatar Dec 06 '21 12:12 flovilmart

@flovilmart The events that are needed are basically:

  • start: When the operation starts, which would usually be when the function from the public API is called. We need this event to create a span, create a new AsyncLocalStorage store, and add the span to the store.
  • end: When the operation ends but on the current stack, so not the entire asynchronous operation but only the synchronous part of it, which would usually be when the function from the public API returns. We need this event to restore the previous store from before the function was called.
  • error: When any error occurs, which is needed to attach the error to the current span.
  • async-end: When the operation finished completely, including anything asynchronous that it was waiting on. We need this to finish the span.

Additionally, while promises are handled automatically by async_hooks, callbacks are not, so any callback needs to be properly bound to the correct AsyncResource.

So an example fake request function could look something like this:

const { AsyncResource } = require('async_hooks')

function request (callback) {
  const req = {}

  callback = AsyncResource.bind(callback)

  startChannel.publish(req)

  doActualAsyncRequest(req, (err, res) => {
    if (err) {
      errorChannel.publish(err)
    }

    asyncEndChannel.publish(req)

    callback(req, res)
  })

  endChannel.publish(req)
}

rochdev avatar Dec 06 '21 15:12 rochdev

Oh great! Thanks @rochdev ; let me see how I can get this implemented.

flovilmart avatar Dec 07 '21 14:12 flovilmart

@rochdev , I'm curious; and perhaps the current implementation is completely wrong, but given that this implementation keeps a WeakMap of Request -> Spans, isn't it circumventing this problem by manually creating / terminating spans? I'm currently running this implementation in a production service, and the traces are sane 🤔

flovilmart avatar Dec 07 '21 14:12 flovilmart

@flovilmart The context is never activated in the current implementation, unlike http, so any child of undici would end up in a different context. Since this is a client, it's less likely to have children, and when it does it could be less often, for example tcp.connect, which is why it appears to be working properly for your use case. However, for tracing we shouldn't assume what can or cannot have children so this should be handled. The problem is that without the missing event, handling it would be impossible since you would have no way to know when the function returns, meaning that the new context holding the span would leak after the function call. The same issue would potentially affect the callback if it's not properly bound with async_hooks.

Unfortunately these are issues that are much easier to solve with monkey-patching, but since undici is in active development it should also be relatively straightforward to add the missing events/bindings to the library itself and then use that in this PR which would only require small changes.

We're also working on a doc to explain how to do tracing using DC, especially since it's much more involved to get it right compared to monkey-patching, which makes it more difficult to use for external contributors.

rochdev avatar Dec 07 '21 15:12 rochdev

Thanks a lot @rochdev for taking the time for this explanation! I concur that monkey patching is more straightforward then! Let me reevaluate both approaches.

flovilmart avatar Dec 07 '21 18:12 flovilmart

@flovilmart Sorry for the runaround with this PR. We hadn't quite realized that the current events were incomplete when we recommended trying to use DC for this. Let me know if you need any help to bring this to the finish line or have any questions about either approach.

rochdev avatar Dec 07 '21 21:12 rochdev

@rochdev no problem! All's good; I did not realize we were missing so many events.

flovilmart avatar Dec 07 '21 21:12 flovilmart

@rochdev

async-end: When the operation finished completely, including anything asynchronous that it was waiting on. We need this to finish the span.

Does it mean the "async-end" should be emitted when body is consumed (read end from socket) ? If so, @flovilmart correct me if im wrong the current implementation with WeakMap is wrong, since the user is receiving "Readable" instance which may be consumed at any time later, means the span does not include the socket read time.

artur-ma avatar Dec 26 '21 23:12 artur-ma

@artur-ma According to the current implementation for http yes, it would be after the response has been received. However, these are just the bare minimum required for tracing to work at all, but it definitely wouldn't hurt to have more events that are more precise, for example adding some sort of response or headers-received or something similar, which I believe already exist in Undici although I haven't looked for a few weeks.

rochdev avatar Jan 10 '22 20:01 rochdev

@artur-ma sure, this doesn't include body reading probably it would need it's own span outside of the request span. What do you think? I'm saying outside, as some users may not even read the body, but simply rely on the headers.

flovilmart avatar Jan 11 '22 14:01 flovilmart

AFAIK, anyway even if the user will not use the body, it should be consumer or destroyed by design https://github.com/nodejs/undici#garbage-collection

I think, its possible to add such event on destroy / consume since the "body" is actually a Readable object, which "knows" when it happens The only downside I see here, is that the user may choose not to consume it immediately but after some user action, which will be included in overall request time

artur-ma avatar Jan 18 '22 22:01 artur-ma

@artur-ma that seems reasonable. I'm still on the fence about the monkey patching vs intrumentation with channels. I haven't had the chance to go back to it.

flovilmart avatar Jan 19 '22 21:01 flovilmart

The only downside I see here, is that the user may choose not to consume it immediately but after some user action, which will be included in overall request time

I think that's correct and maps what the user is doing, i.e. it's something I would want to see in the traces.

mcollina avatar Jan 20 '22 18:01 mcollina

With regards to the recent announcement that undici's fetch would be integrated in node as experimental. I believe it makes sense I wrap first the fetch implementation through patching, without diagnostic channels. Then I'll try to get something done in a more generic manner; what do you think?

flovilmart avatar Jan 30 '22 20:01 flovilmart

I still recommend getting this right. It will take a few months before this is unflagged.

mcollina avatar Jan 30 '22 21:01 mcollina

Gotcha!

flovilmart avatar Jan 30 '22 22:01 flovilmart

Would this PR allow distributed tracing to connect services that are making requests via undici? We recently changed to undici for many of our server-to-server requests and the services are no longer connected on their traces. Does anyone know a workaround to connect them in the meantime? Are there headers I can set on the undici requests?

lucasmarcelli avatar Jan 31 '22 16:01 lucasmarcelli

@lucasmarcelli You can inject the current tracing context in the headers like this:

const headers = {}
const span = tracer.scope().active()

tracer.inject(span, 'http_headers', headers)

client.dispatch({
  path: '/',
  method: 'GET',
  headers
}

rochdev avatar Jan 31 '22 17:01 rochdev

@rochdev thank you so much, i will give that a try :)

lucasmarcelli avatar Jan 31 '22 18:01 lucasmarcelli

@rochdev I believe I made some strides in the right direction:

  • calls are now wrapped
  • we use diagnostics channels in order to
    • inject span with all the request info
    • inject propagation headers in the request
    • capture connect / abort errors
    • capture status code
  • we use the wrapping to
    • initially create the spans
    • propagate the async context
    • properly wrap the pipeline command
    • finish the spans when the promise resolves

I also extracted commons from the http client wrapper as suggested. The test suite covers both using client.[method] and undici.[method] and ensure both emit traces properly at a high level.

❓ If we want to be extensive, we should run the whole suite with all methods we'll probably find some wholes in there.

The instrumentation is a bit all over the place with the 2 wrapping strategies, I could attempt to simplify a bit 😅.

let me know what you think.

flovilmart avatar Feb 01 '22 14:02 flovilmart

@mcollina how would you approach instrumenting reading the response body? Is there a single member that I can wrap on the response and watch for an end event? Ideally, I'd love to add a child span that just represents reading the request body. Is it possible to get an event when the stream starts being consumed? Or should I simply finish the span only when the body end event is emitted?

flovilmart avatar Feb 08 '22 15:02 flovilmart