dd-trace-js
dd-trace-js copied to clipboard
Adds support for undici tracing
What does this PR do?
This PR adds support for tracing undici, for the upcoming release.
Closes #1615
Motivation
Undici does not use the standard HTTP / HTTPS modules from node.js. This is required so we can properly trace undici requests.
Plugin Checklist
- [x] Unit tests.
- [x] TypeScript definitions.
- [x] TypeScript tests.
- [x] API documentation.
- [x] CircleCI jobs/workflows.
- [x] Plugin is exported.
Additional Notes
@Qard , any suggestion regarding the node-ext and upstream koa failures? Is my setup wrong?
I'm pretty sure it's due to next.js tests trying to bundle it when it shouldn't. I believe there was a place to exclude things, but I forget where exactly. @rochdev might know how to deal with that.
Got bitten by this; https://github.com/nodejs/node/blob/v16.11.1/lib/diagnostics_channel.js#L101-L103;
the channels are weak ref'd and I've spent about 2 days trying to understand why my tracing would not work while deployed and was working from the REPL / tests 😮💨 Will adjust the PR.
any suggestion regarding the node-ext and upstream koa failures? Is my setup wrong?
For upstream koa they are also failing on master so you can ignore them. For next I think rebasing might fix it since there was a recent change to how it's tested.
the channels are weak ref'd and I've spent about 2 days trying to understand why my tracing would not work
We were also bitten by this while implementing diagnostics_channel. This is supposed to be an implementation details but is impacting common use cases enough that it should be changed. I'll open a PR to Node when I have some time.
is impacting common use cases enough that it should be changed
glad to know I'm not alone! Will rebase and see how it goes!
@rochdev should I gate here for multiple registrations? or is the framework guaranteeing a single installation of the plugin?
Hi there! Any update on this one?
callback. So we need a minimum of 4 events (start, end, async-end, error) and undici only implements 3.
I do not have the time to look into it right now, but adding the missing one is just a PR away. It would probably take less time and be more maintenance friendly to just do that.
@rochdev , if I recap, in order to use diagnostics_channels, we need to implement the event handling through async storage.
We cannot currently properly implement async_storage as we're missing an event from undici, (end or async-end I assume).
@mcollina or @rochdev do you happen to know where this 4th event need to be emitted from? I really wish to move forwards with diags channels, as the monkey patch implementation is not trivial coming from there.
@flovilmart The events that are needed are basically:
- start: When the operation starts, which would usually be when the function from the public API is called. We need this event to create a span, create a new
AsyncLocalStoragestore, and add the span to the store. - end: When the operation ends but on the current stack, so not the entire asynchronous operation but only the synchronous part of it, which would usually be when the function from the public API returns. We need this event to restore the previous store from before the function was called.
- error: When any error occurs, which is needed to attach the error to the current span.
- async-end: When the operation finished completely, including anything asynchronous that it was waiting on. We need this to finish the span.
Additionally, while promises are handled automatically by async_hooks, callbacks are not, so any callback needs to be properly bound to the correct AsyncResource.
So an example fake request function could look something like this:
const { AsyncResource } = require('async_hooks')
function request (callback) {
const req = {}
callback = AsyncResource.bind(callback)
startChannel.publish(req)
doActualAsyncRequest(req, (err, res) => {
if (err) {
errorChannel.publish(err)
}
asyncEndChannel.publish(req)
callback(req, res)
})
endChannel.publish(req)
}
Oh great! Thanks @rochdev ; let me see how I can get this implemented.
@rochdev , I'm curious; and perhaps the current implementation is completely wrong, but given that this implementation keeps a WeakMap of Request -> Spans, isn't it circumventing this problem by manually creating / terminating spans?
I'm currently running this implementation in a production service, and the traces are sane 🤔
@flovilmart The context is never activated in the current implementation, unlike http, so any child of undici would end up in a different context. Since this is a client, it's less likely to have children, and when it does it could be less often, for example tcp.connect, which is why it appears to be working properly for your use case. However, for tracing we shouldn't assume what can or cannot have children so this should be handled. The problem is that without the missing event, handling it would be impossible since you would have no way to know when the function returns, meaning that the new context holding the span would leak after the function call. The same issue would potentially affect the callback if it's not properly bound with async_hooks.
Unfortunately these are issues that are much easier to solve with monkey-patching, but since undici is in active development it should also be relatively straightforward to add the missing events/bindings to the library itself and then use that in this PR which would only require small changes.
We're also working on a doc to explain how to do tracing using DC, especially since it's much more involved to get it right compared to monkey-patching, which makes it more difficult to use for external contributors.
Thanks a lot @rochdev for taking the time for this explanation! I concur that monkey patching is more straightforward then! Let me reevaluate both approaches.
@flovilmart Sorry for the runaround with this PR. We hadn't quite realized that the current events were incomplete when we recommended trying to use DC for this. Let me know if you need any help to bring this to the finish line or have any questions about either approach.
@rochdev no problem! All's good; I did not realize we were missing so many events.
@rochdev
async-end: When the operation finished completely, including anything asynchronous that it was waiting on. We need this to finish the span.
Does it mean the "async-end" should be emitted when body is consumed (read end from socket) ? If so, @flovilmart correct me if im wrong the current implementation with WeakMap is wrong, since the user is receiving "Readable" instance which may be consumed at any time later, means the span does not include the socket read time.
@artur-ma According to the current implementation for http yes, it would be after the response has been received. However, these are just the bare minimum required for tracing to work at all, but it definitely wouldn't hurt to have more events that are more precise, for example adding some sort of response or headers-received or something similar, which I believe already exist in Undici although I haven't looked for a few weeks.
@artur-ma sure, this doesn't include body reading probably it would need it's own span outside of the request span. What do you think? I'm saying outside, as some users may not even read the body, but simply rely on the headers.
AFAIK, anyway even if the user will not use the body, it should be consumer or destroyed by design https://github.com/nodejs/undici#garbage-collection
I think, its possible to add such event on destroy / consume since the "body" is actually a Readable object, which "knows" when it happens The only downside I see here, is that the user may choose not to consume it immediately but after some user action, which will be included in overall request time
@artur-ma that seems reasonable. I'm still on the fence about the monkey patching vs intrumentation with channels. I haven't had the chance to go back to it.
The only downside I see here, is that the user may choose not to consume it immediately but after some user action, which will be included in overall request time
I think that's correct and maps what the user is doing, i.e. it's something I would want to see in the traces.
With regards to the recent announcement that undici's fetch would be integrated in node as experimental. I believe it makes sense I wrap first the fetch implementation through patching, without diagnostic channels. Then I'll try to get something done in a more generic manner; what do you think?
I still recommend getting this right. It will take a few months before this is unflagged.
Gotcha!
Would this PR allow distributed tracing to connect services that are making requests via undici? We recently changed to undici for many of our server-to-server requests and the services are no longer connected on their traces. Does anyone know a workaround to connect them in the meantime? Are there headers I can set on the undici requests?
@lucasmarcelli You can inject the current tracing context in the headers like this:
const headers = {}
const span = tracer.scope().active()
tracer.inject(span, 'http_headers', headers)
client.dispatch({
path: '/',
method: 'GET',
headers
}
@rochdev thank you so much, i will give that a try :)
@rochdev I believe I made some strides in the right direction:
- calls are now wrapped
- we use diagnostics channels in order to
- inject span with all the request info
- inject propagation headers in the request
- capture connect / abort errors
- capture status code
- we use the wrapping to
- initially create the spans
- propagate the async context
- properly wrap the
pipelinecommand - finish the spans when the promise resolves
I also extracted commons from the http client wrapper as suggested.
The test suite covers both using client.[method] and undici.[method] and ensure both emit traces properly at a high level.
❓ If we want to be extensive, we should run the whole suite with all methods we'll probably find some wholes in there.
The instrumentation is a bit all over the place with the 2 wrapping strategies, I could attempt to simplify a bit 😅.
let me know what you think.
@mcollina how would you approach instrumenting reading the response body? Is there a single member that I can wrap on the response and watch for an end event? Ideally, I'd love to add a child span that just represents reading the request body. Is it possible to get an event when the stream starts being consumed? Or should I simply finish the span only when the body end event is emitted?