
Use diagnostics_channel in layers

Open Qard opened this issue 5 years ago • 14 comments

This is a proof-of-concept implementation of the diagnostics_channel feature I'm working on in Node.js core. The value here is for APM products to be able to capture changes to routing state through the lifecycle of a request. This should of course not be merged until the module exists in Node.js itself. I'm just creating this now to prove the value to userland and to make sure a more complex framework can benefit from it.

APM vendors want to use events from diagnostics_channel to know when a route has changed and what the full routing path to that route is, so they can bucket trace data by the full routing path. For example, requests to /hello/world and /hello/stephen might map to a /:name route on a nested router mounted under /hello, which should produce /hello/:name as the full routing path. This requires some way of tracking the active layer so that, at the point where the diagnostics_channel events are published, valid routing information is present on the request for the subscriber to read.
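A minimal sketch of the idea, assuming a hypothetical channel name (router.layer.enter), hypothetical helpers enterLayer/leaveLayer, and a hypothetical req.layerStack property; none of these are the PR's actual code. The point it shows is ordering: the router updates routing state on the request before publishing, so the subscriber can derive /hello/:name at publish time.

```javascript
const dc = require('node:diagnostics_channel')

// Hypothetical channel name; nothing in this thread fixes one.
const CHANNEL = 'router.layer.enter'
const layerEnter = dc.channel(CHANNEL)

// Router side (sketch): push the matched layer's path onto a per-request
// stack *before* publishing, so subscribers see current routing state.
function enterLayer (req, layerPath) {
  req.layerStack = req.layerStack || [] // hypothetical property name
  req.layerStack.push(layerPath)
  if (layerEnter.hasSubscribers) {
    layerEnter.publish({ req })
  }
}

// Pop the layer once routing leaves it (e.g. when it calls next()).
function leaveLayer (req) {
  req.layerStack.pop()
}

// APM side: build a "METHOD fullPath" name from the stack at publish time.
const seen = []
dc.subscribe(CHANNEL, ({ req }) => {
  seen.push(req.method + ' ' + req.layerStack.join(''))
})

// Simulate /hello/stephen hitting `/:name` on a router mounted at `/hello`.
const req = { method: 'GET', url: '/hello/stephen' }
enterLayer(req, '/hello')
enterLayer(req, '/:name')
// seen is now ['GET /hello', 'GET /hello/:name']
leaveLayer(req)
leaveLayer(req)
```

Checking hasSubscribers before publishing keeps the cost near zero when no APM is listening.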

I'll leave this as a draft PR until diagnostics_channel lands in Node.js core.

Qard • Sep 17 '20 19:09

I think this actually adds two features (not that I think that should pose a problem): it both tracks the layers hit on a req and reports to the diag channel. The request for tracking the layers is long-standing, and there are a few other implementations we would want to consider before we land this. If you don't want to take on that discussion but still want to land the usage of this feature, you might consider removing the layer tracking.

FWIW, I think this approach is fine with that layer array, but it will still require most folks to iterate to determine the last "route" layer, which is typically what folks want when they ask for this feature.
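The iteration being described might look like the sketch below. It assumes each tracked entry exposes a route property that is set only for route layers (roughly how router's own Layer objects distinguish routes from plain middleware); lastRouteLayer and the sample array are hypothetical.

```javascript
// Return the most recently matched layer that belongs to a route,
// skipping plain middleware layers (whose `route` is falsy).
function lastRouteLayer (layers) {
  for (let i = layers.length - 1; i >= 0; i--) {
    if (layers[i].route) return layers[i]
  }
  return undefined
}

const matched = [
  { path: '/hello', route: null }, // router.use('/hello', hello)
  { path: '/:name', route: { path: '/:name' } }, // hello.get('/:name', ...)
  { path: '/', route: null } // some trailing middleware
]

console.log(lastRouteLayer(matched).path) // logs /:name
```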

Existing PR for "layer tracking": https://github.com/pillarjs/router/pull/86

wesleytodd • Sep 18 '20 14:09

Yes @wesleytodd, I thought the same thing regarding this PR. Looking at the tracking this adds, I think it is done differently from what I've seen others doing so far, including that linked PR. Specifically, it doesn't store the matched layers/routes on the request; instead, it keeps just a stack of the current hierarchy of where it is at the moment, popping layers back off once it leaves them (vs. keeping them to track which ones matched).
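To make the distinction concrete, here is a hypothetical sketch (createTrackers is not from either PR) driving both strategies from the same enter/leave events: the stack reflects only where routing is right now, while the matched list keeps every layer that was entered.

```javascript
// Two hypothetical tracking strategies fed by the same events.
function createTrackers () {
  const stack = [] // current hierarchy only: popped when a layer is left
  const matched = [] // history: every layer entered, never popped
  return {
    enter (path) { stack.push(path); matched.push(path) },
    leave () { stack.pop() },
    stack,
    matched
  }
}

const t = createTrackers()
t.enter('/hello') // router.use('/hello', hello)
t.enter('/:name') // hello.get('/:name', ...) which then calls next()
t.leave()
t.leave() // routing backs out of the nested router
t.enter('/hello/:name') // a later router.get('/hello/:name', ...)
// t.stack   -> ['/hello/:name']
// t.matched -> ['/hello', '/:name', '/hello/:name']
```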

dougwilson • Sep 18 '20 14:09

Yeah, and maybe that is not a problem, but if we can solve the long-standing ask and also deliver on the need to report to the diag channel, I think we would want to.

wesleytodd • Sep 18 '20 16:09

Yeah, so the layer-matching bits are there because what an APM wants from this diagnostics_channel data is to be able to look up the full path to the route it's at right now.

Ideally we would be able to attribute any async activity to the route or middleware it originated from. I had initially tried to use the Span API to also include an end event to encapsulate each route or middleware, but ran into issues as it's easy to track when next is called but harder to track when a route "completes" by sending a response.

Qard • Sep 18 '20 16:09

For the "send a response" case you can just use on-finished, and then do your cleanup on either next or in that callback, whichever is hit first.
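A dependency-free sketch of the "whichever is hit first" pattern, assuming a hypothetical trackLayer wrapper and onEnd hook (where an APM would close its span). It listens for the response's 'finish' event directly; on-finished, the package suggested above, would additionally cover aborted connections.

```javascript
const { EventEmitter } = require('node:events')

// Wrap a handler so `onEnd` fires exactly once: when the handler calls
// next() or when the response finishes, whichever happens first.
function trackLayer (handler, onEnd) {
  return function (req, res, next) {
    let ended = false
    const end = (reason) => {
      if (ended) return
      ended = true
      res.removeListener('finish', onFinish) // no double-reporting later
      onEnd(reason)
    }
    const onFinish = () => end('finish')
    res.once('finish', onFinish)
    handler(req, res, (err) => {
      end('next')
      next(err)
    })
  }
}

// Simulated response: the handler "sends" by emitting 'finish'.
const events = []
const res = new EventEmitter()
const layer = trackLayer((req, res, next) => { res.emit('finish') }, (r) => events.push(r))
layer({}, res, () => events.push('next called'))
// events -> ['finish']
```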

wesleytodd • Sep 18 '20 17:09

Gotcha. I see you are just string-concatenating regexps and arrays together in your resulting path string, so I'm not sure that is going to work well as a general feature.

dougwilson • Sep 18 '20 17:09

Doesn't look like on-finished is currently a dependency, so I'd have to add it. It would also somewhat complicate the logic of how diagnostics_channel is used here.

As for the path string, it's a concatenation of the routing path strings before being converted to regex. For example:

```javascript
const Router = require('router')

const router = new Router()
const hello = new Router()

router.use('/hello', hello)

hello.get('/:name', ...)
```

This router structure would result in a concatenated path of /hello/:name. Basically, APM products want to know how to map individual routes to the full routing path taken to reach them. We currently do that by monkey-patching express, which is fragile and has performance implications. We'd prefer to receive that information through diagnostics_channel and not have to patch at all.
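For plain string paths, the concatenation described here can be sketched as below. fullRoutePath is a hypothetical helper; dropping a bare '/' mount (so router.use('/', sub) doesn't produce a double slash) is an assumption about how one might normalize, not router behavior.

```javascript
// Join the mount paths of enclosing routers with the route's own path.
// A bare '/' fragment contributes nothing, so it is skipped.
function fullRoutePath (fragments) {
  return fragments.filter((f) => f !== '/').join('') || '/'
}

console.log(fullRoutePath(['/hello', '/:name'])) // logs /hello/:name
console.log(fullRoutePath(['/', '/users'])) // logs /users
```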

Qard • Sep 18 '20 19:09

> As for the path string, it's a concatenation of the routing path strings before being converted to regex

It sounds like you are assuming the users are passing a string in the first place. Take the following router combo, though:

```javascript
const Router = require('router')

const router = new Router()
const hello = new Router()

router.use(['/hello', /^\/(?:good)?bye/, '/aloha'], hello)

hello.get('/:name', ...)
```

dougwilson • Sep 19 '20 02:09

That's fine. It will use the stringified form of the regex. The purpose is just to have a name unique to that route to match transaction data to. Whether that name is /hello/:name or /^\/(?:good)?bye//:name doesn't really matter, just that it has a name the user can read relatively easily and trace back to the location in their app code. Some APMs may support keeping the routing fragments separate, but most would just join them.
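A sketch of what "the stringified form" yields for non-string path arguments. fragmentToString is hypothetical; String(regexp) is standard JavaScript and returns the literal form including slashes, and joining array alternatives with ',' is simply what String(array) does, not a router convention.

```javascript
// Turn any router path argument (string, RegExp, or array) into text.
function fragmentToString (path) {
  if (Array.isArray(path)) return path.map(fragmentToString).join(',')
  return String(path) // RegExp -> literal form, e.g. /^\/(?:good)?bye/
}

const mount = /^\/(?:good)?bye/
console.log(fragmentToString(mount) + '/:name')
// logs /^\/(?:good)?bye//:name
console.log(fragmentToString(['/hello', mount, '/aloha']))
// logs /hello,/^\/(?:good)?bye/,/aloha
```

The resulting names are readable enough to trace back to the app code, which is all the bucketing needs.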

Qard • Sep 21 '20 17:09

Ah, I see. But of course this router allows you to declare the same route name multiple times, as it does not enforce that route names are unique. This is because of things like next() flow control. For example:

```javascript
const Router = require('router')

const router = new Router()
const hello = new Router()

router.use('/hello', hello)
router.get('/hello/:name', (req, res) => res.end(`hello, ${req.params.name}`))

// Skip the nested route (next('route')) unless the name is all lowercase letters
hello.param('name', (req, res, next, name) => next(/^[a-z]+$/.test(name) ? null : 'route'))
hello.get('/:name', (req, res) => res.end(`hola, ${req.params.name}`))
```

The above effectively has two routes that your code would both name /hello/:name, but they are two distinct routes: /hello/dan will invoke one while /hello/Dan will invoke the other.

dougwilson • Sep 21 '20 17:09

I don't think it's a big issue that the paths aren't unique. It's just a bucketing mechanism. If a bucket might contain data from two routes that do different things but share the same path, it's not ideal, but there's not really any other great way to differentiate them. Most routes internally have some degree of branching anyway, so it's expected that every request in that bucket might not look exactly the same. It doesn't break anything if data from multiple routes winds up in the same bucket, it just makes the data maybe slightly less meaningful. Also, small side note: the HTTP method is also generally included in the string used as the bucket key.
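A hypothetical sketch of that bucketing on the APM side: samples are keyed by "METHOD fullPath", so two distinct routes sharing /hello/:name merge into one bucket while a different method stays separate. The record function, the Map, and the sample durations are illustrative only.

```javascript
// Aggregate request durations by "METHOD fullPath".
const buckets = new Map()

function record (method, fullPath, durationMs) {
  const key = `${method} ${fullPath}`
  const bucket = buckets.get(key) || { count: 0, totalMs: 0 }
  bucket.count += 1
  bucket.totalMs += durationMs
  buckets.set(key, bucket)
}

// The "hello" and "hola" handlers share one name, so their samples land in
// the same bucket; nothing breaks, the aggregate is just less specific.
record('GET', '/hello/:name', 4) // e.g. served by one handler
record('GET', '/hello/:name', 12) // e.g. served by the other
record('POST', '/hello/:name', 7) // different method, separate bucket
```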

I'm not concerned about two routes winding up with the same name, so long as that name tells the user where in their app to look for the code. If they happen to have multiple routes with the same name, it's on them to look at those routes and figure out which one a given trace came from. Route naming is very much best-effort in APM. There are lots of cases where we get something like /static/*, which will result in wildly different performance profiles and possibly very different execution paths. We just do our best to define reasonable buckets that are somewhat intuitive to the user and help them narrow trace data down to fairly specific parts of their code.

Qard • Sep 21 '20 17:09

@Qard I am working on getting this repo back on track for the renewed plans for v5. Are you interested in landing this still? Let me know so I can get it on our plans if so.

wesleytodd • Mar 16 '24 18:03

It would still be valuable to have diagnostics_channel in there, yes, though it probably needs reworking at this point to bring it up to date. At the time, we opted to target fastify instead as the routing framework to test diagnostics_channel with, since express and related projects had somewhat stagnated and we didn't expect the changes to land any time soon.

I'll share this with the team at Datadog to see if we want to put some time into updating it in the near future. No big deal if you're pushing for a release soon, though; we can add support later if necessary. Thanks for the reminder! I had entirely forgotten I made this. 😅

Qard • Mar 17 '24 03:03

Yeah, this would likely be a minor release, so it's not a big deal to wait. I was just trying to wrangle all the open items to make sure we have a clear plan in place, and this still seems valuable to me. Let me know if you folks have time to work on this!

wesleytodd • Mar 18 '24 14:03