dd-trace-js
dd-trace-js copied to clipboard
fix(express): dont lose route if preceded by middleware with longer path
What does this PR do?
Fixes a bug where a middleware would be incorrectly taken as the resource when the middleware path is longer than the route handler path.
Motivation
This bugs seems to have been introduced in https://github.com/DataDog/dd-trace-js/pull/1767. It can happen when the middleware uses a regex.
For example with a setup like:
app.use(/^\/(foo|bar|baz).*/, (req, res, next) => next());
app.get('/foo', (req, res) => { res.send('ok') });
This condition https://github.com/DataDog/dd-trace-js/blob/fc7d8f649b81bcfda63f511938d200ffc83564dc/packages/datadog-plugin-router/src/index.js#L89-L92
Doesn't trigger because the middleware path is longer than the request path.
Plugin Checklist
- [x] Unit tests.
Thanks for the PR! The current fix would break support for any middleware running after the request has been handled (which is the case we were trying to solve with our original fix in the first place). Is there any condition that could be added instead to figure out which middleware was the handler?
I don't know if there is a reliable way to detect which middleware was the handler, but there are some workarounds we could add. Specifically I think this bug can only trigger if the preceding middleware is using a regex, since with strings path the middleware would have to be a prefix and therefore by definition shorter. We could try to detect that it's a regex and not count it in that case?
Updated the PR to look for a regex instead, what do you think now?
I don't know about ignoring regexes completely, but maybe strings could have higher priority. Same could be true when using something like get versus use.
I don't know about ignoring regexes completely
Is the current implementation ignoring regexes completely? Sorry I'm not entirely clear on what will happen in all the cases. Can you expand on which case you think it breaks? I can always add a unit test for that.
Same could be true when using something like get versus use
Do you know if there is a way to detect that?
So I updated it again - I think what we're trying to do is get the most specific route in the stack. Now the logic is this:
- If the context route is a regex, we check if it matches the current route:
- If it matches, we use the current route because it's more specific.
- If it doesn't, we keep the context route.
- If the context route is a string, we check if the current route is more specific than the context one using
.startsWith()
@rochdev have you had a chance to look at the latest updated version? (Btw we've been running this in production for 25 days without any issues)
@domkck I'm a bit worried about performance, but since it's isolated to regular expressions it might be good enough for most cases. I'm also not sure I understand the new logic as it doesn't seem to check length anymore.
Thanks @rochdev
I'm a bit worried about performance, but since it's isolated to regular expressions it might be good enough for most cases.
Is there a way we can test that?
As you said, I don't think it affects the majority of cases - it only relies on .starts/endsWith() unless the route is a regex.
I'm also not sure I understand the new logic as it doesn't seem to check length anymore.
So the current logic simply ends up taking the longest route as the handling route that the request should get tagged with. I think this works fine in all cases where regexes aren't involved, because the longest route in the stack is guaranteed to be the "most specific" one, for example:
app.use('/foo/bar', (req, res, next) => next()); // Some middleware
app.get('/foo/bar/baz', (req, res, next) => next()); // The actual route handler that we want to tag with
app.use('/foo', (req, res) => res.send()); // Middleware that runs after the route handler
// '/foo/bar/baz' gets correctly picked as the main route handler because it's the longest
However in cases where regexes are involved, the assumption that the longest route is the most specific breaks, for example:
app.use(/\/(foo|bar|baz)/, (req, res, next) => next()); // Middleware that matches any of /foo, /bar, /baz
app.get('/foo', (req, res) => res.send()); // This now doesn't get picked as the main route handler, because the above middleware has a longer string as a route definition
Generally I think it's more accurate to test whether the routes actually match rather than just relying on length. If routeA.startsWith(routeB) then routeA.length > routeB.length will always be true, but the inverse isn't necessarily true. Does that make sense?
@rochdev is there anything I can help with to get this merged?
Generally I think it's more accurate to test whether the routes actually match rather than just relying on length. If routeA.startsWith(routeB) then routeA.length > routeB.length will always be true, but the inverse isn't necessarily true. Does that make sense?
One problem with that approach is that Express supports patterns. This means that you could end up in a situation where routeA is /foo/* and routeB is /foo/bar in which case startsWith would return false since both of these would be considered strings and not regular expressions by the matcher.
Another problem is that calling match on a string that is actually a regular expression would also not match since the string itself would contain pattern characters.
This is basically why we originally went with the length as it keeps things simple and fast while handling most cases. It's not perfect of course, but I think for now it would be safer to add on top of the existing logic instead of changing it completely to avoid switching trade-offs back and forth.
With that said, for your use case specifically, only actual regular expressions seem to be important and not patterns, and they seem to happen first. This means that it should be possible to add to the existing logic with something like this:
- Check the legnth.
- If both routeA and routeB are strings, keep longest.
- If routeA is a string and routeB is a regular expression, keep routeA.
- If routeA is a regular expression and routeB is a string, keep longest. This could be improved in the future, but for now handling this explicitly is not needed for your use case.
Thanks for the suggestion @rochdev - updated the PR with your logic.
@rochdev have you had a chance to review this please?
Bumping :) this patch is still relevant and cleanly applies on top of v3.11.0
Codecov Report
Merging #2065 (505a6fa) into master (d638b4b) will decrease coverage by
0.68%. The diff coverage is80.00%.
@@ Coverage Diff @@
## master #2065 +/- ##
==========================================
- Coverage 86.57% 85.90% -0.68%
==========================================
Files 333 297 -36
Lines 11934 10880 -1054
Branches 33 33
==========================================
- Hits 10332 9346 -986
+ Misses 1602 1534 -68
| Impacted Files | Coverage Δ | |
|---|---|---|
| packages/datadog-plugin-router/src/index.js | 90.14% <80.00%> (-0.91%) |
:arrow_down: |
... and 36 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more