apm-agent-nodejs icon indicating copy to clipboard operation
apm-agent-nodejs copied to clipboard

`transaction.context.request.url.*` is wrong for an incoming HTTP request with a pathname starting with a double-slash

Open trentm opened this issue 2 years ago • 3 comments

transaction.context.request.url.* is wrong for an incoming HTTP request with a pathname starting with a double-slash. E.g.:

GET //foo/bar HTTP/1.1
Host: example.com

repro

Apply this patch:

diff --git a/examples/trace-http.js b/examples/trace-http.js
index f9fbb524..31e5840b 100755
--- a/examples/trace-http.js
+++ b/examples/trace-http.js
@@ -56,7 +56,12 @@ server.listen(3000, function () {
   //
   // Note that this there is no current "transaction" here, so this HTTP
   // request is not captured by APM. See "trace-http-request.js" for more.
-  const clientReq = http.request('http://localhost:3000/', function (clientRes) {
+  // const clientReq = http.request('http://localhost:3000/', function (clientRes) {
+  const clientReq = http.request({
+    host: 'localhost',
+    port: 3000,
+    path: '//user@foo'
+  }, function (clientRes) {
     console.log('client response: %s %s', clientRes.statusCode, clientRes.headers)
     const chunks = []
     clientRes.on('data', function (chunk) {
diff --git a/lib/parsers.js b/lib/parsers.js
index 377ba50d..82ed1800 100644
--- a/lib/parsers.js
+++ b/lib/parsers.js
@@ -27,6 +27,7 @@ function getContextFromRequest (req, conf, type) {
     url: getUrlFromRequest(req),
     headers: undefined
   }
+  console.log('XXX context.url', context.url)
   if (req.socket && req.socket.remoteAddress) {
     context.socket = {
       remote_address: req.socket.remoteAddress

Then run: node examples/trace-http.js.

The generated transaction includes:

    {
        "transaction": {
            "name": "GET //user@foo",
...
                "request": {
                    "http_version": "1.1",
                    "method": "GET",
                    "url": {
                        "raw": "//user@foo",
                        "protocol": "http:",
                        "hostname": "foo",
                        "port": "3000",
                        "full": "http://foo:3000"
                    },
                    "headers": {
                        "host": "localhost:3000",
                        "connection": "close"
                    },
                    "socket": {
                        "remote_address": "::ffff:127.0.0.1"
                    }
                },

details

The recent https://github.com/elastic/apm-agent-nodejs/pull/3133 fixed a similar issue in the handling of transaction.name for this case. The issue there was that the request path, e.g. //foo/bar, was used as a full URL in URL parsing (via new URL(...) or similar). However, this req.url is the "request-target" in the HTTP/1.1 request line and not a full URL.

The transaction.context.request.url.* value is calculated using the original-url module here: https://github.com/elastic/apm-agent-nodejs/blob/34cc2a77820ae5b52f5ffab36f33fbaece8ccaf6/lib/parsers.js#L27

trentm avatar Feb 01 '23 19:02 trentm

Root cause seems to be the same as it was in #3133

In this situation getUrlFromRequest is an exported method from a package named origina-url. We can see its import here.

The package original-url uses the legacy API url.parse which is not recommended in the documentation as it says

url.parse() uses a lenient, non-standard algorithm for parsing URL strings. It is prone to security issues such as host name spoofing and incorrect handling of usernames and passwords. Do not use with untrusted input. CVEs are not issued for url.parse() vulnerabilities. Use the WHATWG URL API instead.

The package is using it to create the base URL object for the result. You can see it at the top of the index.js file. Here's in an extract of it

const parseUrl = require('url').parse  // Here it's requiring the legacy function
const parseForwarded = require('forwarded-parse')
const net = require('net')

module.exports = function (req) {
  const raw = req.originalUrl || req.url
  const url = parseUrl(raw || '')   // <-- Here we parse the url string
  const secure = req.secure || (req.connection && req.connection.encrypted)
  const result = { raw: raw }
  let host

What would be ideal is to have the dependency updated. But seeing the commit history it seems unmaintained. Maybe @watson may help us on this or let us update the package.

david-luna avatar Feb 12 '23 00:02 david-luna

@david-luna: @\watson will very likely be receptive to a PR on original-url for this if you wanted to work on one. @\watson used to be the primary developer of this elastic-apm-node package and, I think, original-url was initial developed for this package.

trentm avatar Feb 13 '23 15:02 trentm

@david-luna yes, please if you don't mind to open a PR against original-url, then I'll take a look ❤️

watson avatar Feb 14 '23 10:02 watson