apm-agent-nodejs icon indicating copy to clipboard operation
apm-agent-nodejs copied to clipboard

an outgoing `http.request(...)` that fails with an 'error' event is not ended

Open trentm opened this issue 1 year ago • 1 comments

Using this example script:

// http-req-error.js
const apm = require('./').start({ // elastic-apm-node
  serviceName: 'http-req-error'
})
const http = require('http')

const trans = apm.startTransaction('trans')

const req = http.request('http://localhost:6666', function (res) {
  console.log('res: %s %s', res.statusCode, res.headers)
  res.on('data', chunk => {
    console.log('res "data": %s', chunk)
  })
  res.on('end', function () {
    console.log('res "end"')
  })
})
req.on('error', err => { console.log('req "error":', err) })
req.end()

setTimeout(() => {
  trans.end()
}, 1000)

and this patch:

diff --git a/lib/instrumentation/http-shared.js b/lib/instrumentation/http-shared.js
index 32cbb53a..ed4f075f 100644
--- a/lib/instrumentation/http-shared.js
+++ b/lib/instrumentation/http-shared.js
@@ -188,6 +188,7 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
       // is added instead of when `response` is emitted.
       const emit = req.emit
       req.emit = function wrappedEmit (type, res) {
+        console.log('XXX req.emit type: ', type)
         if (type === 'response') onResponse(res)
         if (type === 'abort') onAbort(type)
         return emit.apply(req, arguments)

We see this (with node v18; the stack is slightly different for different node versions):

% node http-req-error.js
XXX added trace-context headers
XXX have an http span still here
XXX req.emit type:  socket
XXX req.emit type:  prefinish
XXX req.emit type:  error
req "error": Error: connect ECONNREFUSED ::1:6666
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1247:16)
    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
  errno: -61,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '::1',
  port: 6666
}
XXX req.emit type:  close

and we get this trace:

    trace a7d4f2
    `- transaction 77aa08 "trans" (1006.22ms, outcome=unknown)

There is at least one bug here that there isn't a span. The span was started by our instrumentation, but it wasn't ended.

notes

  1. We could watch for error in function wrappedEmit to catch this. We could potentially also use the close event. However, I'd want to dig into whether that close is always emitted for all node versions we support.
  2. Should we captureError here on the 'error' event? I think so, but I'd like to discuss with others.
  3. The current watching for abort in wrappedEmit should potentially be changed to use close in newer versions of node, because 'abort' was deprecated in node v17.0.0, v16.12.0

trentm avatar Aug 03 '22 22:08 trentm

2. Should we captureError here on the 'error' event? I think so, but I'd like to discuss with others.

From separate discussion, so far 2 out of 2 (.NET and PHP) responding authors of other APM agents do capture an APM error in this case. So I think we should as well.

A possible consideration is performance impact. Capturing errors (because of stack trace collection, which can also involve sourcemap handling) can be slower. For a user with an app spewing lots of HTTP errors, there could be a more significant perf overhead. However, I'm not sure we need to gate on that. Really they should be wanting to see those errors in the APM app. Possibly a separate issue to profile this scenario.

trentm avatar Aug 04 '22 15:08 trentm