node-restify icon indicating copy to clipboard operation
node-restify copied to clipboard

When using handleUncaughtExceptions, if an error is thrown within a stream callback in the `uncaughtException` handler, restify enters an infinite loop.

Open cprussin opened this issue 5 years ago • 10 comments

  • [x] Used appropriate template for the issue type
  • [x] Searched both open and closed issues for duplicates of this issue
  • [x] Title adequately and concisely reflects the feature or the bug

Bug Report

Restify Version

Tested on 7.7.0, almost definitely present in all versions due to underlying Node issue.

Node.js Version

v10.15.1, likely present in all versions of Node.

Expected behaviour

No infinite loop

Actual behaviour

Infinite loop

Repro case

const restify = require('restify');
const http = require('http');

const server = restify.createServer({ handleUncaughtExceptions: true });
server.on('uncaughtException', () => {
    console.log('catching exception!');
    http.get('http://google.com', () => { throw new Error('foobar'); });
});
server.get('/', () => { throw new Error('fail'); });
server.listen(2000, () => console.log('Server now up on port 2000'));

Cause

This is caused by this issue in node.

Are you willing and able to fix this?

Yes.

There is a workaround available for this issue, which would be to change on to once here. If the restify team is on board with this change, I'm happy to submit a PR.

cprussin avatar Feb 13 '19 23:02 cprussin

One of my concerns with this is that it makes this problematic behavior silent. Would it be useful to know whether the error handler actually throws or emit an error than is not handled?

If so, would setting up a custom error event handler that:

  1. wraps the error event handler and somehow tries to report an error (publish a metric somewhere) first, assuming that error reporting code would be simpler/more robust than the original error handling code
  2. if called again, turns to a no-op

be a better trade-off between silencing all errors that occur in the error handler and consuming resources without bounds?

Another question that I have is: is it possible for an error to be emitted or thrown in the domain error handler before resources associated with the request/response are released? If so, would those resources ever be released if the error handler is set using once?

misterdjules avatar Feb 14 '19 02:02 misterdjules

Would it be useful to know whether the error handler actually throws or emit an error than is not handled?

Yes, I definitely think that's ideal.

If so, would setting up a custom error event handler that:

  1. wraps the error event handler and somehow tries to report an error (publish a metric somewhere) first, assuming that error reporting code would be simpler/more robust than the original error handling code
  2. if called again, turns to a no-op

be a better trade-off between silencing all errors that occur in the error handler and consuming resources without bounds?

That does sound like a reasonable proposal to me. Any idea what you'd think such an API would look like?

Another question that I have is: is it possible for an error to be emitted or thrown in the domain error handler before resources associated with the request/response are released? If so, would those resources ever be released if the error handler is set using once?

That I'm not sure of. I could do more testing on Node to get a lower-level understanding. Regardless, I think that it would be ideal to solve this such that we can ensure that Restify always releases resources, if possible.

cprussin avatar Feb 14 '19 05:02 cprussin

@cprussin I'm thinking that prescribing/enforcing any behavior in restify is going to be tricky, because it seems difficult to determine what the best way is for restify consumers to handle those edge cases.

Is it possible for you to implement those ideas in a new domain within that restify domain error handler?

misterdjules avatar Feb 15 '19 00:02 misterdjules

We can definitely do that @misterdjules. Being that that's the case, Restify using once instead of on would cause restify to not enter an infinite loop, but would still allow consumers to write code that catch this case. So, should we change restify or leave it as is?

cprussin avatar Feb 15 '19 00:02 cprussin

@cprussin I'm not sure anymore that once is actually desirable in all cases.

For instance, it seems like a legitimate use case for restify's uncaughtException handler is to log errors that are emitted on the req and res objects, or any other emitter created in the context of req/res' domain. Using once would allow only one error event to be logged.

Thoughts on that?

misterdjules avatar Feb 15 '19 01:02 misterdjules

@misterdjules yeah, that's a good point.

Depending on what happens with the upstream node issue, we could also consider changing restify to create a new domain and call _onHandlerError within that domain. That is, change https://github.com/restify/node-restify/blob/4900d6bdd51fa4e1769678562de69929c38a0c4b/lib/server.js#L829-L831 to:

handlerDomain.on('error', function onError(err) {
    var errorDomain = domain.create();
    errorDomain.on('error', function onError(err) {
        // Do something with err, probably including telling app
        // authors how they can handle this more gracefully...
    });
    errorDomain.run(function run() {
        self._onHandlerError(err, req, res, true);
    });
});

That way, we avoid an infinite loop in restify, but apps can still do error detection within the uncaught exception handler (by creating yet another new domain...).

cprussin avatar Feb 15 '19 06:02 cprussin

@cprussin Very interesting!

Would the errorDomain's error handler include potentially making sure that the request/response's resources are closed if err is a thrown error (as opposed to an emitted error)?

I presume it could at the very least log an error too, even though I wonder if that would drown into a sea of logs and turn out to be not that useful.

misterdjules avatar Feb 15 '19 19:02 misterdjules

@misterdjules I would think it likely should do both of those, but OTOH is there an argument to be had that in this scenario, we should kill the process?

cprussin avatar Feb 15 '19 20:02 cprussin

@cprussin I don't know about killing the process, since the goal of using domains in the first place is to not exit on uncaught exceptions. I'm starting to think that all this should be left to the consumer of Restify because they have the most context on how they want their application to behave.

That is valid for https://github.com/nodejs/node/pull/26211 too, and so I'm starting to think that is not a desirable change too.

Regardless, we could definitely improve Restify's and domain's documentation by documenting those edge cases and available workarounds.

Thoughts?

misterdjules avatar Feb 20 '19 18:02 misterdjules

@cprussin I don't know about killing the process, since the goal of using domains in the first place is to not exit on uncaught exceptions. I'm starting to think that all this should be left to the consumer of Restify because they have the most context on how they want their application to behave.

That is valid for nodejs/node#26211 too, and so I'm starting to think that is not a desirable change too.

@misterdjules I agree that the ultimate insights into that an exception occurred should be up to the app, but I think that having a mode where the tool can enter an infinite loop unexpectedly (or at least, due to a condition that isn't obvious) is a problem--entering an unexpected infinite loop is a worse condition than crashing the process anyways. However, the more we chat about this, the more I think that the problem isn't Restify's to solve. Because of that, I'm actually really in favor of your change to Node, but I still think we should document those edge cases when using the restify flag so that consuming apps know how to account for these cases.

So my viewpoint on this is:

  1. Node shouldn't have a condition where it enters an infinite loop due to some nonobvious condition. Your PR will fix that, but will cause cases where handleUncaughtExceptions isn't sufficient to keep the server from crashing.
  2. I believe the purpose of the handleUncaughtExceptions flag is to indicate that uncaught exceptions on the request should be handled. I don't think it's up to restify to handle all possible exceptions. There will be edge cases to what conditions restify can handle exceptions under (for instance, what if an exception is thrown in Restify before we even enter the domain?). Point 1 will add such an edge case.
  3. To combat 2, we should add documentation in Restify for whatever edge cases exist in uncaught exception handling, and how to write a robust and stable error handler while using that flag.

cprussin avatar Feb 20 '19 19:02 cprussin