node-postgres icon indicating copy to clipboard operation
node-postgres copied to clipboard

Hard to handle idle-in-transaction errors

Open mgabeler-lee-6rs opened this issue 1 year ago • 8 comments

I've setup my application's pool with an on('error') hook to log errors that happen outside queries.

However, this hook is not always active between queries. If I acquire a connection from the pool to run a transaction, and that transaction falls afoul of the idle_in_transaction_session_timeout, the error thrown from there doesn't happen during a query, and so if I don't take extra steps to add an on('error') listener for each such connection, my app dies.

But I also have to make sure to manually remove those, since releasing the client back to the pool doesn't do so!

Is there a recommended way to do this "nicely"?

I see some possible ways to make this more ergonomic, which might be applied in combination:

  1. Apply that idle error listener from the pool even when the client is acquired
    • This probably would need to be combined with some logic to avoid calling it if a "per-acquire" error listener is attached
  2. Automatically clear client error listeners on release to the pool
    • e.g. call client.removeAllListeners('error') just before re-adding the idle listener here: https://github.com/brianc/node-postgres/blob/master/packages/pg-pool/index.js#L329
    • This still leaves a race window between when the idle listener is removed acquiring a client connection and when the calling code can add its custom listener.
    • This race may be avoidable when using the callback style, but it's pretty hard to avoid when using Promises.

mgabeler-lee-6rs avatar Nov 03 '22 03:11 mgabeler-lee-6rs

great question - would you be able to include a code snippet that demonstrates the problem? I'll take a look at it.

brianc avatar Nov 03 '22 18:11 brianc

I've made a small demo here: https://github.com/mgabeler-lee-6rs/node-pg-issue-2852

Naturally imagine that there are many worker functions using the pool in various places.

I've poked through things and found a way to make this kinda work, but it's decidedly not-pretty: use the pool acquire event to remove and re-add a "durable" extra error handler to the client. The downside is that this stays attached to the client, in addition to the "idle" handler, when the connection is released, so this isn't production-ready imo, in addition to being a bit ugly.

mgabeler-lee-6rs avatar Nov 07 '22 18:11 mgabeler-lee-6rs

This is the error log.

error: terminating connection due to idle-in-transaction timeout
    at Parser.parseErrorMessage (/app/node_modules/pg-protocol/src/parser.ts:369:69)
    at Parser.handlePacket (/app/node_modules/pg-protocol/src/parser.ts:188:21)
    at Parser.parse (/app/node_modules/pg-protocol/src/parser.ts:103:30)
    at TLSSocket.<anonymous> (/app/node_modules/pg-protocol/src/index.ts:7:48)
    at TLSSocket.emit (node:events:514:28)
    at addChunk (node:internal/streams/readable:545:12)
    at readableAddChunkPushByteMode (node:internal/streams/readable:495:3)
    at Readable.push (node:internal/streams/readable:375:5)
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:190:23) {
  length: 122,
  severity: 'FATAL',
  code: '25P03',
  detail: undefined,
  hint: undefined,
  position: undefined,
  internalPosition: undefined,
  internalQuery: undefined,
  where: undefined,
  schema: undefined,
  table: undefined,
  column: undefined,
  dataType: undefined,
  constraint: undefined,
  file: 'postgres.c',
  line: '3372',
  routine: 'ProcessInterrupts'
}

Ideally this should not crash the application even if on("error",...) is not set, in this case, the next query or next step of reading should throw a timeout error.

ackava avatar Jan 10 '24 08:01 ackava

Having just had to fight through some more manifestations of this and some related error patterns, I've come to what may be a controversial opinion: The use of the this.emit('error', ...) pattern is just unfriendly to library consumers.

My reason for this is that it creates a library whose behavior includes, "if I notice an unexpected network socket closure in the background, I will kill the entire application unless you do something special on each and every network socket to tell me you don't want this to happen." Phrased like this, I think one can see why this might be a shocking and unpleasant behavior for a networking library to exhibit. Every other networking library, and particularly database connector, I've worked with will simply hold these errors and surface them on the next attempt to use that connection.

Having the option to be proactively notified of background errors is good, so I don't think that should be removed. I simply think that the behavior should be changed such that, if no "user" error listener is installed, node-postgres should retain that error and use it to respond to the next interaction with the client connection object. So e.g. if my PG server goes down unexpectedly, and I don't have an error event listener connected, the "connection terminated by administrator command" or "connection terminated unexpectedly" or whatever error should become the result of the next query() call or the like. And if this happens while a connection is in the pool and not acquired by the library user code, then it should just be silently dropped without fuss in the case where no user error listener is installed[^1].

[^1]: Maybe this part is already the case, I'll admit that tracing through how errors are handled in this library has been confusing for me, and I've ended up adding error listeners in a lot of different places trying to stop my team's apps crashing any time pgbouncer or postgres restarts or some transient networking hiccup happens.

mgabeler-lee-6rs avatar Apr 05 '24 20:04 mgabeler-lee-6rs

@mgabeler-lee-6rs Probably not controversial. I think attaching a no-op error listener to the pool by default has already been proposed, and is the most likely change? Queries erroring with the same connection error is already how it works (#1503).

charmander avatar Apr 06 '24 01:04 charmander

yah this is a really good idea - would probably need to be an "opt-in" feature via some config option until the next major version because even though it's a good change it's also technically a breaking change if someone is relying on the currently unfortunate behavior of "something goes wrong in the background? well unless you have an error listener kiss your process goodbye" behavior. But yeah definitely I'll add this to my ever growing list of "things to work on when I'm back from vacation" aka end of april. ❤️

brianc avatar Apr 06 '24 16:04 brianc

Awesome, thank you!

mgabeler-lee-6rs avatar Apr 08 '24 12:04 mgabeler-lee-6rs