node-postgres icon indicating copy to clipboard operation
node-postgres copied to clipboard

Question: How to properly handle unexpected disconnection / autoreconnect ?

Open abenhamdine opened this issue 8 years ago • 23 comments

After have updated from 5.1 to 6.2.4, I see sometimes (nearly once a day) the following error which crashes our production server :

Error: Connection terminated unexpectedly
2017-06-15T10:03:33.683+02:00 at Object.onceWrapper (events.js:293:19)
2017-06-15T10:03:33.683+02:00Connection terminated unexpectedly
2017-06-15T10:03:33.684+02:00 at Socket.<anonymous> (/home/bas/app_21386476-a451-424c-ad67-870442bbdbe7/node_modules/pg/lib/connection.js:138:10)
2017-06-15T10:03:33.684+02:00 at emitNone (events.js:86:13)
2017-06-15T10:03:33.684+02:00 at emitNone (events.js:91:20)
2017-06-15T10:03:33.684+02:00 at Connection.emit (events.js:188:7)
2017-06-15T10:03:33.684+02:00 at Socket.emit (events.js:188:7)
2017-06-15T10:03:33.685+02:00 at process._tickCallback (internal/process/next_tick.js:104:9)
2017-06-15T10:03:33.685+02:00undefined
2017-06-15T10:03:33.685+02:00 at _combinedTickCallback (internal/process/next_tick.js:80:11)
2017-06-15T10:03:33.685+02:00 at endReadableNT (_stream_readable.js:975:12)

I've seen https://github.com/brianc/node-postgres/pull/1316

AFAIK, the error was not emitted before this patch, as of now, we have an error emitted on the pool.

I'm a bit confused here about what we should do when this error is happening ? Does it mean that we have to reconnect the entire pool ? But how ? Or does it means that the client emitting the error is definitively dead, so we rather drop the operation initiated by the client, and tell the user that his query is dropped. But once again how since the client.query callback seems not called with this error (perhaps somehow related to https://github.com/brianc/node-postgres/issues/1322) ?

abenhamdine avatar Jun 15 '17 08:06 abenhamdine

Hey @abenhamdine - I'm glad to see the patch I put into [email protected] is working!

Before when connections unexpectedly dropped from the backend or a network partition happened node-postgres would silently ignore the closed connection and not always purge the disconnected connections from the pool so your pool could fill up with stale connections under heavy load. Under light load connections are purged on a configurable timeout, so you like wouldn't see the old 'stale connection' issue unless you had sustained heavy load for many days in production or were doing db failovers or the like.

What's happening now is one (or many) client(s) in your pool sits idle and connected to the backend database. A network partition happens or database failure, fail-over, forced disconnection from the backend, etc and the idle client(s) all notice and emit an error event. The pool listens for errors on idle clients. When an idle client has an error the pool destroys the client and removes it from the pool so the next time you request a client from the pool it will automatically create and connect a new client.

It's unlikely the error is being emitted from the pool during an active query (if so, this is a bug) - during query execution a client will delegate error handling to its active query so the query can call it's callback with the error (or reject it's promise, etc).

You'll want to put a pool.on('error', (err) => console.error(e.stack)) listener or something on your pool and handle (or at least log) errors here. You probably don't need to restart your application every time an idle client disconnects in the pool because node-postgres will try to do the right thing & reconnect you next time you need a client but you should at least be aware of it. And since node crashes the process when you don't listen for an error event, by default your process is crashing.

tl; dr -

const pool = new Pool()
pool.on('error', (err) => {
  console.error('An idle client has experienced an error', err.stack)
})

Hope this helps. I've tagged this as a doc candidate so I'll refine the docs here & make an entry in the docs about error handling of pooled clients.

brianc avatar Jun 15 '17 15:06 brianc

Also @abenhamdine I know this isn't really the best place for this kinda conversation but I'll be in Paris in september from the 21st through the 30th if you want to meet up for a coffee or something & chat about node-postgres. If so, email me & we can plan! [email protected] - if not, no worries!

brianc avatar Jun 15 '17 15:06 brianc

Thx you so much for such a complete & precise explanation ! 👍 💃
It's very very precious for not-so-advanced-users like me. I've always thought that the quality of a open-source project depends on the ability of the maintainers/advanced users to provide help & answers to such questions.

The 7.0 release (and its docs !) looks very promising, so good luck !

Of course, i will be glad to meet you in Paris late september 🍷

abenhamdine avatar Jun 15 '17 18:06 abenhamdine

Hi @brianc, I faced this issue in my application. Thank you for your explanation I was able to put in the console.error and know why and when it is happening.

My question is how should I handle it in the pool.on ('error', () => { }) ? I mean to say, how would I reconnect automatically when the connection terminates ? At this point, I'm required to restart the node server manually when the connection is terminated.

Please pardon if it is a silly question, just wanted to know what would be the best way to do this ? Thank you!

vyerneni17 avatar Jul 03 '17 04:07 vyerneni17

@vyerneni17 attempting to belatedly provide an answer for @brianc:

You don't need to do anything in the listener aside logging it to your destination of choice.

  • This error only happens outside of a query
  • The pool will automatically attempt to recreate a connection at your next query

You still want to add a listener:

  • so the error doesn't crash your process
  • you can monitor those instabilities in your connection.

Hope that helps!

pcothenet avatar Mar 30 '18 17:03 pcothenet

I'm encountering problems on this front as well. I'm trying to intercept momentary interruptions (e.g., due to a db restart) which emit this error: terminating connection due to unexpected postmaster exit.

Initially I tried connecting a handler for pool errors on the pool instance directly, but found it was never triggered when restarting my database:

const pool = new pg.Pool({});
pool.on('error', err => console.log(err));

I then tried waiting to attach the error handler until I get the connect event emitted:

const pool = new pg.Pool({});
pool.on('connect', () => {
  pool.on('error', err => console.log(err));
});

Much better luck there. I got the errors logged out. Unfortunately somewhere along the way there is still an error thrown from the connection itself:

[exiting] Fatal exception: error: terminating connection due to unexpected postmaster exit
    at Connection.parseE (/opt/runtime/node_modules/pg/lib/connection.js:553:11)
    at Connection.parseMessage (/opt/runtime/node_modules/pg/lib/connection.js:378:19)
    ...

@brianc or others: Can you advise on a thorough, proper implementation which will avoid unhandled errors in situations like these? Have searched the issues, but haven't had much luck for a comprehensive solution with the new pools.

mshick avatar May 10 '18 19:05 mshick

I'm very sorry to add to the confusion here. It just occurred to me that I might have another pool (aside from the one I was working with above) in place, and indeed that's the pool that was throwing unhandled errors.

The simplest solution appears to work great for my case.

Thanks!

mshick avatar May 10 '18 19:05 mshick

@brianc @pcothenet I'm running my node process with nodemon, I'm not sure if this changes things, but when this error occurs it stops the process.

/app/node_modules/pg/lib/client.js:179
      const error = new Error('Connection terminated unexpectedly')
                    ^
Error: Connection terminated unexpectedly
    at Connection.con.once (/app/node_modules/pg/lib/client.js:179:21)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:106:13)
    at Connection.emit (events.js:208:7)
    at Socket.<anonymous> (/app/node_modules/pg/lib/connection.js:76:10)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at TCP._handle.close [as _onclose] (net.js:557:12)
[nodemon] app crashed - waiting for file changes before starting...

(I've already added the event listener)

CoreyCole avatar Jul 26 '18 19:07 CoreyCole

@CoreyCole How did you add the event listener?

charmander avatar Jul 26 '18 21:07 charmander

@charmander

import { Injectable } from '@nestjs/common';
import { Client, QueryResult, Pool } from 'pg';

import { EnvironmentsService } from '../environments.service';

@Injectable()
export class PostgresRepositoryService {
    private client: Client;
    private pool: Pool;

    constructor(private environment: EnvironmentsService) {
        this.client = new Client({
            host: this.environment.getDatabaseUrl(),
            port: this.environment.getDatabasePort(),
            database: 'db_name',
            user: 'db_user',
            password: this.environment.getDatabasePassword()
        });
        this.client.connect()
            .catch(err => {
                console.error(`[PostgresRepositoryService][constructor]: error connecting to db [${err}]`);
                process.exit(1);
            });
        this.pool = new Pool();
        this.pool.on('error', (err) => {
            console.error('An idle client has experienced an error', err.stack);
        });
    }
// other code

CoreyCole avatar Jul 30 '18 22:07 CoreyCole

@CoreyCole In that code, the pool has an error listener, but the client doesn’t. (Did you mean to create a separate pool and client at all?)

charmander avatar Jul 31 '18 00:07 charmander

@charmander I don't understand what the pool is, honestly. I just added the error listener on a new pool to try to make my nodemon server not stop randomly, as per the advice above. Do I not want both a client and a pool?

CoreyCole avatar Jul 31 '18 22:07 CoreyCole

@CoreyCole You probably just want a pool.

export class PostgresRepositoryService {
    private pool: Pool;

    constructor(private environment: EnvironmentsService) {
        this.pool = new Pool({
            host: this.environment.getDatabaseUrl(),
            port: this.environment.getDatabasePort(),
            database: 'db_name',
            user: 'db_user',
            password: this.environment.getDatabasePassword()
        });
        this.pool.on('error', (err) => {
            console.error('An idle client has experienced an error', err.stack);
        });
    }

There’s some documentation on how to use the pool at https://node-postgres.com/features/pooling.

charmander avatar Aug 01 '18 06:08 charmander

I've been running into this issue today, and what I'm seeing doesn't seem to match what is supposed to be happening.

I create my pool like this:

import log from 'lib/log';
import { Pool } from 'pg';

const yb = new Pool({
  database: process.env.YBDATABASE || 'yugabyte',
  host: process.env.YBHOST || 'localhost',
  password: process.env.YBPASSWORD || 'yugabyte',
  port: process.env.YBPORT ? parseInt(process.env.YBPORT, 10) : 7001,
  user: process.env.YBUSER || 'yugabyte',
});

yb.on('error', err => {
  log.error(err, 'Yugabyte connection error');
});

export default yb;

Then I use it like this:

  const yb = await context.services.yb.connect();
  yb.on('error', err =>
    log.error({ err }, 'Got error from yb client instance'),
  );
  const doQuery = async <R>(
    query: string,
    args?: any[],
  ): Promise<QueryResult<R>> => {
    const t = Date.now();
    for (let attempts = 0; ; attempts++) {
      try {
        const res = await yb.query(query, args);
        if (verbose) {
          log.debug({
            args,
            elapsed: Date.now() - t,
            query,
            rowCount: res.rowCount,
          });
        }
        return res;
      } catch (e) {
        if (/^(57|40|08)...$/.test(e.code) && attempts < 10) {
          // These errors are retryable to some degree, so try again if we get one
          await Bluebird.delay(5000 + attempts * 1000);
          continue;
        }
        if (verbose) {
          log.error(
            {
              code: e.code,
              args,
              elapsed: Date.now() - t,
              query,
            },
            String(e),
          );
        }

        if (e.code) e.message += `\nCode: ${e.code}`;
        e.message += `\nQuery: ${query}\nArgs: ${args}`;
        throw e;
      }
    }
  };

When I kill the backend server, the error printed is like this:

05:40:27.287Z ERROR formative : Got error from yb client instance
    Error: Connection terminated unexpectedly
        at Connection.<anonymous> (/home/ubuntu/app/node_modules/pg/lib/client.js:264:9)
        at Object.onceWrapper (events.js:299:28)
        at Connection.emit (events.js:210:5)
        at Socket.<anonymous> (/home/ubuntu/app/node_modules/pg/lib/connection.js:125:10)
        at Socket.emit (events.js:215:7)
        at endReadableNT (_stream_readable.js:1184:12)
        at processTicksAndRejections (internal/process/task_queues.js:80:21)
05:40:27.330Z ERROR formative : 
    Connection terminated unexpectedly
    Query: INSERT INTO public.answers ( _id,assignments,autograded,"createdAt",deleted,"deletedAt","feedbackCount","formativeId","formativeItemId","gradedAt",owner,points,"possiblePoints","scoreFactor","submissionId",tags,"teacherId","updatedAt" ) VALUES ( $1,NULL,true,$2,NULL,NULL,NULL,$3,$4,NULL,$5,$6,$7,$8,$9,NULL,$10,$11 ) ON CONFLICT( _id ) DO UPDATE SET assignments = excluded.assignments,autograded = excluded.autograded,"createdAt" = excluded."createdAt",deleted = excluded.deleted,"deletedAt" = excluded."deletedAt","feedbackCount" = excluded."feedbackCount","formativeId" = excluded."formativeId","formativeItemId" = excluded."formativeItemId","gradedAt" = excluded."gradedAt",owner = excluded.owner,points = excluded.points,"possiblePoints" = excluded."possiblePoints","scoreFactor" = excluded."scoreFactor","submissionId" = excluded."submissionId",tags = excluded.tags,"teacherId" = excluded."teacherId","updatedAt" = excluded."updatedAt" WHERE answers."updatedAt" < excluded."updatedAt"
    Args: AdsXuKLHJCP5T4p4K,Thu Oct 06 2016 06:09:59 GMT-0700 (Pacific Daylight Time),CHz2WLxvRsDGMX4di,uJLWjY8M2yxfEuT8i,migrated-166965,1,1,1,Rjidf5AwXeydyRXNz,migrated-57579,Mon Feb 13 2017 20:04:53 GMT-0800 (Pacific Standard Time)
    --
    Error: Connection terminated unexpectedly
        at Connection.<anonymous> (/home/ubuntu/app/node_modules/pg/lib/client.js:264:9)
        at Object.onceWrapper (events.js:299:28)
        at Connection.emit (events.js:210:5)
        at Socket.<anonymous> (/home/ubuntu/app/node_modules/pg/lib/connection.js:125:10)
        at Socket.emit (events.js:215:7)
        at endReadableNT (_stream_readable.js:1184:12)
        at processTicksAndRejections (internal/process/task_queues.js:80:21)
error Command failed with exit code 1.

Notice how:

  1. The error is emitted from the client instance (prints Got error from client instance), not the pool, AND it is thrown as an exception from the query.
  2. There is no "code" on the exception or some way to tell it is retryable other than the message, which is not ideal. It would be nice if you tried to put a postgres error code, on there, e.g. 08000 which I can use to realize the nature of the error.

If I don't add an error handler onto the client I checkout from the pool, I get an error like this when I restart the database server, and the process is terminated:

events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: Connection terminated unexpectedly
    at Connection.<anonymous> (/home/ubuntu/app/node_modules/pg/lib/client.js:264:9)
    at Object.onceWrapper (events.js:299:28)
    at Connection.emit (events.js:210:5)
    at Socket.<anonymous> (/home/ubuntu/app/node_modules/pg/lib/connection.js:125:10)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1184:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
Emitted 'error' event on Client instance at:
    at connectedErrorHandler (/home/ubuntu/app/node_modules/pg/lib/client.js:211:10)
    at Connection.<anonymous> (/home/ubuntu/app/node_modules/pg/lib/client.js:281:9)
    at Object.onceWrapper (events.js:299:28)
    [... lines matching original stack trace ...]
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
error Command failed with exit code 1.

This seems counter to some of the information above that said that if a query were in progress, the error emitter would not be called.

dobesv avatar Apr 24 '20 05:04 dobesv

@dobesv similar stack happening here that I just recently noticed in our logs...

Error: Connection terminated unexpectedly
    at Connection.<anonymous> (/usr/src/repos/node_modules/pg/lib/client.js:264:9)
    at Object.onceWrapper (events.js:417:28)
    at Connection.emit (events.js:311:20)
    at Socket.<anonymous> (/usr/src/repos/node_modules/pg/lib/connection.js:76:10)
    at Socket.emit (events.js:323:22)
    at TCP.<anonymous> (net.js:668:12)

We're going to deploy a little more to our error handler to try and get more into our logs.

Similar setup, we are def. listening on the pool for 'error'.

import { Pool as PostgresPool } from 'pg';

// ...

const pool = new PostgresPool(config.data.postgres);
pool.on('error', (err, client) => {
  pgDebug(`POSTGRES POOL ERROR: ${err}`);
  console.warn(`Postgres pool error: ${err}`);
});
pool.on('connect', (client) => {
  pgDebug(`Pool connecting a new client (pool: ${pool.totalCount} clients, ${pool.idleCount} idle, ${pool.waitingCount} waiting)`);
});
pool.on('acquire', client => {
  pgDebug(`Postgres client being checked out (pool: ${pool.totalCount} clients, ${pool.idleCount} idle, ${pool.waitingCount} waiting)`);
});
pool.on('remove', client => {
  pgDebug(`Postgres client checked back in (pool: ${pool.totalCount} clients, ${pool.idleCount} idle, ${pool.waitingCount} waiting)`);
});
pool.connect((err, client, release) => {

jeffwilcox avatar Apr 25 '20 16:04 jeffwilcox

I've spent too much time on this, on typeorm side, but for eveyone who is struggling with this:

If you read the documentation carefully(events section on https://node-postgres.com/api/client), then you'll see that connection errors are emitted on client, not thrown on client.query().Thus, all the SQL errors are thrown, or returned on the callback, but for connection errors you must attach client.on('error'), and if you don't the error will be uncaught and by default the process exits.

If you look at pool.query() you see that they properly handle on('error') AND query callback errors, so if you use client returned from pool.connect you must do the same.

What is weird though, as noted also by @dobesv , if you do attach on('error') then the client.query() calls will start throwing exceptions. This is inconsistent or a bug, not sure.

joras avatar May 04 '20 07:05 joras

Reading the docs again, the on('error') should be only important for idle connections, thus I find it is a bug, and a created a new ticket https://github.com/brianc/node-postgres/issues/2191

joras avatar May 04 '20 08:05 joras

If you look at pool.query() you see that they properly handle on('error') AND query callback errors, so if you use client returned from pool.connect you must do the same.

@joras - Hmm, doesn't the client already get the idleListener attached when its returned from pool.connect? https://github.com/brianc/node-postgres/blob/4b229275cfe41ca17b7d69bd39f91ada0068a5d0/packages/pg-pool/index.js#L46-L59


https://github.com/brianc/node-postgres/blob/4b229275cfe41ca17b7d69bd39f91ada0068a5d0/packages/pg-pool/index.js#L231

aranair avatar Jun 02 '21 19:06 aranair

@aranair The idleListener is attached when the client is idle (i.e. not checked out of the pool), and detached when it’s returned from pool.connect.

charmander avatar Jun 02 '21 19:06 charmander

@aranair The idleListener is attached when the client is idle (i.e. not checked out of the pool), and detached when it’s returned from pool.connect.

@charmander - to clarify, does that mean that, ontop of the pool.on('error',...) we should also manually attach a listener to the clients we pull out from pool.connect via client.on('error', fn())?

Is it correct to assume that ^^ is meant to help with the off-chance that the client gets disconnected after it is returned from pool.connect, and before it runs a query? (since query errors would separately be thrown and handled by promises(?))

aranair avatar Jun 02 '21 20:06 aranair

@aranair Yes, you currently need to attach an error listener manually while the client is checked out. It’s for any connection-level errors (including getting disconnected in various ways) that happen during that time, which could be before a query, during it, after it…

charmander avatar Jun 02 '21 23:06 charmander

@aranair Yes, you currently need to attach an error listener manually while the client is checked out. It’s for any connection-level errors (including getting disconnected in various ways) that happen during that time, which could be before a query, during it, after it…

Got it, thanks for the clarification!

aranair avatar Jun 03 '21 14:06 aranair

Is there any workaround?

alexey-sh avatar Jun 11 '25 22:06 alexey-sh