node-postgres icon indicating copy to clipboard operation
node-postgres copied to clipboard

"Client has encountered a connection error and is not queryable" without further info

Open emlai opened this issue 5 years ago • 9 comments

Since upgrading from pg 7.4.1 to pg 7.7.1, we are getting "Client has encountered a connection error and is not queryable" error in production after a week of working just fine, but no additional information is being logged that could help us determine what the underlying issue is.

I believe this may have been caused by the changes in https://github.com/brianc/node-postgres/commit/3828aa86081491282c118884c245b4a19619f02e.

Expected behavior: also log the error reason.

emlai avatar Dec 14 '18 09:12 emlai

The reason will have been emitted as an 'error' event on the client (and on the pool as well, if you’re using a pool).

charmander avatar Dec 14 '18 17:12 charmander

hello I have a similar issue: I got a Pool that is not queryable and any error logged. I cant catch this error regardless of what I do. got pg 7.7.1 and postgresql 9.6 I think these versions are incompatible. I will try to install postgresql 11

erwanjb avatar Dec 16 '18 12:12 erwanjb

I'm having this problem too. In my case I am using version 7.7.1 with typeorm

jvbianchi avatar Dec 16 '18 19:12 jvbianchi

@charmander Ok thanks! Seems I just missed it in our application logs since it was logged several hours before the "Client has encountered a connection error and is not queryable" message in each case.

Maybe it would make sense to include a copy of the reason along with the "Client has encountered a connection error and is not queryable" message, just for easier debugging?

For the record, we're using TypeORM and a pool.

Anyway, from our side, this issue can be closed. Thanks for the quick help!

emlai avatar Dec 17 '18 11:12 emlai

I also see this error with plain pg lib (without using TypeORM or any other orm). as @charmander said, an 'error' event is emmited.

pool.on('error',...)

but it is emmited only once, while the Client has encountered a connection error and is not queryable is printed everytime I use the defectuous pool

in my case, the emmited error was

Connection terminated unexpectedly Error: Connection terminated unexpectedly

in my case, this happened because of a network reset. In some other situations, I had the impression that pool was resilient to connection loss, but I guess it's not always the case.

rvion avatar Dec 28 '18 14:12 rvion

@rvion I'm currently seeing a similar issue, the error "Connection terminated unexpectedly" has come up, afterwards the connection was not queryable anymore ("Client has encountered a connection error and is not queryable".

May I ask how you approached this problem?

seriousManual avatar Dec 11 '19 11:12 seriousManual

How about adding a helper method for auto-reconnecting whenever this happens?

Aditya94A avatar Jun 20 '20 12:06 Aditya94A

How about adding a helper method for auto-reconnecting whenever this happens?

Any update on this?

gt-c avatar Dec 19 '21 20:12 gt-c

How about adding a helper method for auto-reconnecting whenever this happens?

Any update on this?

Any update on this?

anonrig avatar Feb 07 '22 20:02 anonrig

I have this same issue. Should we create a new Client (or Pool) object on error or what?

elimisteve avatar Oct 18 '22 04:10 elimisteve

Apparently that's exactly what we should do. From https://github.com/brianc/node-postgres/issues/1611#issuecomment-558566873 (see post for more info) --

db = new pg.Client();
db.on('error', e => {
  console.error('Database error', e);
  db = null;
});

elimisteve avatar Oct 18 '22 05:10 elimisteve

Starting with the above code snippet, here's what I've come up with:

// db.js
let dbClient;

export const getDbClient = () => {
  // TODO: How do I know dbClient is any good?  <---
  return dbClient;
}

const reconnectToDb = async () => {
  dbClient = null;
  dbClient = await createDbClient();
}

const createDbClient = async () => {
  const client = new Client({
    connectionString: "..."
  });

  client.on('error', err => {
    console.error('Postgres error:', err.stack);
    reconnectToDb();
  });

  await client.connect();
  console.log('Successfully connected to Postgres');

  return client;
};

(async () => {
  await reconnectToDb();
})()
// Doing Postgres queries
const result = await getDbClient().query("...")

It works great in most ways -- my server doesn't crash, and after there's an error querying Postgres the client automatically reconnects -- but ideally I would know that my next query isn't going to work and create a new DB connection (see the TODO toward the top of this post).

EDIT: Ideally I could do await dbClient.ping() or something and then create a new DB connection if the ping fails.

elimisteve avatar Oct 18 '22 09:10 elimisteve

It would be convenient to know what note is causing troubles. I think I am running into the issue due to 2 "big" attachments of 108MB and 130MB.

I see the failures in the logs of the DB:

2023-01-08 22:36:37.311 UTC [10223] LOG:  checkpoint complete: wrote 31 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.012 s, sync=0.006 s, total=0.028 s; sync files=30, longest=0.004 s, average=0.001 s; distance=130 kB, estimate=130 kB
2023-01-08 22:36:37.321 UTC [1] LOG:  database system is ready to accept connections

at that point we are ok then the sync starts.

2023-01-08 22:36:54.482 UTC [1] LOG:  server process (PID 10228) was terminated by signal 9: Killed
2023-01-08 22:36:54.482 UTC [1] DETAIL:  Failed process was running: update "items" set "content" = $1 where "id" = $2 returning "id"
2023-01-08 22:36:54.482 UTC [1] LOG:  terminating any other active server processes
2023-01-08 22:36:54.483 UTC [1] LOG:  all server processes terminated; reinitializing
2023-01-08 22:36:54.561 UTC [10229] LOG:  database system was interrupted; last known up at 2023-01-08 22:36:37 UTC
2023-01-08 22:36:54.561 UTC [10232] FATAL:  the database system is in recovery mode
2023-01-08 22:36:56.421 UTC [10229] LOG:  database system was not properly shut down; automatic recovery in progress
2023-01-08 22:36:56.431 UTC [10229] LOG:  redo starts at 1/18B3EBF8
2023-01-08 22:36:56.450 UTC [10229] LOG:  invalid record length at 1/18B5F940: wanted 24, got 0
2023-01-08 22:36:56.450 UTC [10229] LOG:  redo done at 1/18B5F8F8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2023-01-08 22:36:56.455 UTC [10230] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-01-08 22:36:56.482 UTC [10230] LOG:  checkpoint complete: wrote 31 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.014 s, sync=0.006 s, total=0.028 s; sync files=30, longest=0.004 s, average=0.001 s; distance=131 kB, estimate=131 kB
2023-01-08 22:36:56.491 UTC [1] LOG:  database system is ready to accept connections

I see no error in the server logs but the resource ID can be seen there as the PUT fails:

10.244.0.1 - - [08/Jan/2023:22:40:51 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:03 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:19 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:36 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"

chevdor avatar Jan 08 '23 22:01 chevdor

I forgot to mention that I ran into this issue with attachments over 100MB whereas my initial limit was set to 100MB (so totally expected issues). After seeing this, I bumped the limit to 200MB. My 108MB file went through but it chokes on the 130MB.

chevdor avatar Jan 08 '23 22:01 chevdor

Spotting the resource ID in the logs, I could delete the faulty attachement using the "tools/note attachments" and the issue is solved. I could reproduce the issue by adding the attachment again to the note: a 130MB audio recording in MP3 format. That seems to fail 100% of the time.

chevdor avatar Jan 08 '23 22:01 chevdor