node-postgres
"Client has encountered a connection error and is not queryable" without further info
Since upgrading from pg 7.4.1 to pg 7.7.1, we are getting a "Client has encountered a connection error and is not queryable" error in production after a week of everything working fine, but no additional information is logged that could help us determine the underlying issue.
I believe this may have been caused by the changes in https://github.com/brianc/node-postgres/commit/3828aa86081491282c118884c245b4a19619f02e.
Expected behavior: also log the error reason.
The reason will have been emitted as an 'error' event on the client (and on the pool as well, if you’re using a pool).
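For reference, a minimal sketch of where that reason shows up, assuming a single Client and, separately, a Pool (the environment-based connection string is just a placeholder):
import pg from 'pg';
const { Client, Pool } = pg;

const client = new Client({ connectionString: process.env.DATABASE_URL }); // placeholder config
client.on('error', err => {
  // The underlying cause (e.g. "Connection terminated unexpectedly") arrives here,
  // possibly hours before the next query fails with "not queryable".
  console.error('pg client error:', err.stack);
});
await client.connect();

const pool = new Pool({ connectionString: process.env.DATABASE_URL }); // placeholder config
pool.on('error', err => {
  // Emitted when an idle client held by the pool hits a connection error.
  console.error('pg pool error:', err.stack);
});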
Hello, I have a similar issue: I have a Pool that is not queryable and no error is logged. I can't catch this error regardless of what I do. I'm on pg 7.7.1 and PostgreSQL 9.6; I think these versions may be incompatible, so I will try PostgreSQL 11.
I'm having this problem too. In my case I am using version 7.7.1 with TypeORM.
@charmander Ok thanks! Seems I just missed it in our application logs since it was logged several hours before the "Client has encountered a connection error and is not queryable" message in each case.
Maybe it would make sense to include a copy of the reason along with the "Client has encountered a connection error and is not queryable" message, just for easier debugging?
For the record, we're using TypeORM and a pool.
Anyway, from our side, this issue can be closed. Thanks for the quick help!
I also see this error with the plain pg lib (without using TypeORM or any other ORM).
As @charmander said, an 'error' event is emitted (pool.on('error', ...)), but it is emitted only once, while "Client has encountered a connection error and is not queryable" is printed every time I use the defective pool.
In my case, the emitted error was: Error: Connection terminated unexpectedly
This happened because of a network reset in my case. In some other situations, I had the impression that the pool was resilient to connection loss, but apparently that's not always the case.
@rvion I'm currently seeing a similar issue: the error "Connection terminated unexpectedly" came up, and afterwards the connection was not queryable anymore ("Client has encountered a connection error and is not queryable").
May I ask how you approached this problem?
How about adding a helper method for auto-reconnecting whenever this happens?
Any update on this?
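For anyone who needs something in the meantime, here is a rough sketch of one way such a helper could look in application code, assuming a pg.Pool; the createPool/query names, the placeholder connection string, and the error-matching heuristic are all made up for illustration:
import pg from 'pg';
const { Pool } = pg;

const createPool = () => {
  const pool = new Pool({ connectionString: process.env.DATABASE_URL }); // placeholder config
  pool.on('error', err => {
    // Log the underlying reason so it isn't lost.
    console.error('pg pool error:', err.stack);
  });
  return pool;
};

let pool = createPool();

export const query = async (text, params) => {
  try {
    return await pool.query(text, params);
  } catch (err) {
    // Rough heuristic: on a connection-level failure, rebuild the pool once and retry.
    // Real code should be more careful about which errors are safe to retry
    // (e.g. not after a non-idempotent write).
    if (/Connection terminated|not queryable/i.test(err.message)) {
      await pool.end().catch(() => {});
      pool = createPool();
      return pool.query(text, params);
    }
    throw err;
  }
};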
I have this same issue. Should we create a new Client (or Pool) object on error or what?
Apparently that's exactly what we should do. From https://github.com/brianc/node-postgres/issues/1611#issuecomment-558566873 (see post for more info) --
db = new pg.Client();
db.on('error', e => {
  console.error('Database error', e);
  db = null;
});
Starting with the above code snippet, here's what I've come up with:
// db.js
import pg from 'pg';
const { Client } = pg;

let dbClient;

export const getDbClient = () => {
  // TODO: How do I know dbClient is any good? <---
  return dbClient;
};

const reconnectToDb = async () => {
  dbClient = null;
  dbClient = await createDbClient();
};

const createDbClient = async () => {
  const client = new Client({
    connectionString: "..."
  });
  client.on('error', err => {
    // Log the underlying reason and replace the broken client.
    console.error('Postgres error:', err.stack);
    reconnectToDb();
  });
  await client.connect();
  console.log('Successfully connected to Postgres');
  return client;
};

(async () => {
  await reconnectToDb();
})();
// Doing Postgres queries
const result = await getDbClient().query("...")
It works great in most ways: my server doesn't crash, and after an error querying Postgres the client automatically reconnects. But ideally I would know in advance that my next query isn't going to work and create a new DB connection then (see the TODO toward the top of this post).
EDIT: Ideally I could do await dbClient.ping() or something similar and then create a new DB connection if the ping fails.
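As far as I know pg doesn't ship a ping(), but a cheap query works as a health check. A minimal sketch of what could back the TODO in getDbClient above (isClientUsable and getUsableDbClient are made-up names):
// Hypothetical health check: run a trivial query and treat any failure as
// "this client is no longer usable".
const isClientUsable = async (client) => {
  try {
    await client.query('SELECT 1');
    return true;
  } catch (err) {
    return false;
  }
};

export const getUsableDbClient = async () => {
  if (!dbClient || !(await isClientUsable(dbClient))) {
    await reconnectToDb();
  }
  return dbClient;
};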
It would be convenient to know which note is causing trouble. I think I am running into this issue because of two "big" attachments of 108 MB and 130 MB.
I see the failures in the logs of the DB:
2023-01-08 22:36:37.311 UTC [10223] LOG: checkpoint complete: wrote 31 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.012 s, sync=0.006 s, total=0.028 s; sync files=30, longest=0.004 s, average=0.001 s; distance=130 kB, estimate=130 kB
2023-01-08 22:36:37.321 UTC [1] LOG: database system is ready to accept connections
At that point we are OK; then the sync starts.
2023-01-08 22:36:54.482 UTC [1] LOG: server process (PID 10228) was terminated by signal 9: Killed
2023-01-08 22:36:54.482 UTC [1] DETAIL: Failed process was running: update "items" set "content" = $1 where "id" = $2 returning "id"
2023-01-08 22:36:54.482 UTC [1] LOG: terminating any other active server processes
2023-01-08 22:36:54.483 UTC [1] LOG: all server processes terminated; reinitializing
2023-01-08 22:36:54.561 UTC [10229] LOG: database system was interrupted; last known up at 2023-01-08 22:36:37 UTC
2023-01-08 22:36:54.561 UTC [10232] FATAL: the database system is in recovery mode
2023-01-08 22:36:56.421 UTC [10229] LOG: database system was not properly shut down; automatic recovery in progress
2023-01-08 22:36:56.431 UTC [10229] LOG: redo starts at 1/18B3EBF8
2023-01-08 22:36:56.450 UTC [10229] LOG: invalid record length at 1/18B5F940: wanted 24, got 0
2023-01-08 22:36:56.450 UTC [10229] LOG: redo done at 1/18B5F8F8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2023-01-08 22:36:56.455 UTC [10230] LOG: checkpoint starting: end-of-recovery immediate wait
2023-01-08 22:36:56.482 UTC [10230] LOG: checkpoint complete: wrote 31 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.014 s, sync=0.006 s, total=0.028 s; sync files=30, longest=0.004 s, average=0.001 s; distance=131 kB, estimate=131 kB
2023-01-08 22:36:56.491 UTC [1] LOG: database system is ready to accept connections
I see no error in the server logs, but the resource ID can be seen there as the PUT fails:
10.244.0.1 - - [08/Jan/2023:22:40:51 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:03 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:19 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:36 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
I forgot to mention that I originally ran into this with attachments over 100 MB while my limit was set to 100 MB (so failures were expected). After seeing this, I bumped the limit to 200 MB. My 108 MB file went through, but it chokes on the 130 MB one.
By spotting the resource ID in the logs, I could delete the faulty attachment using "tools/note attachments", and the issue was solved. I can reproduce the issue by adding the attachment back to the note: a 130 MB audio recording in MP3 format. That seems to fail 100% of the time.