node-postgres icon indicating copy to clipboard operation
node-postgres copied to clipboard

Horrible query performance since 8.2.0+

Open francois-egner opened this issue 1 year ago • 4 comments

I was trying out drizzle, using a PostgreSQL database (v15). I experience a huge performance issue just querying one row of a table with just one row. So I dig a little deeper inside the code and found out that it internally calls the query method of the Client class. I also tried to call the query method of the Client class with a plain query text/string which resulted in a way faster performance. I was able to reproduce everything using the plain pg module:

const pg = new Client({
   connectionString: configuration.get('DATABASE_URL')
});

pg.connect();
const config = {
  text: 'select count(*) as count from matcher.swipes where id = $1',
  rowMode: 'array'
}
const result = await pg.query(config, ['3a1ec2e5-41b9-4866-a210-f5102f66ca4d'])

This pg.query call takes 40-50ms!

const result = await pg.query("select count(*) as count from matcher.swipes where id = '3a1ec2e5-41b9-4866-a210-f5102f66ca4d';")

This pg.query call takes 2-3ms

By downgrading over and over again I found out that version 8.1.0 is the last version that produces the same timing results for both query calls. Now I wonder what I am missing here. I wonder what I am doing wrong and why it seems to only be me having this problem...

francois-egner avatar Oct 02 '24 21:10 francois-egner

Hi there,

I believe this issue might be a duplicate of #3098. It seems to cover similar concerns.

Dimitris-Tzilopoylos avatar Oct 04 '24 13:10 Dimitris-Tzilopoylos

(8.1.0 → 8.2.0 is when the entire protocol message parsing/serialization was rewritten into pg-protocol.)

This probably isn’t related to the issue, but note that Client#connect() is asynchronous; to avoid involving the query queue, make sure to await pg.connect().

charmander avatar Oct 04 '24 23:10 charmander

(8.1.0 → 8.2.0 is when the entire protocol message parsing/serialization was rewritten into pg-protocol.)

This probably isn’t related to the issue, but note that Client#connect() is asynchronous; to avoid involving the query queue, make sure to await pg.connect().

Thanks for the hint. I verified that this isnt the cause for my problem. But indeed I should await this for sure :)

francois-egner avatar Oct 05 '24 07:10 francois-egner

Hi there,

I believe this issue might be a duplicate of #3098. It seems to cover similar concerns.

Very interesting. Using Node 16 also fixes the issue with the latest node-progres (8.13.0). Does downgrading to 8.1.0 solve your other problem?

francois-egner avatar Oct 05 '24 07:10 francois-egner

Having the same problem, downgrading to 8.1.0 solves it. I'm using bun, so it's not node specific.

This is taking ~50ms on localhost, so I guess it's 100x slowe than it should

await client.connect();
const start = performance.now();

await client.query({
    text: 'SELECT $1',
    values: [1],
});

const end = performance.now();

thedadow451 avatar Oct 29 '24 22:10 thedadow451

@charmander @brianc I went down the rabbit hole of debugging the problem on network and postgres message protocol level. I also did a bisect and confirmed regression was introduced with new protocol 520bd3531990f32c3e00b20020c67f6ac6c70261.

I managed to create repro using tcp protocol. (works both on postgres 9 and postgres 17, so I don't believe it's a new regression in Postgres).

Here's the repro (https://gist.github.com/thedadow451/7c184281253a826c7da374c4f0ce46b7).

It seems like it all comes down to batching (it's not about network overhead since it's localhost), but Postgres takes ~40ms to respond is all messages (parse, bind, execute, sync) are not send as part of single message.

#edit Managed to fix it locally, I can open PR if you are interested.

thedadow451 avatar Oct 31 '24 12:10 thedadow451

@thedadow451 A PR would be great if you have it, yep! Thanks for investigating.

charmander avatar Oct 31 '24 23:10 charmander

Sure, I opened the PR @charmander, let me know if there is anything missing!

There seems to be 1000x difference with and without the fix. :exploding_head: I noticed it as it was affecting almost all of queries in drizzle-orm (my CI tests are taking x20 longer without this fix).

https://github.com/brianc/node-postgres/issues/3340

thedadow451 avatar Nov 01 '24 00:11 thedadow451

@thedadow451 Thank you very much for actually doing the digging and providing the potential fix! I am still wondering why its only a few people having this issue and why the current version is fast with Nodejs 16. Can you guess why after your investigation? With the current information I am not able to make a good guess here.

Thanks again @thedadow451 for the digging :)

francois-egner avatar Nov 03 '24 12:11 francois-egner

@francois-egner

I wanted to verify node16 issue since I was using bun. Funny thing, I'm no longer able to reproduce the issue with either node16, node18 or bun.

I did some changes recently to my system like switching from podman to rootless docker, upgrading psql.

Please feel free to test patch locally (if you still have problem). It's not fixing the root cause, but could be good enough to workaround.

thedadow451 avatar Nov 03 '24 12:11 thedadow451

@thedadow451 I can confirm that your patch fixes the issue. From 50-60ms down to 5ms. Node16 still works fine with the latest version (8.13.1). Node18 brings up bad query times again, which the patch fixes too.

francois-egner avatar Nov 03 '24 15:11 francois-egner

I’m a little surprised that multiple small synchronous writes to a net.Socket stream would translate to multiple TCP packets (as some comments above suggest happens), even with TCP_NODELAY set (which it is), but maybe that’s what changed after Node 16? (Still need to look into it.) What’s more confusing is that pg 8.1.0 also appears to call write multiple times in the same situation (_send call with no more), so I don’t know why downgrading pg would solve it.

Node 16.20.2 also sends multiple TCP packets for multiple writes, so I don’t know why downgrading to Node 16 is helping either, but of course excessive write calls would be a good thing to fix anyway.

Update: pg 8.1.0 is older than I realized, and doesn’t set TCP_NODELAY, so that explains one part.

charmander avatar Jan 18 '25 02:01 charmander