
Extended Query: Support Batch Execution

Open mpareja opened this issue 4 years ago • 10 comments

Hi!

The Extended Query protocol enables drivers to submit multiple BIND messages before SYNC. One of the big benefits of using Extended Queries is that you can efficiently send a batch of executions without waiting for round trips for each execution. Pairing that with prepared statements and some simplifications: you send a single PARSE, a bunch of BIND/EXECUTE and a SYNC to find out how things went.

In other words, you'd be able to support something like the following without needing 4 entire round trips. (I'm not recommending this API since it would be a terrible breaking change.)

await client.query({
  name: 'my_query',
  text: 'insert into sometable (id, val) values ($1, $2)',
  values: [
    [ 1, 'asdf' ],
    [ 2, 'fdsa' ],
    [ 3, 'qwer' ],
    [ 4, 'uytr' ]
  ]
})

For more information, check out how the JDBC Postgres driver handles a batched execution. There are a few layers to dig through, but this appears to be the core of the code that sends a batch of messages and subsequently sends a single SYNC. NOTE: their driver imposes a limit of 128 records per batch as (apparently) further batching does not improve performance.
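To make the message ordering concrete, here is a minimal sketch that only models the sequence of extended-query messages for a batch, not their wire format. The helper name `planBatch` and its parameters are hypothetical, and the default of 128 rows per SYNC simply mirrors the JDBC figure mentioned above; it is an assumption about how a limit like that could be applied, not a description of how that driver splits batches.

```javascript
// Sketch only: models the ORDER of extended-query messages, not their bytes.
// planBatch, statementName and maxRowsPerSync are hypothetical names.
function planBatch(statementName, text, rows, maxRowsPerSync = 128) {
  // A named statement is parsed once and survives across SYNC boundaries.
  const messages = [{ type: 'Parse', name: statementName, text }]
  for (let start = 0; start < rows.length; start += maxRowsPerSync) {
    for (const values of rows.slice(start, start + maxRowsPerSync)) {
      messages.push({ type: 'Bind', statement: statementName, values })
      messages.push({ type: 'Execute' })
    }
    // One SYNC per chunk: the only point where the client waits for the server.
    messages.push({ type: 'Sync' })
  }
  // With no rows there is still a PARSE to flush.
  if (rows.length === 0) messages.push({ type: 'Sync' })
  return messages
}

const plan = planBatch('my_query', 'insert into sometable (id, val) values ($1, $2)', [
  [1, 'asdf'],
  [2, 'fdsa'],
  [3, 'qwer'],
  [4, 'uytr'],
])
console.log(plan.map((m) => m.type).join(' '))
// prints: Parse Bind Execute Bind Execute Bind Execute Bind Execute Sync
```

Four rows produce ten messages but only one SYNC, so the client waits on the server once instead of four times.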

mpareja avatar Jun 26 '20 11:06 mpareja

Isn't it already possible in this case? https://github.com/brianc/node-postgres/issues/1190#issuecomment-619557934

rightaway avatar Sep 08 '21 07:09 rightaway

While the two approaches look similar, they have very different performance characteristics. Sending a large number of SQL statements with different parameters will perform much worse than sending a single prepared statement and binding many parameters to it.
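For comparison, this is roughly the closest thing available today, as far as I understand it: a named statement passed to `client.query` is prepared once per connection, but each execution is still awaited separately, so every row costs a full round trip. The helper name `insertOneByOne` and the statement name are made up for illustration; `client` is assumed to be anything with a pg-style `query(config)` method that returns a promise.

```javascript
// insertOneByOne is a hypothetical helper; `client` is assumed to expose a
// pg-style query(config) method that returns a promise.
async function insertOneByOne(client, rows) {
  const results = []
  for (const values of rows) {
    // Same name and text on every call, so the statement only needs to be
    // prepared once, but each await still waits for its own response.
    const result = await client.query({
      name: 'insert_sometable',
      text: 'INSERT INTO sometable (id, val) VALUES ($1, $2)',
      values,
    })
    results.push(result)
  }
  return results
}
```

A batched execution would replace the N awaited calls with a single wait at the final SYNC.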

mpareja avatar Sep 08 '21 22:09 mpareja

yeah a proper "batched query" would be nice. Probably a separate object you pass to client.query or pool.query that was like

const batch = new BatchQuery({
  name: 'optional',
  text: 'INSERT INTO foo (bar) VALUES ($1)',
  values: [
    ['first'],
    ['second']
  ]
})
const result = await client.query(batch)
for (const res of result) {
  for (const row of res) {
  }
}

Then the batch query execution could throw if validation like this fails:

for (const row of config.values) {
  if (!Array.isArray(row)) {
    throw new Error('Batch commands require each set of values to be an array. e.g. values: any[][]')
  }
}

something like that. Then it would be explicit.

brianc avatar Sep 08 '21 22:09 brianc

@brianc is this supported?

damianobarbati avatar Aug 28 '22 20:08 damianobarbati

> is this supported?

from a protocol perspective, yes. But I haven't actually implemented the code yet.

brianc avatar Aug 29 '22 17:08 brianc

Hi @brianc, I am interested in contributing since I believe this would be helpful for my use case in production. Any guidance would be appreciated.

iamkhush avatar Aug 29 '22 20:08 iamkhush

That'd be cool! I'd suggest making this a separate module like pg-cursor or pg-query-stream. It's fine to inline it into this repo as another module here, but best to keep it out of core of pg to keep bloat to a minimum. So, w/ that in mind we can look at pg-cursor to see how to do something like this...

Basically anything passed to client.query will be sniffed to see if it has a submit function. If it does, that function is called, passing in the connection object. From that point forward it can fully take over the underlying connection object (which is basically low-level functions to send/receive postgres packets directly) and do anything it wants. Once it emits end it'll need to clean up after itself. It's not the world's most well-designed API, particularly from my current skill level, but it is what I came up w/ many years ago and in the interest of backwards compatibility it is what is there today.
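A rough sketch of what such an object might look like follows. `BatchQuery` here is hypothetical and has not been run against a real server. It assumes the connection exposes `parse`, `bind`, `execute` and `sync`, and that the client forwards server responses to `handle*` methods on the submitted object, which is the pattern pg-cursor relies on; check the exact names and message shapes against the pg version you target. A real implementation would also need row descriptions, type parsing and more careful error and cleanup handling.

```javascript
// Hypothetical sketch, not part of pg. The connection methods and handle*
// callbacks are assumptions modeled on the pattern pg-cursor uses.
class BatchQuery {
  constructor({ name = '', text, values }) {
    for (const row of values) {
      if (!Array.isArray(row)) {
        throw new TypeError('Batch commands require values to be any[][]')
      }
    }
    this.name = name
    this.text = text
    this.values = values
    this.results = []
    this._rows = []
    this._error = null
    // Settles when the server reports it is ready for the next query.
    this.done = new Promise((resolve, reject) => {
      this._resolve = resolve
      this._reject = reject
    })
  }

  // Called with the connection once this object reaches the front of the queue.
  submit(connection) {
    connection.parse({ name: this.name, text: this.text })
    for (const values of this.values) {
      connection.bind({ statement: this.name, values })
      connection.execute({ rows: 0 }) // 0 = no row limit
    }
    connection.sync() // single SYNC for the whole batch
  }

  handleDataRow(msg) {
    this._rows.push(msg.fields) // raw, unparsed field values
  }

  handleCommandComplete(msg) {
    // One CommandComplete arrives per EXECUTE; close out that execution's rows.
    this.results.push({ command: msg.text, rows: this._rows })
    this._rows = []
  }

  handleError(err) {
    this._error = err
  }

  handleReadyForQuery() {
    if (this._error) this._reject(this._error)
    else this._resolve(this.results)
  }
}
```

Under those assumptions, usage would be along the lines of `client.query(batch)` followed by `await batch.done`.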

brianc avatar Aug 29 '22 21:08 brianc

https://github.com/brianc/node-postgres/blob/master/packages/pg-cursor/index.js#L42

This is a good point of reference.

brianc avatar Aug 29 '22 21:08 brianc

It would be great if this feature was implemented with support for multiple queries, rather than multiple value arrays only.

const batch = new BatchQuery({
  name: 'optional',
  queries: [
    ['INSERT INTO foo (bar) VALUES ($1)', ['first']],
    ['DELETE FROM foo WHERE id = $1', [1]],
  ],
})

aleclarson avatar Sep 07 '22 03:09 aleclarson

@aleclarson That would be pipelining, not multiple bind.

charmander avatar Sep 07 '22 15:09 charmander

for pipelining, see the experiment in https://github.com/brianc/node-postgres/pull/2706

abenhamdine avatar Nov 07 '22 15:11 abenhamdine

Hey @brianc @mpareja, I have put up a PR. Can you please have a look at it and provide feedback?

Remarkably, the bench.ts file shows inserts getting a 100% increase in queries per second.

iamkhush avatar Mar 30 '23 06:03 iamkhush