node-postgres
node-postgres copied to clipboard
Extended Query: Support Batch Execution
Hi!
The Extended Query protocol enables drivers to submit multiple BIND messages before SYNC. One of the big benefits of using Extended Queries is that you can efficiently send a batch of executions without waiting for round trips for each execution. Pairing that with prepared statements and some simplifications: you send a single PARSE, a bunch of BIND/EXECUTE and a SYNC to find out how things went.
In other words, you'd be able to support something like the following without needing 4 entire round trips. (I'm not recommending this API since it would be a terrible breaking change.)
await client.query({
name: 'my_query',
text: 'insert sometable (id, val) values ($1, $2)'
values: [
[ 1, 'asdf' ],
[ 2, 'fdsa' ],
[ 3, 'qwer' ],
[ 4, 'uytr' ]
]
})
For more information, check out how the JDBC Postgres driver handles a batched execution. There are a few layers to dig through, but this appears to be the core of the code that sends a batch of messages and subsequently sends a single SYNC. NOTE: their driver imposes a limit of 128 records per batch as (apparently) further batching does not improve performance.
Isn't it already possible in this case? https://github.com/brianc/node-postgres/issues/1190#issuecomment-619557934
While the two approaches look similar, there are very different performance characteristics. Sending a large number of SQL statements with different parameters will perform much worse than sending a single prepared statement and binding many parameters to it.
yeah a proper "batched query" would be nice. Probably a separate object you pass to client.query
or pool.query
that was like
const batch = new BatchQuery({
name: 'optional',
text: 'INSERT INTO foo (bar) VALUES ($1)',
values: [
['first'],
['second']
]
})
const result = client.query(batch)
for (const res of result) {
for (const row of res) {
}
}
Then the batch query execution could throw if this is false for some validation:
for (const row of config.values) {
if (!Array.isArray(config.values)) {
throw new Error('Batch commands require each set of values to be an array. e.g. values: any[][]')
}
}
something like that. Then it would be explicit.
@brianc is this supported?
is this supported?
from a protocol perspective, yes. But I haven't actually implemented the code yet.
Hi @brianc , I am interesting contributing since I believe this would be helpful for my usecase in production. Any guidance would be appreciated.
That'd be cool! I'd suggest making this a separate module like pg-cursor
or pg-query-stream
. It's fine to inline it into this repo as another module here, but best to keep it out of core of pg
to keep bloat to a minimum. So, w/ that in mind we can look at pg-cursor
to see how to do something like this...
Basically anything passed to client.query
will be sniffed to see if it has a submit
function. If it does, that function is called, passing in the connection
object. From that point forward it can fully take over the underlying connection object (which is basically low level functions to send/receieve postgres packets directly) and do anything it wants. Once it emits end
it'll need to clean up after itself. It's not the worlds most well-designed API, particularly from my current skill level, but it is what I came up w/ many years ago and in the interest of backwards compatibility it is what is there today.
https://github.com/brianc/node-postgres/blob/master/packages/pg-cursor/index.js#L42
This is a good point of reference.
It would be great if this feature was implemented with support for multiple queries, rather than multiple value arrays only.
const batch = new BatchQuery({
name: 'optional',
queries: [
['INSERT INTO foo (bar) VALUES ($1)', ['first']],
['DELETE FROM foo WHERE id = $1', [1]],
],
})
@aleclarson That would be pipelining, not multiple bind.
for pipelining, see the experiment in https://github.com/brianc/node-postgres/pull/2706
Hey @brianc @mpareja I have put up a PR. Can you please have a look at the PR and provide feedback
Remarkably, the bench.ts file shows that inserts are getting 100% increase in query per second.