Implement pipeline mode as allowed by libpq 14
The libpq shipped with PostgreSQL 14 now allows using pipeline mode: https://www.postgresql.org/docs/14/libpq-pipeline-mode.html
As stated by these docs:
While the pipeline API was introduced in PostgreSQL 14, it is a client-side feature which doesn't require special server support and works on any server that supports the v3 extended query protocol.
So I don't know exactly when the v3 extended query protocol appeared, but it has been around for a long time (since PostgreSQL 7.4, released in 2003), so all supported versions of the pg server (9.6 and above) should be able to support pipeline mode.
The related commit in the postgres repository is here: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=acb7e4eb6b
For the record, a previous PR was proposed at https://github.com/brianc/node-postgres/pull/662, but that was before libpq supported pipelining, so the PR was closed because it would have created a discrepancy between the native bindings (which use libpq) and the JS client modified by the PR.
IMHO this mode should be opt-in, at least for the current major version:
- because it's not so clear (to me at least) how some workflows are handled (what if you send a DDL query adding a column and then immediately a DML query using the new column?)
- to avoid subtle breaking changes in users' code (especially for old versions of the pg server)
- for memory consumption reasons (from the pg docs: "Pipeline mode also generally consumes more memory on both the client and server")
Edit: I made an (incomplete) attempt in this PR: https://github.com/brianc/node-postgres/pull/2706
Any update on this?
Any update on this?
I did some tests locally with a fork built upon https://github.com/brianc/node-postgres/pull/662 and noticed some interesting perf gains in my use cases (between 5% and 20%). I will send a PR as a POC. However, it addresses only the JS driver, not the native one.
Explicit pipelining would be interesting.
E.g. ioredis supports client.pipeline([['get', 'foo'], ['get', 'bar']]).exec().
pg could do the same to ensure they are sent on the same connection without any other slow query interfering.
e.g. const [[errA, resA], [errB, resB]] = await pool.pipeline([{text, values}, {text, values}]).
Or
const [{err: errA, rows: rowsA}, {err: errB, rows: rowsB}] = await pool.pipeline([{text, values}, {text, values}])
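A rough sketch of what such a helper could look like. To be clear: neither `pipeline` nor the `{err, rows}` result shape exists in pg today, and the client below is a mock standing in for a real connection; this only illustrates the proposed API, under the assumption that all queries are submitted up front on one client and results come back in submission order.

```javascript
// Minimal stand-in for a pg client; a real implementation would write all
// queries to the socket before waiting for any response.
const mockClient = {
  async query({ text, values }) {
    if (text.startsWith('fail')) throw new Error('query failed');
    return { rows: [{ text, values }] };
  },
};

// Hypothetical pipeline() helper: submit every query immediately on the same
// client, then collect the settled results in submission order.
async function pipeline(client, queries) {
  const promises = queries.map((q) => client.query(q));
  const settled = await Promise.allSettled(promises);
  return settled.map((s) =>
    s.status === 'fulfilled'
      ? { err: null, rows: s.value.rows }
      : { err: s.reason, rows: null }
  );
}

async function main() {
  const [a, b] = await pipeline(mockClient, [
    { text: 'select 1', values: [] },
    { text: 'fail now', values: [] },
  ]);
  console.log(a.err === null, b.err instanceof Error);
}

main();
```

One design question this surfaces is whether a failed query should abort the rest of the pipeline (as libpq does, entering an aborted state until the next sync point) or settle each query independently as sketched here.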
Explicit pipelining would be interesting.
I don't get your point. If pipelining is possible, why would you want to invoke it explicitly rather than always use it?
I can understand putting it behind a flag (e.g. --pipelining at the connection level) until we are sure it's safe for every use case, but I fail to see why it should be explicit in the API.
With transactions it will get executed to a single client, so that will work as desired.
But when using a pool, a pipeline would get distributed among all the clients. In some cases you might want to execute it on the same client instance so other queries won't be scheduled in between. But maybe that's not worth implementing.
But when using a pool, pipeline would get distributed among all the clients
~~Indeed, but that is the case with or without pipelining, so why would it be an issue? Or perhaps I'm missing something in your explanation?~~
EDIT: oh sorry, it looks like I missed the important part of your post:
so other queries won't be scheduled in between
Yes, that's part of my concerns (along with mixing DML/DDL queries)
Perhaps indeed it should be possible to opt in (or opt out?) at the pool and/or client level
But when using a pool, pipeline would get distributed among all the clients. In some cases you might want to execute it on the same client instance so other queries won't be scheduled in between. But maybe that's not worth it to implement.
I'm not sure what the implementation in Node would look like, but this should not be possible using libpq. I'm investigating this feature, and (I might be wrong) what I found is that it just allows batch dispatch transparently from the client side. The pipeline essentially makes the network file descriptor non-blocking and allows the client to submit multiple requests in a non-blocking way. All the requests go through the same SOCK_FD, and the client now needs to monitor that SOCK_FD (select, poll, epoll, io_uring, kqueue, etc.).
Not sure if the complexity it adds to the client side pays off. There are cases where it can improve performance by reducing/saving network trip time, because the lib allows the client to push the requests to it and then call sync (a sort of flush), which sends them all to the server at the same time, incurring only 1 RTT instead of multiple. But the SOCK_FD used will be the same.
Another alternative is to call sync after every request (query). The calls will be asynchronous and won't block, but they will again go through the same SOCK_FD, and the responses, even though asynchronous, will be sequential. Let's say we have 5 queries A, B, C, D and E. The 1st takes 1 second, the 2nd 30 seconds, and all the subsequent ones take 1 second. We would expect to receive all the responses except query B within 1-2 seconds, but this won't happen, because they will be returned sequentially: A=1s, B=31s, C=32s, etc.
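That in-order delivery can be sketched with a tiny simulation. This is just a model of the behaviour described above (sequential execution on one connection, responses delivered in submission order); the durations are the illustrative numbers from the example, not measurements:

```javascript
// Model of pipelined queries on a single connection: the server executes
// them one after another, and each response is delivered only after every
// earlier response, so a slow query delays everything behind it
// (head-of-line blocking).
function pipelineCompletionTimes(execSeconds) {
  const times = [];
  let clock = 0;
  for (const duration of execSeconds) {
    // The next query starts only once the previous one has finished.
    clock += duration;
    times.push(clock);
  }
  return times;
}

// Queries A..E taking 1s, 30s, 1s, 1s, 1s respectively.
const times = pipelineCompletionTimes([1, 30, 1, 1, 1]);
console.log(times); // [1, 31, 32, 33, 34] -> C, D and E all wait behind B
```

Even though C, D and E each execute in 1 second, their responses cannot reach the client before B's, exactly as described above.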
I just started investigating this feature in libpq and I might have missed something or not got to the nice features yet, but as I said, ATM the client-side implementation cost is quite high for the benefits it gets. It needs to handle the SOCK_FD, but apart from the RTT savings, I can't see much more.
I'm not sure what the implementation in Node would look like, but this should not be possible using libpq. I'm investigating this feature, and (I might be wrong) what I found is that it just allows batch dispatch transparently from the client side. The pipeline essentially makes the network file descriptor non-blocking and allows the client to submit multiple requests in a non-blocking way. All the requests go through the same SOCK_FD, and the client now needs to monitor that SOCK_FD (select, poll, epoll, io_uring, kqueue, etc.).
Yes, pipelining essentially consists of reusing the same socket without waiting for the previous request's response.
Not sure if the complexity it adds to the client side pays off. There are cases where it can improve performance by reducing/saving network trip time, because the lib allows the client to push the requests to it and then call sync (a sort of flush), which sends them all to the server at the same time, incurring only 1 RTT instead of multiple. But the SOCK_FD used will be the same.
Saving network trips can provide significant perf gains; see the comments and benchmark in https://github.com/brianc/node-postgres/pull/2706 (a Node.js implementation, but the savings from network trips would be the same, I assume)
Saving network trips can provide significant perf gains; see the comments and benchmark in https://github.com/brianc/node-postgres/pull/2706 (a Node.js implementation, but the savings from network trips would be the same, I assume)
I have no doubt that on a high-latency network the gains could be significant, but I was skeptical about how much gain we could get otherwise. I'm still concerned about the complexity it may add to some client implementations, but after reading the points (and benchmark) in the cases you referred to, I'm starting to rethink it. Thanks for sharing!
Is the PostgreSQL server executing the queries in parallel, or one at a time, when using a pipeline? When not using a connection pool but a single connection, pipelining would be really useful. The best way would be to use it like in Redis, but the response should be similar to Promise.allSettled.
const [{status, value}, {status, value}] = await client.pipeline().query(sql1).query(sql2).exec();
With pipelining you can reduce the idle time between queries on a single connection, by buffering the next query ahead of time.
The server will not execute multiple queries in parallel on the same connection, but whenever it finishes with one query, it won't have to wait a full network roundtrip for the next. This becomes significant if the query execution times are comparable to the network latency.
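A back-of-the-envelope model of that saving, under simple assumptions: without pipelining each query pays a full network round trip before the next one is sent, while with pipelining the queries are buffered so the connection pays roughly one round trip overall. The numbers below are illustrative, not benchmark results:

```javascript
// Rough cost model: total wall-clock time to run `queries` queries on one
// connection, each taking `execMs` on the server, over a link with `rttMs`
// round-trip latency.
function totalTime({ queries, execMs, rttMs, pipelined }) {
  return pipelined
    ? rttMs + queries * execMs      // ~one round trip, back-to-back execution
    : queries * (rttMs + execMs);   // a full round trip per query
}

const params = { queries: 50, execMs: 1, rttMs: 5 };
const sequential = totalTime({ ...params, pipelined: false });
const pipelined = totalTime({ ...params, pipelined: true });
console.log({ sequential, pipelined }); // { sequential: 300, pipelined: 55 }
```

As the model suggests, the benefit grows with the ratio of network latency to query execution time, which matches the benchmark observations reported earlier in the thread.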
Would be nice to run multiple queries in parallel with just one connection. Forking in the backend probably would be faster than client opening multiple connections.
Would be nice to run multiple queries in parallel with just one connection. Forking in the backend probably would be faster than client opening multiple connections.
Pipelining has nothing to do with query execution concurrency/parallelism. With or without pipeline mode, the PostgreSQL server executes the queries sequentially (while using its parallelism capabilities if enabled); pipelining just allows both sides of the connection to work concurrently when possible, and minimizes round-trip time.
@abenhamdine I said it would be nice. And it is only possible with pipelining.
Any news?
No, unfortunately I don't have time to finish the PR (and I didn't even look at the native driver)