pipeline mode
Currently we send 2 queries for each request: one for the http context (+ role auth + search_path) and another one for the CRUD operation.
Removing the http context query gives about a 33% increase in TPS in plain pgbench tests.
The 2 queries cannot be merged into one because the CRUD query needs the role + search_path settings applied beforehand.
I believe libpq pipeline mode could help us gain perf here. With pipeline mode we wouldn't need to wait for the result of the first query (which we don't care about) before sending the second.
Here's how psycopg implemented this: https://github.com/psycopg/psycopg/pull/89.
```python
import psycopg

conn = psycopg.connect()
cur = conn.cursor()
cur2 = conn.cursor()

with conn.pipeline() as pipeline:
    cur.execute(...)
    cur2.execute(...)
    pipeline.sync()  # calls PQpipelineSync
    r1 = cur.fetchone()
    r2 = cur2.fetchall()
```
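Mapped onto our case, the first statement would be the http context query and the second the CRUD query, flushed together. A minimal sketch of that, assuming psycopg and a made-up `web_anon` role and `api.projects` table (the real `setPgLocals` SQL is more involved):

```python
import psycopg

with psycopg.connect() as conn:
    with conn.pipeline() as pipeline:
        # http context query: role + search_path; we never inspect its result
        conn.execute(
            "select set_config('role', %s, true), set_config('search_path', %s, true)",
            ("web_anon", "api"),
        )
        # CRUD query for the request
        crud = conn.execute("select * from api.projects")
        pipeline.sync()  # both statements were queued; this flushes them in one go
    rows = crud.fetchall()  # results are available once the pipeline is synced
```

Note that we'd still send two statements; the win is only that we don't wait for the first result before sending the second.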
We might have to pitch this feature to hasql.
Supporting pipeline mode would be great!
Maybe this could also help us with doing an EXPLAIN before the request is executed. Related to https://github.com/PostgREST/postgrest/issues/915#issuecomment-676487851
Out of curiosity I pgbenched pipeline mode on https://github.com/PostgREST/postgrest/pull/2672/commits/fa50be2f1a064bba7039d01de913d79f30154c49. I got:
postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/old.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/old.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 30443
number of failed transactions: 0 (0.000%)
latency average = 0.328 ms
initial connection time = 1.232 ms
tps = 3044.579797 (without initial connection time)
postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/new.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/new.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 34782
number of failed transactions: 0 (0.000%)
latency average = 0.287 ms
initial connection time = 1.322 ms
tps = 3478.549246 (without initial connection time)
So a 14% increase in TPS. This was tested with pg + pgrst on the same machine; I assume the gains would be bigger if PostgREST were on a separate machine.
cc @robx
> Out of curiosity I pgbenched pipeline mode on fa50be2.
Thanks for this, particularly explicitly showing how to get those numbers! I'll wrap up some related experiments with postgrest itself in the loop and post numbers in a bit.
Ok, here are some bounds on what pipeline mode could conceivably get us. With #2682, I ran postgrest-loadtest while varying:
PostgREST:
- vanilla, with `setPgLocals` query before the regular request query
- removing `setPgLocals` entirely
- replacing `setPgLocals` by a plain `SELECT 1`
Networking:
- adding a delay of 5ms (each way) between postgrest and postgresql, and a delay of 5ms (each way) between the client and postgrest
- delay 10ms / 1ms
- delay 1ms / 10ms
Then the "request rate" output of postgrest-loadtest is:
| delay (PG/pgrst) in ms | 5/5 | 10/1 | 1/10 |
|---|---|---|---|
| setPgLocals | 14.65 | 9.46 | 25.16 |
| select 1 | 14.65 | 9.59 | 25.27 |
| nothing | 17.51 | 11.76 | 28.37 |
| ratio nothing / setPgLocals | 1.20 | 1.24 | 1.13 |
This ratio is an upper bound on the improvement we could see with pipelining. The results show that the "actual work" done by `setPgLocals` is negligible -- there's no big difference compared to just calling `SELECT 1`.
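As a rough sanity check on that bound, using only the numbers above: at 5 ms / 5 ms, 14.65 requests/s is about 68 ms per request; dropping one 10 ms postgresql round trip would predict roughly 58 ms, i.e. about 17.2 requests/s, close to the 17.51 measured for "nothing". The 10/1 case works out similarly (about 106 ms - 20 ms ≈ 86 ms, or 11.7 requests/s vs. the measured 11.76), so the gap does look like it's almost entirely the extra round trip.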
- What this experiment doesn't quite determine is the non-network overhead of a database query. I.e., how much of the difference between "select 1" and "nothing" is due to the communication, and how much is overhead of generating and parsing query and result. Running the same test with more extreme delays for postgresql could help there; I'll aim to do that. In this vein it might be interesting to compare those `pgbench` results above, replacing the `setPgLocals` query by `select 1` or leaving it out.
- It would also be interesting to think of what kind of latency scenario is actually realistic. I imagine that typically postgrest will be close to postgresql (e.g. same data center), while client requests would potentially come from far away.
- `postgrest-loadtest` runs over unix sockets, what's the plain overhead of moving to TCP?
(I'm not that confident in my findings here, if someone wants to replicate this (should be straightforward using #2682) that would be great!)
> PostgREST:
> - vanilla, with `setPgLocals` query before the regular request query
> - removing `setPgLocals` entirely
> - replacing `setPgLocals` by a plain `SELECT 1`

It might be interesting to get some numbers which use a `db-pre-request` function, too - because that's another roundtrip to the database, right?
> It might be interesting to get some numbers which use a `db-pre-request` function, too - because that's another roundtrip to the database, right?
That's a good point, I missed the fact that there are potentially two statements in `setPgLocals`.
Though maybe it's better to just go ahead with trying out pipelining -- I came at this thinking that those database roundtrips we save probably don't matter enough to be worth the effort of introducing pipelining. My benchmarking could have proved that the potential benefit is irrelevant, but I don't think it has. (It also doesn't prove we will get that 10-20% improvement, of course.)
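For illustration, with pipelining all of those statements (the `setPgLocals` pieces plus a `db-pre-request` call, if configured) would be queued and synced together, so each extra statement stops costing its own round trip. A rough sketch, again with psycopg and made-up function/table names:

```python
import psycopg

with psycopg.connect() as conn:
    with conn.pipeline() as pipeline:
        # everything below is queued client-side and sent together
        conn.execute("select set_config('search_path', 'api', true)")
        conn.execute("select set_config('role', 'web_anon', true)")
        conn.execute("select public.pre_request()")         # stand-in for a db-pre-request function
        crud = conn.execute("select * from api.projects")   # stand-in for the main query
        pipeline.sync()  # one sync point for the whole batch
    rows = crud.fetchall()
```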
> In this vein it might be interesting to compare those `pgbench` results above, replacing the `setPgLocals` query by `select 1` or leaving it out.
Tried the above. Replaced `setPgLocals` with `select 1`:
[nix-shell:~/Projects/postgrest]$ postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/old.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/old.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 33335
number of failed transactions: 0 (0.000%)
latency average = 0.300 ms
initial connection time = 1.294 ms
tps = 3333.880729 (without initial connection time)
[nix-shell:~/Projects/postgrest]$ postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/new.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/new.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 40032
number of failed transactions: 0 (0.000%)
latency average = 0.250 ms
initial connection time = 1.279 ms
tps = 4003.641201 (without initial connection time)
I'm able to reproduce the above numbers with `setPgLocals`, so `SELECT 1` is a bit faster. It's likely not that noticeable with postgrest-loadtest, though.