pipeline mode
Currently we send 2 queries for each request: one for the http context (+ role auth + search_path) and another one for the CRUD operation.
Removing the http context query gives about a 33% increase in TPS in plain pgbench tests.
The 2 queries cannot be merged into one because the CRUD query needs the role + search_path settings applied beforehand.
I believe libpq pipeline mode could help us gain perf here. With pipeline mode we wouldn't need to wait for the result of the first query (which we don't care about) before sending the second.
Here's how psycopg implemented this: https://github.com/psycopg/psycopg/pull/89.
```python
import psycopg

conn = psycopg.connect()
cur = conn.cursor()
cur2 = conn.cursor()

with conn.pipeline() as pipeline:
    cur.execute(...)
    cur2.execute(...)
    pipeline.sync()  # calls PQpipelineSync
    r1 = cur.fetchone()
    r2 = cur2.fetchall()
```
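Mapped onto our case, the first statement would be the http context query and the second the CRUD query, flushed together. A minimal sketch of that, assuming psycopg and a made-up `web_anon` role and `api.projects` table (the real `setPgLocals` SQL is more involved):

```python
import psycopg

with psycopg.connect() as conn:
    with conn.pipeline() as pipeline:
        # http context query: role + search_path; we never inspect its result
        conn.execute(
            "select set_config('role', %s, true), set_config('search_path', %s, true)",
            ("web_anon", "api"),
        )
        # CRUD query for the request
        crud = conn.execute("select * from api.projects")
        pipeline.sync()  # both statements were queued; this flushes them in one go
    rows = crud.fetchall()  # results are available once the pipeline is synced
```

Note that we'd still send two statements; the win is only that we don't wait for the first result before sending the second.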
We might have to pitch this feature to hasql.
Supporting pipeline mode would be great!
Maybe this could also help us with doing an EXPLAIN before the request is executed. Related to https://github.com/PostgREST/postgrest/issues/915#issuecomment-676487851
Out of curiosity I pgbenched pipeline mode on https://github.com/PostgREST/postgrest/pull/2672/commits/fa50be2f1a064bba7039d01de913d79f30154c49. I got:
postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/old.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/old.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 30443
number of failed transactions: 0 (0.000%)
latency average = 0.328 ms
initial connection time = 1.232 ms
tps = 3044.579797 (without initial connection time)
postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/new.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/new.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 34782
number of failed transactions: 0 (0.000%)
latency average = 0.287 ms
initial connection time = 1.322 ms
tps = 3478.549246 (without initial connection time)
So a 14% increase in TPS. This was tested with pg + pgrst on the same machine; I assume the gains would be bigger if PostgREST were on a separate machine.
cc @robx
> Out of curiosity I pgbenched pipeline mode on fa50be2.
Thanks for this, particularly explicitly showing how to get those numbers! I'll wrap up some related experiments with postgrest itself in the loop and post numbers in a bit.
Ok, here are some bounds on what pipeline mode could conceivably get us. With #2682, I ran postgrest-loadtest while varying:
PostgREST:
- vanilla, with `setPgLocals` query before the regular request query
- removing `setPgLocals` entirely
- replacing `setPgLocals` by a plain `SELECT 1`
Networking:
- adding a delay of 5ms (each way) between postgrest and postgresql, and a delay of 5ms (each way) between the client and postgrest
- delay 10ms / 1ms
- delay 1ms / 10ms
Then the "request rate" output of postgrest-loadtest is:
| delay (PG/pgrst) in ms | 5/5 | 10/1 | 1/10 |
|---|---|---|---|
| setPgLocals | 14.65 | 9.46 | 25.16 |
| select 1 | 14.65 | 9.59 | 25.27 |
| nothing | 17.51 | 11.76 | 28.37 |
| ratio nothing / setPgLocals | 1.20 | 1.24 | 1.13 |
This ratio is an upper bound on the improvement we could see with pipelining. The results show that the "actual work" done by `setPgLocals` is negligible -- there's no big difference compared to just calling `SELECT 1`.
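As a rough sanity check on that bound, using only the numbers above: at 5 ms / 5 ms, 14.65 requests/s is about 68 ms per request; dropping one 10 ms postgresql round trip would predict roughly 58 ms, i.e. about 17.2 requests/s, close to the 17.51 measured for "nothing". The 10/1 case works out similarly (about 106 ms - 20 ms ≈ 86 ms, or 11.7 requests/s vs. the measured 11.76), so the gap does look like it's almost entirely the extra round trip.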
- What this experiment doesn't quite determine is the non-network overhead of a database query. I.e., how much of the difference between "select 1" and "nothing" is due to the communication, and how much is overhead of generating and parsing query and result. Running the same test with more extreme delays for postgresql could help there; I'll aim to do that. In this vein it might be interesting to compare those `pgbench` results above, replacing the `setPgLocals` query by `select 1` or leaving it out.
- It would also be interesting to think of what kind of latency scenario is actually realistic. I imagine that typically postgrest will be close to postgresql (e.g. same data center), while client requests would potentially come from far away.
- `postgrest-loadtest` runs over unix sockets, what's the plain overhead of moving to TCP?
(I'm not that confident in my findings here, if someone wants to replicate this (should be straightforward using #2682) that would be great!)
> PostgREST:
> - vanilla, with `setPgLocals` query before the regular request query
> - removing `setPgLocals` entirely
> - replacing `setPgLocals` by a plain `SELECT 1`

It might be interesting to get some numbers which use a `db-pre-request` function, too - because that's another roundtrip to the database, right?
> It might be interesting to get some numbers which use a `db-pre-request` function, too - because that's another roundtrip to the database, right?
That's a good point, I missed the fact that there are potentially two statements in `setPgLocals`.
Though maybe it's better to just go ahead with trying out pipelining -- I came at this thinking that those database roundtrips we save probably don't matter enough to be worth the effort of introducing pipelining. My benchmarking could have proved that the potential benefit is irrelevant, but I don't think it has. (It also doesn't prove we will get that 10-20% improvement, of course.)
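For illustration, with pipelining all of those statements (the `setPgLocals` pieces plus a `db-pre-request` call, if configured) would be queued and synced together, so each extra statement stops costing its own round trip. A rough sketch, again with psycopg and made-up function/table names:

```python
import psycopg

with psycopg.connect() as conn:
    with conn.pipeline() as pipeline:
        # everything below is queued client-side and sent together
        conn.execute("select set_config('search_path', 'api', true)")
        conn.execute("select set_config('role', 'web_anon', true)")
        conn.execute("select public.pre_request()")         # stand-in for a db-pre-request function
        crud = conn.execute("select * from api.projects")   # stand-in for the main query
        pipeline.sync()  # one sync point for the whole batch
    rows = crud.fetchall()
```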
> In this vein it might be interesting to compare those `pgbench` results above, replacing the `setPgLocals` query by `select 1` or leaving it out.
Tried the above. Replaced `setPgLocals` with `select 1`:
[nix-shell:~/Projects/postgrest]$ postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/old.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/old.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 33335
number of failed transactions: 0 (0.000%)
latency average = 0.300 ms
initial connection time = 1.294 ms
tps = 3333.880729 (without initial connection time)
[nix-shell:~/Projects/postgrest]$ postgrest-with-postgresql-15 -f test/pgbench/fixtures.sql pgbench -M extended -n -T 10 -f test/pgbench/pipeline_mode/new.sql
pgbench (15.1)
transaction type: test/pgbench/pipeline_mode/new.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 40032
number of failed transactions: 0 (0.000%)
latency average = 0.250 ms
initial connection time = 1.279 ms
tps = 4003.641201 (without initial connection time)
I'm able to reproduce the above numbers with `setPgLocals`, so `SELECT 1` is a bit faster. It's likely not that noticeable with postgrest-loadtest, though.