
Cursor queries get stuck when connection pool is full

Megous opened this issue

And I even have a test case, this time :)

import postgres from 'postgres';

const sql = postgres({
        idle_timeout: 30,
        connect_timeout: 5,
        max_lifetime: 600,
        max: 2,
});

async function test(i) {
        let r = sql`SELECT * FROM generate_series(1, 10)`.cursor(2);

        for await (let x of r) {
          x;
        }
        console.log(i + ' done');
}

async function main() {
        try {
                for (let i = 0; i < 200; i++) {
                        test(i).catch(console.log);
                }

                await new Promise(() => {});
        } finally {
                await sql.end();
        }
}

main().catch(console.error);

I'd expect 200 done messages, but I only get ~3.

If I put logging into the asyncIterator, like this: https://megous.com/dl/tmp/p.patch

I get quite a funky output:

...
1 [ 769880, 918098750 ] next res 1 2
1 [ 769880, 918189339 ] next 1
1 [ 769880, 918317737 ] next res 1 2
1 [ 769880, 918417656 ] next 1
2 [ 769880, 918517955 ] next res 2 2
2 [ 769880, 918609464 ] next 2
1 [ 769880, 918739212 ] next res 1 2
1 [ 769880, 918794341 ] next 1
1 [ 769880, 918904900 ] next res 1 2
1 [ 769880, 918959899 ] next 1
2 [ 769880, 919032578 ] next res 2 2
2 [ 769880, 919077378 ] next 2
1 [ 769880, 919192626 ] next res 1 2
1 [ 769880, 919241145 ] next 1
1 [ 769880, 919358104 ] done 1
0 done
2 [ 769880, 919498092 ] next res 2 2
2 [ 769880, 919542482 ] next 2
3 [ 769880, 920528279 ] next res 3 2
3 [ 769880, 920583428 ] next res 3 4
3 [ 769880, 920684947 ] next res 3 6
3 [ 769880, 920735916 ] next res 3 8
3 [ 769880, 920775176 ] next res 3 10
3 [ 769880, 920812025 ] next res 3 12
3 [ 769880, 920847005 ] next res 3 14
3 [ 769880, 920884384 ] next res 3 16
3 [ 769880, 920920884 ] next res 3 18
3 [ 769880, 920955673 ] next res 3 20
3 [ 769880, 920990923 ] next res 3 22
3 [ 769880, 921027232 ] next res 3 24
...

Looks like result sets start to exceed the cursor batch size once the pool fills up, and this.cursorFn gets called multiple times without a corresponding next() call when this happens. Since the promise has already been settled once, the extra results never get out, and the iterator gets stuck.
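As a minimal sketch of that last point (plain JS, no postgres involved): once a Promise has been settled, any later resolve() calls are silently ignored, which is how extra cursorFn results can get lost:

```javascript
// A settled Promise ignores further resolve() calls, so if the iterator's
// pending promise is resolved once with a batch, any extra batch delivered
// before the next() call simply vanishes.
let resolveNext;
const pending = new Promise(r => { resolveNext = r; });

resolveNext('batch 1');
resolveNext('batch 2'); // silently dropped: promise already settled

pending.then(batch => console.log(batch)); // logs only 'batch 1'
```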

Megous avatar Jun 30 '22 07:06 Megous

Non-cursor queries work fine.

ghost avatar Jun 30 '22 07:06 ghost

Thanks a lot for that case @megous - can confirm that doesn't behave as expected! I'm unfortunately quite hung up this week, so not sure I can find time to dig into it until next week. Maybe some evening before - I'm quite curious about this one

porsager avatar Jul 03 '22 21:07 porsager

👍 I am using numerous concurrent queries and I see similar behaviour. postgres 3.3.2, Postgres 13, node 18.

mattbishop avatar Dec 01 '22 19:12 mattbishop

Same problem here.

MalxMalx avatar Dec 19 '22 15:12 MalxMalx

@porsager So I looked at this in more detail, and the issue seems to lie in Connection not being able to handle pipelining when using cursor queries. When I set max_pipeline: 0, the test case starts passing.

ghost avatar Feb 05 '23 16:02 ghost
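For anyone hitting this before a fix lands, the workaround above is just a connection option. A sketch, reusing the options from the original test case:

```javascript
import postgres from 'postgres';

const sql = postgres({
        max: 2,
        max_pipeline: 0, // disable pipelining so cursor batches can't interleave
});
```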

The symptom is that when PortalSuspended() runs, the result contains a mix of rows from different queries after one of the first N queries successfully completes (pipelining seems to be triggered only by queries that did not cause creation of a new Connection).

So you see results like:

PortalSuspended Result(4) [
  { generate_series: 1 },
  { generate_series: 2 },
  { generate_series: 1 },
  { generate_series: 2 }
]

ghost avatar Feb 05 '23 17:02 ghost

Interleaving of results can be fixed by:

https://megous.com/dl/tmp/fix1.patch

diff --git a/src/connection.js b/src/connection.js
index ca3c8cc81aa6..85dce8cb350e 100644
--- a/src/connection.js
+++ b/src/connection.js
@@ -792,12 +794,15 @@ function Connection(options, queues = {}, { onopen = noop, onend = noop, onclose
   }
 
   async function PortalSuspended() {
+    let res = result;
+    result = new Result();
+    rows = 0
+
     try {
-      const x = await Promise.resolve(query.cursorFn(result))
-      rows = 0
+      const x = await Promise.resolve(query.cursorFn(res))
       x === CLOSE
         ? write(Close(query.portal))
-        : (result = new Result(), write(Execute('', query.cursorRows)))
+        : write(Execute('', query.cursorRows, query))
     } catch (err) {
       write(Sync)
       query.reject(err)

I suppose more data rows can arrive on the connection for other pipelined queries while cursorFn() runs, so result/rows need to be reset immediately.
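The race can be sketched without postgres at all (hypothetical names; a simplified stand-in for PortalSuspended, where state.result plays the role of the connection-level Result the socket keeps appending to):

```javascript
// Simplified stand-in for PortalSuspended: if the shared result is reset
// only after the await, rows arriving mid-await get mixed into the old batch.
async function portalSuspended(state, cursorFn, resetFirst) {
  let batch;
  if (resetFirst) {
    batch = state.result;   // patched: hand off the finished batch...
    state.result = [];      // ...and reset before yielding to cursorFn
  } else {
    batch = state.result;   // original: same array stays shared during await
  }
  await cursorFn(batch);
  if (!resetFirst) state.result = [];
}

async function demo(resetFirst) {
  const state = { result: [1, 2] };
  const seen = [];
  await portalSuspended(state, async batch => {
    state.result.push(3);   // a row for the *next* pipelined query arrives
    seen.push(...batch);
  }, resetFirst);
  return seen;
}

demo(false).then(rows => console.log('original:', rows)); // original: [ 1, 2, 3 ]
demo(true).then(rows => console.log('patched:', rows));   // patched: [ 1, 2 ]
```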

This doesn't fix wrong data being passed to the wrong query when pipelining cursor queries, since Connection() is not written with the assumption that pipelined query results may interleave. (When you pipeline cursor queries, you get result batches for each query interleaved from the server.) Connection() assumes queries finish cleanly one after the other, in the order they were issued.

So I guess disabling pipelining for cursor queries is the easy workaround for this issue (after applying the above patch).

ghost avatar Feb 05 '23 17:02 ghost

Hmm, looking at the protocol docs, there's no interleaving... I guess pipelining multiple queries with the unnamed portal just causes the portal to be immediately replaced by the subsequent query after the first batch of results from the previous query, and that's what I see in the server responses:

  • send 3 queries without waiting for the results (queries generate 10 results), we request 4 results
  • receive first 4 results for the first query
  • receive first 4 results for the second query
  • receive first 4 results for the third query
  • receive next 4 results for the third query
  • receive last 2 results for the third query

So the solution may simply be to block further pipelining if the current query is a cursor one.
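A sketch of that guard (hypothetical field names, not the actual Connection internals): before reusing a busy connection, check whether a cursor query is involved on either side:

```javascript
// Hypothetical guard: refuse to pipeline onto a connection whose in-flight
// query is a cursor query, and refuse to pipeline a cursor query itself.
function canPipeline(connection, query) {
  const cursorInFlight = connection.current && connection.current.cursorFn;
  return !cursorInFlight && !query.cursorFn;
}

const busy = { current: { cursorFn: rows => rows } };
console.log(canPipeline(busy, {}));              // false: cursor query in flight
console.log(canPipeline({ current: null }, {})); // true: safe to pipeline
```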

ghost avatar Feb 05 '23 18:02 ghost

A complete fix for the issue: https://megous.com/dl/tmp/0001-Don-t-allow-pipelining-on-top-of-cursor-query-fixes-.patch

Feel free to apply it to your repo.

ghost avatar Feb 05 '23 18:02 ghost

I tested it quite a bit in the last month, and it seems to work reliably with the above patch applied.

ghost avatar Mar 08 '23 21:03 ghost

Sorry for the silence @megous, haven't had time to engage much. The patch sounds great; I've applied it and added some tests, and it looks good! I'll push and do a release for it ;)

porsager avatar Mar 08 '23 21:03 porsager

Perfect, thank you. :)

ghost avatar Mar 20 '23 10:03 ghost