postgres
Cursor queries get stuck when connection pool is full
And I even have a test case, this time :)
const sql = postgres({
  idle_timeout: 30,
  connect_timeout: 5,
  max_lifetime: 600,
  max: 2,
});

async function test(i) {
  let r = sql`SELECT * FROM generate_series(1, 10)`.cursor(2);
  for await (let x of r) {
    x;
  }
  console.log(i + ' done');
}

async function main() {
  try {
    for (let i = 0; i < 200; i++) {
      test(i).catch(console.log);
    }
    await new Promise(() => {});
  } finally {
    await sql.end();
  }
}

main().catch(console.error);
I'd expect 200 done messages, but I only get ~3.
If I put logging into the asyncIterator, like this: https://megous.com/dl/tmp/p.patch
I get quite a funky output:
...
1 [ 769880, 918098750 ] next res 1 2
1 [ 769880, 918189339 ] next 1
1 [ 769880, 918317737 ] next res 1 2
1 [ 769880, 918417656 ] next 1
2 [ 769880, 918517955 ] next res 2 2
2 [ 769880, 918609464 ] next 2
1 [ 769880, 918739212 ] next res 1 2
1 [ 769880, 918794341 ] next 1
1 [ 769880, 918904900 ] next res 1 2
1 [ 769880, 918959899 ] next 1
2 [ 769880, 919032578 ] next res 2 2
2 [ 769880, 919077378 ] next 2
1 [ 769880, 919192626 ] next res 1 2
1 [ 769880, 919241145 ] next 1
1 [ 769880, 919358104 ] done 1
0 done
2 [ 769880, 919498092 ] next res 2 2
2 [ 769880, 919542482 ] next 2
3 [ 769880, 920528279 ] next res 3 2
3 [ 769880, 920583428 ] next res 3 4
3 [ 769880, 920684947 ] next res 3 6
3 [ 769880, 920735916 ] next res 3 8
3 [ 769880, 920775176 ] next res 3 10
3 [ 769880, 920812025 ] next res 3 12
3 [ 769880, 920847005 ] next res 3 14
3 [ 769880, 920884384 ] next res 3 16
3 [ 769880, 920920884 ] next res 3 18
3 [ 769880, 920955673 ] next res 3 20
3 [ 769880, 920990923 ] next res 3 22
3 [ 769880, 921027232 ] next res 3 24
...
It looks like once the pool fills up, result sets stop matching the cursor batch size and instead keep growing, and this.cursorFn gets called multiple times without a corresponding next() call. Since the promise has already been settled once, the extra results never get out, and the iterator gets stuck.
Non-cursor queries work fine.
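A minimal standalone sketch (hypothetical names, not the library's internals) of why a once-settled promise swallows the extra callbacks described above:

```javascript
// The iterator effectively holds a single pending promise per next() call.
// If the connection delivers two batches for one next(), the second
// resolve() is a no-op and that batch is silently lost, so the iterator
// never advances again.
function makeDeferred() {
  let resolve;
  const promise = new Promise(r => { resolve = r; });
  return { promise, resolve };
}

async function demo() {
  const deferred = makeDeferred();

  deferred.resolve('batch 1');
  deferred.resolve('batch 2'); // no-op: the promise is already settled

  return await deferred.promise; // 'batch 1'; 'batch 2' never gets out
}

demo().then(console.log);
```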
Thanks a lot for that case @megous - can confirm that doesn't behave as expected! I'm unfortunately quite hung up this week, so not sure I can find time to dig into it until next week. Maybe some evening before - I'm quite curious about this one
👍 I am using numerous concurrent queries and I see similar behaviour. postgres 3.3.2, Postgres 13, Node 18.
Same problem here.
@porsager So I looked at this in more detail, and the issue seems to lie in Connection not being able to handle pipelining when using cursor queries. When I set max_pipeline: 0, the testcase starts passing.
The symptom is that when PortalSuspended() runs, result contains a mix of results from different queries after one of the first N queries successfully completes (pipelining seems to only be triggered from queries that did not cause creation of a new Connection).
So you see results like:
PortalSuspended Result(4) [
{ generate_series: 1 },
{ generate_series: 2 },
{ generate_series: 1 },
{ generate_series: 2 }
]
Interleaving of results can be fixed by:
https://megous.com/dl/tmp/fix1.patch
diff --git a/src/connection.js b/src/connection.js
index ca3c8cc81aa6..85dce8cb350e 100644
--- a/src/connection.js
+++ b/src/connection.js
@@ -792,12 +794,15 @@ function Connection(options, queues = {}, { onopen = noop, onend = noop, onclose
}
async function PortalSuspended() {
+ let res = result;
+ result = new Result();
+ rows = 0
+
try {
- const x = await Promise.resolve(query.cursorFn(result))
- rows = 0
+ const x = await Promise.resolve(query.cursorFn(res))
x === CLOSE
? write(Close(query.portal))
- : (result = new Result(), write(Execute('', query.cursorRows)))
+ : write(Execute('', query.cursorRows, query))
} catch (err) {
write(Sync)
query.reject(err)
I suppose more data rows can arrive on the connection for other pipelined queries while cursorFn() runs, so result/rows need to be reset immediately, before the await.
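The race the patch avoids can be illustrated with a standalone sketch (hypothetical names, not the actual connection.js internals): resetting the shared buffer only after awaiting the callback lets rows that arrive mid-await leak into the batch that was already handed out.

```javascript
// Shared buffer that the socket handler appends incoming rows to.
let result = [];
function onRow(row) { result.push(row); }

// Buggy shape: reset happens after the await, so late rows land in the
// batch the callback is already consuming.
async function portalSuspendedBuggy(cursorFn) {
  await cursorFn(result); // rows may arrive during this await...
  result = [];            // ...and are reset too late
}

// Fixed shape (what the patch does): detach and reset before awaiting,
// so late rows go into a fresh buffer for the next batch.
async function portalSuspendedFixed(cursorFn) {
  const res = result;
  result = [];
  await cursorFn(res);
}

async function demo() {
  result = [1, 2];
  const buggySeen = [];
  await portalSuspendedBuggy(async batch => {
    onRow(3); // a row arrives while the callback runs
    buggySeen.push(...batch);
  });

  result = [1, 2];
  const fixedSeen = [];
  await portalSuspendedFixed(async batch => {
    onRow(3);
    fixedSeen.push(...batch);
  });

  // buggySeen includes the late row; fixedSeen does not, and the late
  // row waits in `result` for the next batch.
  return { buggySeen, fixedSeen, pending: result };
}

demo().then(console.log);
```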
This doesn't fix wrong data being passed to the wrong query when pipelining cursor queries, since Connection() is not written with the assumption that pipelined query results may interleave. (When you pipeline a cursor query, you get result batches for each query interleaved from the server.) Connection() assumes queries finish nicely one after the other, in the order they were issued.
So I guess, disabling pipelining when using cursor queries is the easy workaround for this issue (after applying the above patch).
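From user code, that easy workaround amounts to disabling pipelining at connection time. A sketch of the config; note that max_pipeline is the option name as used in this thread, so treat its availability in your postgres.js version as an assumption:

```javascript
import postgres from 'postgres'

// Workaround sketch: with pipelining disabled, a cursor query never shares
// a connection's wire with a subsequent query, so result batches from
// different queries cannot interleave.
const sql = postgres({
  max: 2,
  max_pipeline: 0, // the setting referenced above; testcase passes with it
})
```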
Hmm, looking at the protocol docs, there's no interleaving... I guess pipelining multiple queries with the unnamed portal just causes the portal to be immediately replaced by the subsequent query after the first batch of results from the previous query, and that's what I see in the server responses:
- send 3 queries without waiting for the results (queries generate 10 results), we request 4 results
- receive first 4 results for the first query
- receive first 4 results for the second query
- receive first 4 results for the third query
- receive next 4 results for the third query
- receive last 2 results for the third query
So the solution may simply be to block further pipelining if the current query is a cursor one.
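That guard could look roughly like this (a sketch under assumed names, not the actual code in the linked patch): before pipelining the next queued query onto the wire, check whether the in-flight query is a cursor query, since a cursor query keeps its unnamed portal suspended between Execute round-trips and a pipelined query would replace that portal.

```javascript
// A cursor query is one that registered a cursorFn for batched results.
function canPipeline(currentQuery) {
  return currentQuery == null || !currentQuery.cursorFn;
}

// Attempt to pipeline the next queued query; refuse while a cursor
// query is still holding the connection's unnamed portal.
function maybeSend(connection) {
  if (connection.queue.length === 0) return false;
  if (!canPipeline(connection.current)) return false; // block pipelining
  connection.current = connection.queue.shift();
  // connection.write(...) -- actual wire write elided in this sketch
  return true;
}

// Usage: a cursor query in flight blocks the queue until it completes.
const conn = { current: { cursorFn: rows => rows }, queue: [{ sql: 'SELECT 1' }] };
console.log(maybeSend(conn)); // false: cursor query holds the portal
conn.current = null;          // cursor query finished
console.log(maybeSend(conn)); // true: safe to send now
```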
A complete fix for the issue: https://megous.com/dl/tmp/0001-Don-t-allow-pipelining-on-top-of-cursor-query-fixes-.patch
Feel free to apply it to your repo.
I've tested it quite a bit over the last month, and it seems to work reliably with the above patch applied.
Sorry for the silence @megous, I haven't had time to engage much. The patch sounds great; I've applied it and added some tests, and it looks good! I'll push and do a release for it ;)
Perfect, thank you. :)