finagle-postgres
finagle-postgres copied to clipboard
Fix prepared statements for parallel queries
Hello!
We were running into a problem where many prepared queries happening in parallel were getting the wrong set of parameters assigned to them.
The unproven theory is that the prepared query name
and portal
values are currently set to ""
potentially causing collisions. I changed all those hardcoded ""
to a generated value called genName
which contains an AtomicInt.incrementAndGet
-- that seems to fix the bug in our set up: finagle-postgres 0.11
via quill-finagle-postgres 3.1.0
However, I wasn't able to replicate the bug in the specs. Perhaps the problem is that the queries that I am performing are fast enough to not cause a race condition? or maybe the test queries are somehow running in a serialized fashion anyways? not sure :( - I'd gladly re-write the test to verify this behavior successfully, and would love if someone is able to guide me in the right direction.
I'd like to add that we are getting several symptoms that all seem to be related with this issue:
- Sometimes we get
ERROR: portal "" cannot be run
when those parallel queries are getting executed - Some of these parallel queries are left idle for a long time, causing us to have to use
pg_terminate_backend
regularly in order to cancel them - In one of our setups we had two sets of queries, each with 10 and 12 different parameters, all running simultaneously. In some cases the query expecting 12 parameters was being sent 10, causing the following error:
bind message supplies 10 parameters, but prepared statement “...” requires 12
Thanks for the contribution @davoclavo.
I would love to see some failing specs. But the changes look safe enough for me. Could you rebase?
@leonmaia
Thanks for picking up this stale pull request. I have already rebased with master.
I agree, I would also love to see failing specs but sadly wasn't able to replicate the problem after attempting a couple of times :( - If there is some peace of mind, we have been running this modified version in production for about 6 months without facing this problem.
or maybe the test queries are somehow running in a serialized fashion anyways?
The integration tests all currently run with a pool size of 1, which might explain the inability to replicate.
@dangerousben I noticed that and already tried setting a larger pool for concurrent connections without luck. Is there any other place I should make an adjustment?
See this particular line in my commit: https://github.com/finagle/finagle-postgres/pull/136/commits/a8d5f78840abd83450dd15120769de8e4a208229#diff-3f8ad9e99aaed7920116b65d875d3736R49
@dangerousben I noticed that and already tried setting a larger pool for concurrent connections without luck. Is there any other place I should make an adjustment?
It looks ok, and quick bit of debugging shows that the statements are not run sequentially. Not sure why it's not reproducing the problem.
It looks ok, and quick bit of debugging shows that the statements are not run sequentially. Not sure why it's not reproducing the problem.
My theory is that the queries I am making are fast enough to not cause this race condition, but I am not entirely sure honestly. I will try to make other attempts to replicate this problem, if you have any ideas I am keen to try them out.