Fetching from pgpool-II hangs indefinitely
- asyncpg version: 0.20.1
- PostgreSQL version: 12.2
- Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: No.
- Python version: 3.7.7 (default, Apr 23 2020, 14:43:26) [GCC 8.3.0]
- Platform: Linux
- Do you use pgbouncer?: No, I use pgpool-II 4.1.1
- Did you install asyncpg with pip?: No, with poetry
- If you built asyncpg locally, which version of Cython did you use?: N/A (did not build locally)
- Can the issue be reproduced under both asyncio and uvloop?: Yes
When querying the database through pgpool-II, the query hangs indefinitely without raising any kind of exception. Connecting to the database seems to be working, since, for instance, an invalid password raises an exception:
asyncpg.exceptions.InternalServerError: md5 authentication failed
DETAIL: password does not match
Steps to reproduce:
import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(user='user', password='password',
                                 database='database', host='127.0.0.1')  # connect to pgpool-II
    values = await conn.fetch('''SELECT 1''')  # hangs here indefinitely
    await conn.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
When connecting to PostgreSQL directly, the example above works flawlessly.
Log from pgpool-II:
May 13 20:56:00 pg01 pgpool[11982]: 2020-05-13 20:56:00: pid 13189: ERROR: unable to read data from frontend
May 13 20:56:00 pg01 pgpool[11982]: 2020-05-13 20:56:00: pid 13189: DETAIL: EOF encountered with frontend
@damjankuznar lol, that's weird. I just came here to report this issue too after realising where the problem probably was: https://github.com/bitnami/charts/issues/2568
Connection poolers have historically been problematic due to various liberties they take with respect to the protocol semantics, especially in the extended query flow (thus the pgbouncer question in the issue template).
To quickly test the hypothesis try disabling the prepared statement cache with statement_cache_size=0 in connect().
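For example (a minimal sketch, reusing the connection parameters from the report above):

import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(user='user', password='password',
                                 database='database', host='127.0.0.1',
                                 statement_cache_size=0)  # disable the prepared statement cache
    print(await conn.fetch('SELECT 1'))
    await conn.close()

asyncio.get_event_loop().run_until_complete(run())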
@elprans using statement_cache_size=0 in connect doesn't change anything. It still hangs indefinitely.
How about await conn.execute('SELECT 1')?
Yes, await conn.execute('SELECT 1') works!
@elprans Will you be fixing the fetch methods or do we need to use execute?
I've been testing a bit with execute and it also hangs indefinitely if arguments are used, e.g.:
await conn.execute('''SELECT 1 WHERE 2 > $1''', 1)
Full example:
import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(user='user', password='password',
                                 database='database', host='127.0.0.1')  # connect to pgpool-II
    values = await conn.execute('''SELECT 1 WHERE 2 > $1''', 1)  # hangs here indefinitely
    await conn.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
Yes, it appears the extended query protocol is completely broken with PgPool somehow.
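Roughly, the split observed in this thread (illustrative only; conn is an open asyncpg connection to pgpool):

await conn.execute('SELECT 1')                   # no arguments: simple query protocol, works
await conn.fetch('SELECT 1')                     # goes through a prepared statement (extended protocol), hangs
await conn.execute('SELECT 1 WHERE 2 > $1', 1)   # arguments force a prepare (extended protocol), hangs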
@elprans Is there a plan to solve this problem?
Also, other than pgpool-II, if you know of a high availability and automatic failover product that can be combined with asyncpg, please let me know. We can't use cloud databases like Amazon RDS, so we need to build a high-availability system in an on-premises environment.
Also, other than pgpool-II, if you know of a high availability and automatic failover product that can be combined with asyncpg
You can use pgbouncer; just make sure you've disabled the prepared statement cache with statement_cache_size=0.
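A minimal sketch of that setup (the pgbouncer host/port and credentials are placeholders):

import asyncio
import asyncpg

async def main():
    pool = await asyncpg.create_pool(
        user='user', password='password', database='database',
        host='127.0.0.1', port=6432,   # placeholder pgbouncer endpoint (6432 is pgbouncer's default port)
        statement_cache_size=0,        # required behind pgbouncer
    )
    async with pool.acquire() as conn:
        print(await conn.fetch('SELECT 1'))
    await pool.close()

asyncio.get_event_loop().run_until_complete(main())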
Hmmm... pgbouncer is not an HA product, it's just a connection pooler.
The following is the pgpool log from running await conn.execute('SELECT 1 WHERE 2 > $1', 1) in my environment.
Does it offer any hints about how to solve the problem?
Yes, it appears the extended query protocol is completely broken with PgPool somehow.
Is this an issue on the pgpool side?
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: new connection received
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: DETAIL: connecting host=192.168.34.1 port=51064
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: md5 authentication successful with frontend
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: Parse message from frontend.
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: DETAIL: statement: "__asyncpg_stmt_1__", query: "SELECT 1 WHERE 2 > $1"
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: DB node id: 0 backend pid: 22328 statement: SELECT version()
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: pool_reuse_block: blockid: 0
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: CONTEXT: while searching system catalog, When relcache is missed
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: DB node id: 0 backend pid: 22328 statement: Parse: SELECT 1 WHERE 2 > $1
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: Describe message from frontend.
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: DETAIL: statement: "__asyncpg_stmt_1__"
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: DB node id: 0 backend pid: 22328 statement: D message
May 17 14:26:16 pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG: Flush message from frontend.
(Hangs here)
May 17 14:26:59 pgpool[22264]: 2020-05-17 14:26:59: pid 22286: LOG: Sync message from frontend.
May 17 14:26:59 pgpool[22264]: 2020-05-17 14:26:59: pid 22286: ERROR: unable to read data from frontend
May 17 14:26:59 pgpool[22264]: 2020-05-17 14:26:59: pid 22286: DETAIL: socket read failed with an error "Connection reset by peer"
May 17 14:26:59 pgpool[22264]: 2020-05-17 14:26:59: pid 22277: LOG: new connection received
May 17 14:26:59 pgpool[22264]: 2020-05-17 14:26:59: pid 22277: DETAIL: connecting host=192.168.34.1 port=51068
Got exactly the same issue with asyncpg / pgpool-II. (Tested different PG versions behind pgpool-II, including 11.x and 12.x.)
But asyncpg works fine with the stolon project (https://github.com/sorintlab/stolon).
I can confirm the same - using K8s with pgpool and the Bitnami charts.
So it sounds like this is something that needs to be fixed on the pgpool side? There are no workarounds for asyncpg?
Same issue here - using the postgresql-ha Bitnami chart (which uses pgpool-II) with asyncpg.
I did a very brief smoke test with aiopg and it works, but have not had any time to switch in our application, so I can't say if it is a drop-in replacement.
Connection poolers have historically been problematic due to various liberties they take with respect to the protocol semantics, especially in the extended query flow (thus the pgbouncer question in the issue template).
Has anyone verified whether the issue is on the pgpool side vs. asyncpg side?
It would be great to get some visibility on where the issue lies. @elprans are we correct in assuming asyncpg working with PGPool-II is not something high on the priority list? I ask because there are other libraries that rely on asyncpg, which means they also don't work with pgpool2.
working with PGPool-II is not something high on the priority list
I don't use PGPool personally, so if anyone needs this sorted out, feel free to investigate and contribute fixes (either here, or to PGPool, which is a more likely culprit here).
I did a very brief smoke test with aiopg and it works, but have not had any time to switch in our application, so I can't say if it is a drop-in replacement.
asyncpg has some cool helper functions, like the add_listener/remove_listener LISTEN/NOTIFY management. aiopg doesn't appear to have anything like that, though it looks trivial to implement for anyone who properly understands async (so not me :-( )
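For reference, a minimal sketch of those helpers (assumes an already-open asyncpg connection conn, used inside a coroutine):

async def listen_for_events(conn):
    # The callback receives (connection, pid, channel, payload).
    def on_notify(connection, pid, channel, payload):
        print(f'NOTIFY on {channel}: {payload}')

    await conn.add_listener('events', on_notify)      # issues LISTEN events
    # ... application runs; notifications arrive via the callback ...
    await conn.remove_listener('events', on_notify)   # stop listening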
I have been in touch with the pgpool devs, and it appears we have a fix that will be part of 4.2.3:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=743a69b92d99020e3d1e479ad553a05c68b3b3cd
There may be other issues but this is a great start!
Seems to be actually solved with the new version of pgpool.
Seems to be actually solved with the new version of pgpool.
Indeed! "Fix hang with asyncpg. (Tatsuo Ishii)"
https://www.pgpool.net/docs/latest/en/html/release-4-2-3.html
I still have this issue, even with version 4.2.3 running. :/
@jfmrm can you set up a basic test like the following?
import sys
import asyncio
import asyncpg

async def run(host, port):
    conn = await asyncpg.connect(user='postgres', password='verysecret',
                                 database='all_your_bases', host=host, port=port)
    values = await conn.fetch(
        'SELECT username FROM auth_user WHERE id = $1',
        10,
    )
    await conn.close()
    print(values)

loop = asyncio.get_event_loop()
print(sys.argv)
loop.run_until_complete(run(sys.argv[1], sys.argv[2]))
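It takes the pgpool host and port on the command line, e.g. python repro.py 127.0.0.1 9999 (the script name here is just an example; 9999 is pgpool's default port).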
I am using the standard bitnami helm charts: docker.io/bitnami/pgpool:4.2.2-debian-10-r42 where I can confirm it DOESN'T work (it hangs) and docker.io/bitnami/pgpool:4.2.3-debian-10-r38 where I can confirm it DOES work (it doesn't hang). There may be other edge cases where it still hangs. It would be great to be able to isolate those if you can provide some simple repro code.
I see this is still open; was there ever confirmation that this issue was resolved in 4.2.3, or does it in fact still exist?
@naaronne, were you looking for a confirmation from @damjankuznar? It works fine for me, but I have relatively simple needs in terms of what is offered by postgres. It would probably be appropriate to close this and either reopen or create a new issue if anyone finds new problems.
I ran into the same issue with bitnami/pgpool:4.2.5-debian-10-r55.
@shawn-chiou can you confirm you can repro with the above code?