asyncpg icon indicating copy to clipboard operation
asyncpg copied to clipboard

Fetching from pgpool-II hangs indefinitely

Open damjankuznar opened this issue 5 years ago • 34 comments

  • asyncpg version: 0.20.1
  • PostgreSQL version: 12.2
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: No.
  • Python version: Python 3.7.7 (default, Apr 23 2020, 14:43:26)
  • Platform: [GCC 8.3.0] on linux
  • Do you use pgbouncer?: No, I use pgpool-II 4.1.1
  • Did you install asyncpg with pip?: No, with poetry
  • If you built asyncpg locally, which version of Cython did you use?: No
  • Can the issue be reproduced under both asyncio and uvloop?: Yes

When querying the database through pgpool-II it hangs indefinitely, without raising any kind of exception. Connecting to database seems to be working, since for instance invalid password raises an exception:

asyncpg.exceptions.InternalServerError: md5 authentication failed
DETAIL:  password does not match

Steps to reproduce:

import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(user='user', password='password',
                                 database='database', host='127.0.0.1') # connect to pgpool-II
    values = await conn.fetch('''SELECT 1''') # hangs here indefinitely
    await conn.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(run())

When connecting to the PostgreSQL directly, the example above works flawlessly.

Log from pgpool-II:

May 13 20:56:00 pg01 pgpool[11982]: 2020-05-13 20:56:00: pid 13189: ERROR:  unable to read data from frontend
May 13 20:56:00 pg01 pgpool[11982]: 2020-05-13 20:56:00: pid 13189: DETAIL:  EOF encountered with frontend

damjankuznar avatar May 13 '20 19:05 damjankuznar

@damjankuznar lol, that's weird. I just came here to report this issue too after realising where the problem probably was: https://github.com/bitnami/charts/issues/2568

deed02392 avatar May 13 '20 19:05 deed02392

Connection poolers have historically been problematic due to various liberties they take with respect to the protocol semantics, especially in the extended query flow (thus the pgbouncer quesion in the issue template).

To quickly test the hypothesis try disabling the prepared statement cache with statement_cache_size=0 in connect().

elprans avatar May 13 '20 20:05 elprans

@elprans using statement_cache_size=0 in connect doesn't change anything. It still hangs indefinitely.

damjankuznar avatar May 13 '20 20:05 damjankuznar

How about await conn.execute('SELECT 1')?

elprans avatar May 13 '20 20:05 elprans

Yes, await conn.execute('SELECT 1') works!

damjankuznar avatar May 13 '20 21:05 damjankuznar

@elprans Will you be fixing the fetch methods or do we need to use execute?

damjankuznar avatar May 14 '20 07:05 damjankuznar

I've been testing a bit with execute and it also hangs indefinitely if arguments are used, e.g.:

conn.execute('''SELECT 1 WHERE 2 > $1''', 1)

Full example:

import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(user='user', password='password',
                                 database='database', host='127.0.0.1') # connect to pgpool-II
    values = await conn.execute('''SELECT 1 WHERE 2 > $1''', 1) # hangs here indefinitely
    await conn.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(run())

damjankuznar avatar May 14 '20 07:05 damjankuznar

Yes, it appears the extended query protocol is completely broken with PgPool somehow.

elprans avatar May 14 '20 16:05 elprans

@elprans Is there a plan to solve this problem?

Also, other than pgpool-II, if you know of a high availability and automatic failover product that can be combined with asyncpg, please let me know. Because we can't use cloud databases like Amazon RDS, so we need to build a high availability system in an on-premises environment.

otsuka avatar May 16 '20 09:05 otsuka

Also, other than pgpool-II, if you know of a high availability and automatic failover product that can be combined with asyncpg

You can use pgbouncer, just make sure you've disabled the prepared statement cache with statement_cache_size=0

elprans avatar May 16 '20 17:05 elprans

Hmmm... pgbouncer is not an HA product, it's just a connection pooler.

otsuka avatar May 17 '20 08:05 otsuka

The following is the log of pgpool when it runs await conn.execute('SELECT 1 WHERE 2 > $1', 1) in my environment. Doesn't it include tips on how to solve the problem?

Yes, it appears the extended query protocol is completely broken with PgPool somehow.

Is this an issue on the pgpool side?

May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  new connection received
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: DETAIL:  connecting host=192.168.34.1 port=51064
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  md5 authentication successful with frontend
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  Parse message from frontend.
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: DETAIL:  statement: "__asyncpg_stmt_1__", query: "SELECT 1 WHERE 2 > $1"
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  DB node id: 0 backend pid: 22328 statement: SELECT version()
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  pool_reuse_block: blockid: 0
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: CONTEXT:  while searching system catalog, When relcache is missed
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  DB node id: 0 backend pid: 22328 statement: Parse: SELECT 1 WHERE 2 > $1
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  Describe message from frontend.
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: DETAIL:  statement: "__asyncpg_stmt_1__"
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  DB node id: 0 backend pid: 22328 statement: D message
May 17 14:26:16  pgpool[22264]: 2020-05-17 14:26:16: pid 22286: LOG:  Flush message from frontend.

(Hangs here)

May 17 14:26:59  pgpool[22264]: 2020-05-17 14:26:59: pid 22286: LOG:  Sync message from frontend.
May 17 14:26:59  pgpool[22264]: 2020-05-17 14:26:59: pid 22286: ERROR:  unable to read data from frontend
May 17 14:26:59  pgpool[22264]: 2020-05-17 14:26:59: pid 22286: DETAIL:  socket read failed with an error "Connection reset by peer"
May 17 14:26:59  pgpool[22264]: 2020-05-17 14:26:59: pid 22277: LOG:  new connection received
May 17 14:26:59  pgpool[22264]: 2020-05-17 14:26:59: pid 22277: DETAIL:  connecting host=192.168.34.1 port=51068

otsuka avatar May 17 '20 08:05 otsuka

Got the exactly same issue for asyncpg / pgpool II. (Tested different PG versions behind pg pool II, including 11.x, 12.x)

But asyncpg is good for stolon project (https://github.com/sorintlab/stolon)

lff0305 avatar May 21 '20 09:05 lff0305

I can confirm the same - Using K8S with PG Pool and bitanmi charts.

vyasraos avatar Sep 12 '20 14:09 vyasraos

So it sounds like this is something that needs to be fixed on the pgpool side? There are no workarounds for asyncpg?

cro avatar Sep 23 '20 17:09 cro

Same issue here - using postgresl-ha bitnami chart (which uses pgpool-II) with asyncpg.

mecampbellsoup avatar Feb 27 '21 20:02 mecampbellsoup

I did a very brief smoke test with aiopg and it works, but have not had any time to switch in our application, so I can't say if it is a drop-in replacement.

cro avatar Feb 27 '21 20:02 cro

Connection poolers have historically been problematic due to various liberties they take with respect to the protocol semantics, especially in the extended query flow (thus the pgbouncer quesion in the issue template).

Has anyone verified whether the issue is on the pgpool side vs. asyncpg side?

mecampbellsoup avatar Mar 01 '21 14:03 mecampbellsoup

It would be great to get some visibility on where the issue lies. @elprans are we correct in assuming asyncpg working with PGPool-II is not something high on the priority list? I ask because there are other libraries that rely on asyncpg, which means they also don't work with pgpool2.

AntonOfTheWoods avatar Mar 09 '21 23:03 AntonOfTheWoods

working with PGPool-II is not something high on the priority list

I don't use PGPool personally, so if anyone needs this sorted out, feel free to investigate and contribute fixes (either here, or to PGPool, which is a more likely culprit here).

elprans avatar Mar 10 '21 00:03 elprans

I did a very brief smoke test with aiopg and it works, but have not had any time to switch in our application, so I can't say if it is a drop-in replacement.

asyncpg has some cool helper functions, like the add_subscriber/remove_subscriber management. aiopg doesn't appear to have anything like that, though it looks trivial to implement for anyone who properly understands async (so not me :-( )

AntonOfTheWoods avatar Mar 10 '21 04:03 AntonOfTheWoods

I have been exchanging with the pgpool devs and it appears we have a fix that will be part of 4.2.3

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=743a69b92d99020e3d1e479ad553a05c68b3b3cd

There may be other issues but this is a great start!

AntonOfTheWoods avatar Mar 18 '21 07:03 AntonOfTheWoods

Seems to be actually solved with the new version of pgpool.

pitwegner avatar May 27 '21 09:05 pitwegner

Seems to be actually solved with the new version of pgpool.

Indeed! "Fix hang with asyncpg. (Tatsuo Ishii)"

https://www.pgpool.net/docs/latest/en/html/release-4-2-3.html

mecampbellsoup avatar May 27 '21 13:05 mecampbellsoup

I still have this issue, even with version 4.2.3 running. :/

jfmrm avatar Jul 13 '21 21:07 jfmrm

@jfmrm can you set up a basic test like the following?

import sys
import asyncio
import asyncpg

async def run(host, port):
    conn = await asyncpg.connect(user='postgres', password='verysecret',
                                 database='all_your_bases', host=host, port=port)
    values = await conn.fetch(
        'SELECT username FROM auth_user WHERE id = $1',
        10,
    )
    await conn.close()
    print(values)

loop = asyncio.get_event_loop()
print(sys.argv)
loop.run_until_complete(run(sys.argv[1], sys.argv[2]))

I am using the standard bitnami helm charts: docker.io/bitnami/pgpool:4.2.2-debian-10-r42 where I can confirm it DOESN'T work (it hangs) and docker.io/bitnami/pgpool:4.2.3-debian-10-r38 where I can confirm it DOES work (it doesn't hang). There may be other edge cases where it still hangs. It would be great to be able to isolate those if you can provide some simple repro code.

AntonOfTheWoods avatar Jul 14 '21 00:07 AntonOfTheWoods

I see this is still Open, though was there ever a confirmation that this issue was resolved in 4.2.3 or in fact still exists?

naaronne avatar Sep 29 '21 15:09 naaronne

@naaronne , were you looking for a confirmation from @damjankuznar ? It works fine for me but I have relatively simple needs in terms of what is offered by postgres. It would probably be appropriate to close and either reopen or create a new issue if anyone finds new problems.

AntonOfTheWoods avatar Sep 30 '21 00:09 AntonOfTheWoods

@jfmrm can you set up a basic test like the following?

import sys
import asyncio
import asyncpg

async def run(host, port):
    conn = await asyncpg.connect(user='postgres', password='verysecret',
                                 database='all_your_bases', host=host, port=port)
    values = await conn.fetch(
        'SELECT username FROM auth_user WHERE id = $1',
        10,
    )
    await conn.close()
    print(values)

loop = asyncio.get_event_loop()
print(sys.argv)
loop.run_until_complete(run(sys.argv[1], sys.argv[2]))

I am using the standard bitnami helm charts: docker.io/bitnami/pgpool:4.2.2-debian-10-r42 where I can confirm it DOESN'T work (it hangs) and docker.io/bitnami/pgpool:4.2.3-debian-10-r38 where I can confirm it DOES work (it doesn't hang). There may be other edge cases where it still hangs. It would be great to be able to isolate those if you can provide some simple repro code.

I ran into same issue with bitnami/pgpool:4.2.5-debian-10-r55

shawn-chiou avatar Dec 16 '21 09:12 shawn-chiou

@shawn-chiou can you confirm you can repro with the above code?

AntonOfTheWoods avatar Dec 16 '21 09:12 AntonOfTheWoods