asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation
Hi, sometimes I get the above exception. Once it happens, all subsequent DB queries raise this exception, so I have to restart the entire web app. The traceback looks something like this:
File "/app/app.py", line 359, in process_list
async with DB.transaction():
File "/piccolo/piccolo/engine/postgres.py", line 173, in __aenter__
self.connection = await self.engine.get_new_connection()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/piccolo/piccolo/engine/postgres.py", line 389, in get_new_connection
return await asyncpg.connect(**self.config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib64/python3.11/site-packages/asyncpg/connection.py", line 2092, in connect
return await connect_utils._connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib64/python3.11/site-packages/asyncpg/connect_utils.py", line 881, in _connect
return await _connect_addr(
^^^^^^^^^^^^^^^^^^^^
File "/venv/lib64/python3.11/site-packages/asyncpg/connect_utils.py", line 773, in _connect_addr
return await __connect_addr(params, timeout, True, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib64/python3.11/site-packages/asyncpg/connect_utils.py", line 831, in __connect_addr
await compat.wait_for(connected, timeout=timeout)
File "/venv/lib64/python3.11/site-packages/asyncpg/compat.py", line 56, in wait_for
return await asyncio.wait_for(fut, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/asyncio/tasks.py", line 479, in wait_for
return fut.result()
^^^^^^^^^^^^
asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation
Looks like some other people have encountered similar problems with asyncpg: https://github.com/MagicStack/asyncpg/issues/309
What's interesting about your traceback is you're making a direct connection to Postgres, and not acquiring a connection from a pool. With a pool I can imagine many potential causes (e.g. a connection being killed off for being too old), but as it's a fresh connection it's quite strange.
Is there any chance that your app has exhausted all available connections to Postgres? By default max_connections is 100. You would expect a different exception than ConnectionDoesNotExistError though.
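If you want to check, something along these lines will show how close you are to the limit (just a rough sketch; adjust the DSN for your setup):

```python
import asyncio
import asyncpg


async def check_connection_usage():
    # Hypothetical DSN - replace with your own credentials.
    conn = await asyncpg.connect("postgresql://user:pass@localhost:5432/mydb")
    try:
        used = await conn.fetchval("SELECT count(*) FROM pg_stat_activity")
        limit = await conn.fetchval("SELECT current_setting('max_connections')")
        print(f"{used} connections in use, max_connections = {limit}")
    finally:
        await conn.close()


asyncio.run(check_connection_usage())
```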
Is there any chance that your app has exhausted all connections to Postgres?
I don't think so, it just randomly seems to happen every now and then.
Do you think it would be feasible to handle this in this library? So that in get_new_connection we somehow restart (?) asyncpg if we get ConnectionDoesNotExistError? Currently I still need to restart the server manually.
I think adding it to this library would have the benefit that not everyone needs to write their own workaround for this bug, which could potentially affect everyone(?) using the Postgres engine, even if only rarely.
What do you think?
Do you think it would be feasible to handle this in this library? So that in get_new_connection we somehow restart (?) asyncpg if we get ConnectionDoesNotExistError? Currently I still need to restart the server manually.
I'm not sure. We could try fetching another connection after a short asyncio.sleep.
Restarting the server will kill all existing connections. I'm not sure if we could do it in a graceful way.
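Something along these lines, perhaps (just a sketch of the idea, not what the engine currently does; the retry count and delay are arbitrary):

```python
import asyncio

import asyncpg
from asyncpg.exceptions import ConnectionDoesNotExistError


async def get_new_connection_with_retry(config: dict, retries: int = 3, delay: float = 0.5):
    """Wrap asyncpg.connect and retry after a short sleep if the
    connection attempt fails with ConnectionDoesNotExistError."""
    for attempt in range(retries):
        try:
            return await asyncpg.connect(**config)
        except ConnectionDoesNotExistError:
            if attempt == retries - 1:
                raise
            # Back off briefly before trying again.
            await asyncio.sleep(delay)
```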
What kind of stack are you running? I read through this thread again: https://github.com/MagicStack/asyncpg/issues/309 and it seems like the following could be true:
- uvloop is somehow to blame
- The Postgres server is under-resourced
- Some kind of network middleware is killing connections
Are you running any real long transactions?
I had a quick look through the Sentry logs for some of my apps, and I've seen TimeoutError when trying to get a connection, but not much ConnectionDoesNotExistError.
What kind of stack are you running?
I do indeed use uvloop. I don't use Docker; the Starlette app and Postgres both run "natively" on the system.
In the postgres log it says:
LOG: could not receive data from client: Connection reset by peer
Are you running any real long transactions?
I don't think so, max should be ~1-2s
What I wonder about: the ConnectionDoesNotExistError seems to me to indicate that some kind of "connection reuse" is happening on some level. Maybe it is possible to force asyncpg to really use a new connection? Currently get_new_connection uses asyncpg.connect. Maybe the stored connections are available via some private attribute, which we could clear as a workaround?
I do indeed use uvloop. I don't use Docker; the Starlette app and Postgres both run "natively" on the system.
So the stack isn't likely to be causing the problem. It could be uvloop ... but kind of hard to test if the error only occurs sporadically.
I don't think so, max should be ~1-2s
Transactions don't sound like the problem either.
The ConnectionDoesNotExistError to me seems to indicate that a "connection reuse" is happening on some level.
Or that the connection is killed part way through an operation.
Maybe the stored connections are available via some private attribute, which we could clear as a workaround?
I would need to look more into the internals of asyncpg.
At what point does the error get raised? Immediately after get_new_connection, or part way through a query?
What is interesting is that this exception, once it has been raised a single time, seems to be raised for all subsequent queries. In the Postgres log, however, the line Connection reset by peer only shows up once.
And here someone was able to solve this problem by changing the keepalive settings of TCP connections. These are clues which I think indicate that there could be a faulty/closed connection in some sort of cache - but maybe it has other causes.
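If the keepalive theory holds, I guess the Postgres keepalive settings could be passed through asyncpg's server_settings as a test. A rough sketch (the values are just examples, and this only adjusts the server side of the connection):

```python
import asyncpg


async def connect_with_keepalive(config: dict):
    # tcp_keepalives_* are standard Postgres settings; the values below
    # are arbitrary examples, not recommendations.
    return await asyncpg.connect(
        **config,
        server_settings={
            "tcp_keepalives_idle": "60",      # seconds of idle before the first probe
            "tcp_keepalives_interval": "10",  # seconds between probes
            "tcp_keepalives_count": "5",      # failed probes before dropping the connection
        },
    )
```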
Immediately after get_new_connection, or part way through a query?
I think it happens during get_new_connection, when trying to connect to Postgres. The first stack trace looks the same as the ones following it.
but kind of hard to test if the error only occurs sporadically
Today I tested:
1. Cancel a task that does a DB operation (insert) after different delays (sketched below)
2. Do many inserts concurrently
3. Do a simple DB operation (insert) after increasing delays (15s to >1h)
Sadly, none of these were able to reproduce the issue. With 1 I sometimes got asyncio.exceptions.InvalidStateError, and with 2 I got asyncpg.exceptions.TooManyConnectionsError.
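Test 1 looked roughly like this (a simplified sketch; MyTable stands in for one of my Piccolo tables, and the delay was varied between runs):

```python
import asyncio

from my_app.tables import MyTable  # hypothetical Piccolo table


async def cancel_insert_after(delay: float):
    # Start an insert, then cancel the task part way through.
    task = asyncio.create_task(MyTable.insert(MyTable(name="test")).run())
    await asyncio.sleep(delay)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
```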
However, I realized that my app wasn't using connection pooling at all, since this has to be enabled manually. As the ConnectionDoesNotExistError seems to have been thrown when creating new connections, I hope that enabling the pool also reduces the overall chance of this bug occurring, since far fewer new connections should need to be created now.
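For reference, enabling the pool in the Starlette app looks roughly like this (following the Piccolo docs; engine_finder assumes a piccolo_conf.py is importable, and the routes are omitted):

```python
from piccolo.engine import engine_finder
from starlette.applications import Starlette


async def open_database_connection_pool():
    engine = engine_finder()
    await engine.start_connection_pool()


async def close_database_connection_pool():
    engine = engine_finder()
    await engine.close_connection_pool()


app = Starlette(
    routes=[],  # add your routes here
    on_startup=[open_database_connection_pool],
    on_shutdown=[close_database_connection_pool],
)
```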