
Adds max_consecutive_exceptions to pool

Open egalpin opened this issue 7 years ago • 6 comments

This change might not be for everyone, so I respect the right of the core maintainers to downvote/close.

The backstory here is that I am using asyncpg to connect to an HA Postgres cluster. In the event of a failover, the writer (the node the asyncpg connection pool is connected to) crashes and is rebooted, and the reader is promoted to writer. The DB's DNS record is updated automatically, but that takes time. Meanwhile, the connections in the pool keep operating once the crashed node recovers, except that node is now the reader (i.e. SHOW transaction_read_only; -> on). As a result, INSERT/UPDATE/DELETE operations fail with (among various other connection errors):

asyncpg.exceptions.ReadOnlySQLTransactionError: cannot execute INSERT in a read-only transaction

Other errors, like timeouts, are also possible offenders. Unfortunately, the only way I've found to force the connections to refresh is to set a low max_queries config param. This is generally ok, but it degrades performance due to the increased churn of closing and opening connections.
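For reference, that workaround looks roughly like this (the DSN and pool sizes are placeholders; the only relevant bit is the deliberately low max_queries, which forces connections to be recycled far sooner than the library's much larger default):

```python
import asyncpg

async def make_pool():
    # Deliberately low max_queries: each connection is closed and replaced
    # after 500 queries, so it picks up the new DNS target sooner.
    return await asyncpg.create_pool(
        dsn='postgres://user:pass@my_prod_db.foo.com:5432/mydb',  # placeholder
        min_size=5,
        max_size=10,
        max_queries=500,
    )
```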

This PR introduces a configurable max_consecutive_exceptions pool param. The counter is checked every time a connection method executed through the pool (fetch, execute, etc.) results in an exception, and an appropriate exception is raised once the limit is reached. It's important to note that this PR only tracks consecutive-exception state for connections used via the pool's methods/context manager, not for direct con = await pool.acquire() (in which case it's up to the user to handle).
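For clarity, usage with this patch applied would look something like the sketch below; the parameter name comes from this PR, while the exact exception surfaced when the limit is hit is illustrative rather than final:

```python
import asyncpg

async def main():
    # max_consecutive_exceptions is the new parameter added by this PR;
    # it is not part of released asyncpg.
    pool = await asyncpg.create_pool(
        dsn='postgres://user:pass@my_prod_db.foo.com:5432/mydb',  # placeholder
        max_consecutive_exceptions=5,
    )
    # fetch/execute/etc. behave as usual; once 5 pool-level calls in a row
    # have failed, the pool raises so the application knows to recover.
    return await pool.fetch('SELECT 1')
```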

Supersedes #249

egalpin avatar Feb 02 '18 17:02 egalpin

0.16.0 has Pool.expire_connections() which you can call on the failover event. Would that fix your use case?
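For example (how the failover is detected is up to the application; pool is an existing asyncpg pool):

```python
async def on_failover(pool):
    # Mark every currently open connection as expired; each one is
    # replaced with a freshly opened connection on its next acquire().
    await pool.expire_connections()
```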

elprans avatar Jun 10 '18 20:06 elprans

While I think this makes fully recycling the connection pool much simpler (it's just a function call now), it unfortunately doesn't cover the whole problem.

... you can call [expire_connections] on the failover event

That's just the trouble. There is no specific exception or error broadcast that indicates a failover has occurred. What we see is a series of assorted connection errors, the same kind one might occasionally see otherwise and chalk up to transient network instability. This happens because the failover relies on a DNS update to let clients keep using, for example, my_prod_db.foo.com:5432 even though the underlying DB resource has changed entirely.

egalpin avatar Jun 11 '18 17:06 egalpin

I'd be more than happy to update the PR to include usage of expire_connections, though.
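Roughly what I have in mind, sketched below: keep the consecutive-exception counter from the PR, but call expire_connections() when the limit is hit instead of tearing connections down manually. Class and parameter names here are illustrative, not the final PR code.

```python
import asyncio
import asyncpg

class FailoverGuard:
    """Illustrative helper: after N consecutive pool errors, assume a
    failover happened and expire the pooled connections."""

    def __init__(self, pool, max_consecutive_exceptions=5):
        self._pool = pool
        self._limit = max_consecutive_exceptions
        self._consecutive = 0

    async def fetch(self, query, *args):
        try:
            result = await self._pool.fetch(query, *args)
        except (asyncpg.PostgresError, OSError, asyncio.TimeoutError):
            self._consecutive += 1
            if self._consecutive >= self._limit:
                # Likely a failover: drop the pooled connections so the
                # next acquire() reconnects (and re-resolves DNS).
                await self._pool.expire_connections()
                self._consecutive = 0
            raise
        else:
            self._consecutive = 0
            return result
```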

egalpin avatar Jun 11 '18 17:06 egalpin

I'm somewhat reluctant to add such policies to the base pool, as everyone's requirements are different.
I think making Pool.get_new_connection() public would allow implementing this policy in a subclass cleanly.

elprans avatar Jun 11 '18 17:06 elprans

I'm definitely open to a subclass model and can appreciate the reasons for leaving this additional complexity out of the base Pool class. Could you elaborate on your suggestion a bit further? I'm not seeing the function get_new_connection anywhere, but perhaps you're suggesting I write that function? Thanks!

egalpin avatar Jun 11 '18 18:06 egalpin

There's the private Pool._get_new_connection(), which we can make public.
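As a rough illustration of the kind of policy a subclass could hook in there once it's public (everything below is a sketch: the method is still private at this point, and the constructor arguments needed to actually instantiate a Pool subclass are omitted):

```python
import asyncpg
from asyncpg.pool import Pool

class VerifyingPool(Pool):
    async def _get_new_connection(self):
        con = await super()._get_new_connection()
        # Example policy: reject connections that landed on a node that is
        # still (or now) a read-only replica after a failover.
        if await con.fetchval('SHOW transaction_read_only') == 'on':
            await con.close()
            raise asyncpg.InterfaceError(
                'new connection points at a read-only replica')
        return con
```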

elprans avatar Jun 11 '18 18:06 elprans