Improve db connection error visibility

Open magnetised opened this issue 1 year ago • 0 comments

With the move to per-shape storage and consumers, deleting a shape while the snapshot is being produced will result in the associated db connection process exiting.

Our previous strategy of letting the entire pool shut down on a connection error therefore became a problem, so we reverted to the default exponential backoff to prevent connection errors from bringing down the entire pool.

This means that we no longer get good information about connection errors, which could be important.

Some ideas re getting connection failure errors:

From the docs it seems that the connection pool is a regular OTP supervisor and connection processes are its children. See the note in the last paragraph of this function's doc - https://hexdocs.pm/db_connection/DBConnection.html#disconnect_all/3.

So if we handle the connection process exit in ConnectionManager and keep the pool running when the exit is not fatal, the pool should just start a new connection to get back to the steady state.
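A rough sketch of the "watch the connection process and decide whether the exit matters" idea, using only plain OTP monitors. Everything here is hypothetical: `ExitWatcher`, `watch/2`, and the `classify/1` rules are illustrative names and policies, not ConnectionManager's or DBConnection's API, and how the connection pids would be obtained from the pool is left open.

```elixir
defmodule ExitWatcher do
  # Hypothetical helper: monitors processes and reports their exits
  # to a `notify` pid instead of crashing alongside them.
  use GenServer

  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  # Start monitoring `pid` (in the issue's terms, a connection process).
  def watch(server, pid), do: GenServer.call(server, {:watch, pid})

  @impl true
  def init(notify), do: {:ok, %{notify: notify, refs: MapSet.new()}}

  @impl true
  def handle_call({:watch, pid}, _from, state) do
    ref = Process.monitor(pid)
    {:reply, :ok, %{state | refs: MapSet.put(state.refs, ref)}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, pid, reason}, state) do
    if MapSet.member?(state.refs, ref) do
      # Surface the exit with a verdict; the watcher itself keeps running.
      send(state.notify, {:connection_exit, pid, classify(reason), reason})
      {:noreply, %{state | refs: MapSet.delete(state.refs, ref)}}
    else
      {:noreply, state}
    end
  end

  # Assumed policy: clean shutdowns are ignored, anything else is reported.
  def classify(reason) when reason in [:normal, :shutdown], do: :ignore
  def classify({:shutdown, _}), do: :ignore
  def classify(_reason), do: :report
end
```

A monitor only observes the exit, so restarting the connection would still be left to the pool's own supervision, as described above.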

https://hexdocs.pm/db_connection/2.4.1/DBConnection.html#start_link/2-telemetry

A [:db_connection, :connection_error] event is published whenever a connection checkout receives a %DBConnection.ConnectionError{}.
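As a minimal sketch of the telemetry route, a handler could be attached to that event and log the error. The module name `ConnectionErrorLogger` and the handler id are made up for illustration, the code assumes the `:telemetry` package is available at runtime, and it assumes the event metadata carries the error under the `:error` key as the linked docs describe.

```elixir
defmodule ConnectionErrorLogger do
  # Hypothetical module: logs DBConnection connection errors via telemetry.
  require Logger

  @event [:db_connection, :connection_error]

  # Attach once at application start; requires the :telemetry package.
  def attach do
    :telemetry.attach("connection-error-logger", @event, &__MODULE__.handle_event/4, nil)
  end

  # Telemetry handlers receive (event_name, measurements, metadata, config).
  # The message is returned only to make the handler easy to test;
  # telemetry ignores the return value.
  def handle_event(@event, _measurements, metadata, _config) do
    message = format_error(metadata[:error])
    Logger.warning("DB connection error: " <> message)
    message
  end

  # Exceptions get their message; anything else is inspected as-is.
  defp format_error(%{__exception__: true} = error), do: Exception.message(error)
  defp format_error(other), do: inspect(other)
end
```

Since the event fires on checkout, this would report errors as seen by callers of the pool, which may not cover every connection process exit.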

magnetised, Aug 22 '24 09:08