crystal-db icon indicating copy to clipboard operation
crystal-db copied to clipboard

Connection Pool does not check if connections have become disconnected

Open robcole opened this issue 2 years ago • 0 comments

Sockets (TCP and Unix) are often disconnected by a firewall, docker, or system settings on various hosts (fly.io and render.com were the two ones I tested; fly.io uses TCP sockets and render uses Unix ones). Because those disconnects were server-side, the client wasn't aware that the connections in the pool had become disconnected, and those connections will be treated as open sockets/connections.

This is partially discussed in #109 and probably a few other places.

Here are steps to reproduce this error in a Postgre app (specifically killing individual connections). I was using Lucky+Avram, but happy to help reproduce this on another set of tools if necessary.

In a low-traffic production app on fly.io, I was seeing this error multiple times daily, and it would periodically force a server restart due to all connections being in a "disconnected" state with no ability to recover.


Replication:

Connect an app in dev, have a database setup with a maximum idle pool of 1, initial pool size of 1, max pool size 1. Kill the connection directly in SQL (PG example): SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND state = 'idle' LIMIT 1; -- this kills a random idle connection, so if you have more current connections, you may need to run it a few times.

Wait a minute after executing that SQL command, then try to visit an endpoint in your app that would trigger a database request. Assuming you've killed all the connections, you should see a DB::ConnectionLost error be raised (true in PG, at the very least). This will especially be true if you have a relatively low connection timeout for the retry attempt.

robcole avatar May 26 '22 16:05 robcole