crystal-db
crystal-db copied to clipboard
Connection Pool does not check if connections have become disconnected
Sockets (TCP and Unix) are often disconnected by a firewall, docker, or system settings on various hosts (fly.io and render.com were the two ones I tested; fly.io uses TCP sockets and render uses Unix ones). Because those disconnects were server-side, the client wasn't aware that the connections in the pool had become disconnected, and those connections will be treated as open sockets/connections.
This is partially discussed in #109 and probably a few other places.
Here are steps to reproduce this error in a Postgre app (specifically killing individual connections). I was using Lucky+Avram, but happy to help reproduce this on another set of tools if necessary.
In a low-traffic production app on fly.io, I was seeing this error multiple times daily, and it would periodically force a server restart due to all connections being in a "disconnected" state with no ability to recover.
Replication:
Connect an app in dev, have a database setup with a maximum idle pool of 1, initial pool size of 1, max pool size 1.
Kill the connection directly in SQL (PG example): SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND state = 'idle' LIMIT 1;
-- this kills a random idle connection, so if you have more current connections, you may need to run it a few times.
Wait a minute after executing that SQL command, then try to visit an endpoint in your app that would trigger a database request. Assuming you've killed all the connections, you should see a DB::ConnectionLost error be raised (true in PG, at the very least). This will especially be true if you have a relatively low connection timeout for the retry attempt.