stormpot failOver API

If we're pooling network connections and a sysadmin wants to change the topology, e.g. to do a graceful fail-over, then they'd like a way to close and reopen all the connections one at a time, so the application doesn't experience any disruption.

More here: http://www.slideshare.net/Grypyrg/java-my-sql-connector-connection-pool-features-optimization

Nov 06 '14 10:11 chrisvest

This feature was inspired by https://github.com/brettwooldridge/HikariCP/issues/181 and might need a rethink to optimally support the failover use case.

With refreshAll(), one could build an Allocator that can be put in a special mode that blocks all allocations until the failover is completed. Then refreshAll() is called, invalidating all objects. Then the failover procedure happens. The Allocator can then be unblocked, and all the objects in the pool can be reallocated. One issue with this approach is that there is no place in the whole procedure where we wait for claimed objects to return to the pool. Old connections can therefore persist throughout the failover, and effectively cause a split-brain condition for the application until they are released.
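To make the "special mode" concrete, here is a minimal sketch of an allocator wrapper with a gate. ConnectionSource and GatedSource are hypothetical names made up for illustration; they are not Stormpot's Allocator interface, and nothing here calls refreshAll(). It only shows the blocking part, and it has exactly the weakness described above: objects that are already claimed are not affected by the gate.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an allocator; NOT Stormpot's Allocator interface.
interface ConnectionSource<T> {
    T allocate() throws Exception;
}

// Wraps another source and can be told to hold back all new allocations.
final class GatedSource<T> implements ConnectionSource<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition opened = lock.newCondition();
    private final ConnectionSource<T> delegate;
    private boolean closed;

    GatedSource(ConnectionSource<T> delegate) {
        this.delegate = delegate;
    }

    // Enter the special mode: allocate() calls block from now on.
    void closeGate() {
        lock.lock();
        try {
            closed = true;
        } finally {
            lock.unlock();
        }
    }

    // Leave the special mode and wake up every blocked allocate() call.
    void openGate() {
        lock.lock();
        try {
            closed = false;
            opened.signalAll();
        } finally {
            lock.unlock();
        }
    }

    @Override
    public T allocate() throws Exception {
        lock.lock();
        try {
            while (closed) {
                opened.await(); // parked until openGate() is called
            }
        } finally {
            lock.unlock();
        }
        // The delegate runs outside the lock so slow allocations don't serialize.
        return delegate.allocate();
    }
}
```

The intended order would be closeGate(), invalidate the pooled objects, do the failover, then openGate().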

Dec 27 '14 12:12 chrisvest

I think we'd rather want a method called failOver that takes an Allocator factory of sorts. This method would set the target size to 0, wait for all the objects to be deallocated, use the factory to produce and install a new Allocator instance, and then restore the previous target size. The factory won't be called until all the objects have been deallocated. So if we are, for instance, using this to switch primary in a database cluster, the Allocator factory can wait for replication to complete (or for the cluster to otherwise become stable) before returning a new allocator.
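A rough sketch of those semantics, on a toy pool rather than on Stormpot itself. ToyAllocator and ToyPool are hypothetical and only exist to show the ordering; java.util.function.Supplier is used as the factory type as an assumption. The real pool is lock-free and allocates in a background thread, so this single-monitor version is not how it would actually be implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical allocator type; NOT Stormpot's Allocator interface.
interface ToyAllocator<T> {
    T allocate();
    void deallocate(T object);
}

// Toy pool guarded by a single monitor, for illustration only.
final class ToyPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private ToyAllocator<T> allocator;
    private int targetSize;
    private int allocated; // idle objects plus claimed objects

    ToyPool(ToyAllocator<T> allocator, int targetSize) {
        this.allocator = allocator;
        this.targetSize = targetSize;
    }

    synchronized T claim() throws InterruptedException {
        while (true) {
            if (!idle.isEmpty()) {
                return idle.pop();
            }
            if (allocated < targetSize) {
                T object = allocator.allocate();
                allocated++;
                return object;
            }
            wait(); // nothing idle and no room to grow, e.g. during a failover
        }
    }

    synchronized void release(T object) {
        if (allocated > targetSize) {
            // The pool is shrinking: deallocate instead of returning to idle.
            allocator.deallocate(object);
            allocated--;
        } else {
            idle.push(object);
        }
        notifyAll();
    }

    // Not safe to call concurrently with another failOver in this sketch.
    // If the factory throws, the pool is left at target size zero.
    synchronized void failOver(Supplier<ToyAllocator<T>> factory)
            throws InterruptedException {
        int previousTarget = targetSize;
        targetSize = 0;
        while (!idle.isEmpty()) {
            allocator.deallocate(idle.pop());
            allocated--;
        }
        while (allocated > 0) {
            wait(); // claimed objects are deallocated as they are released
        }
        // Only now, with every old object gone, is the factory consulted.
        allocator = factory.get();
        targetSize = previousTarget;
        notifyAll();
    }
}
```

Because the factory is only called once the count of allocated objects has reached zero, it is free to block until the cluster is stable, and no object from the old allocator can outlive the switch.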

May 23 '15 09:05 chrisvest

Moving this to 3.0, because the factory object given to the failOver API proposed above might as well use the Supplier interface from Java 8.

May 23 '15 09:05 chrisvest

It needs to be possible to do this kind of servicing via JMX, so these operations need to be available on the ManagedPool interface.

The steps of a manual fail-over are as follows:

1. Pause the pool, and reduce its size to zero.
2. Reconfigure the allocator (using methods external to the pool).
3. Resume the pool.

Two things are required for the above: 1) it must be possible to set the target size to zero, and 2) it must be possible to wait for the pool to reach its target size. With those two, manual fail-over will be possible.
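As a sketch of how an operator-side script could drive those steps, assuming the two requirements were exposed as below. SizeControl is a hypothetical interface: setTargetSize and getTargetSize mirror names the pool already has, but awaitTargetSize is made up here to stand in for "wait for the pool to reach its target size". Restoring the old size on timeout is also just one possible choice.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical view of the pool; awaitTargetSize in particular is an assumption.
interface SizeControl {
    void setTargetSize(int size);
    int getTargetSize();
    // Returns true if the pool reached its target size within the timeout.
    boolean awaitTargetSize(long timeout, TimeUnit unit) throws InterruptedException;
}

final class ManualFailOver {
    private ManualFailOver() {
    }

    // Returns true if the allocator was reconfigured, false if draining timed out.
    static boolean run(SizeControl pool, Runnable reconfigure,
                       long timeout, TimeUnit unit) throws InterruptedException {
        int previousTarget = pool.getTargetSize();

        // Step 1: pause the pool by shrinking it to zero, then wait for the drain.
        pool.setTargetSize(0);
        if (!pool.awaitTargetSize(timeout, unit)) {
            // Objects are still claimed; give up and resume with the old size.
            pool.setTargetSize(previousTarget);
            return false;
        }

        try {
            // Step 2: reconfigure the allocator by means external to the pool.
            reconfigure.run();
        } finally {
            // Step 3: resume the pool, even if reconfiguration failed.
            pool.setTargetSize(previousTarget);
        }
        return true;
    }
}
```

The same three calls could just as well be issued by hand from a JMX console, which is why nothing more than these two capabilities is needed for the manual case.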

For automated fail-over, it might additionally be desirable to have the allocator replacement process described in the prior comments.

Jun 03 '20 08:06 chrisvest