cartridge-java

Support different reconnection strategies

Open akudiyar opened this issue 3 years ago • 1 comment

Problem statement

Currently, the connection state machine looks like this:

  0. The actual connection process is initiated by an outgoing request. However, a call to a space by name, or any other operation that requires metadata, initiates a metadata request first.
  1. Before connection:
     a. Get N host addresses from the address provider.
     b. Establish M*N connections, where M is the number of connections per host.
     c. If all connections are established successfully, perform the target request and go to state 2.
     d. If any of the connection attempts fails, return an error. Any subsequent request may still complete successfully, since the other connections in the pool may have been established. The connection failure listener may enable the connection mode if the failure is caused by the server side, so depending on the nature of the failure, the next state is either 0 or 2.
  2. While connections are active:
     a. On normal client closing: the client waits (blocking) for all in-flight requests to finish and then closes all the connections. Any subsequent request using a closed client returns an error.
     b. If the underlying channel of a connection is closed (possibly because of a failure), the connection mode is enabled, and any subsequent request goes to state 1. Other requests, however, will attempt to use the remaining alive connections. If the number of alive connections for a specific host equals M, the connections to that host are not re-established; otherwise, all current connections to the host are closed (gracefully, see 2.a) and re-established.
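The transitions described above can be summarized in a small sketch. The type and constant names (`ClientState`, `IDLE`, `CONNECTING`, `ACTIVE`, `CLOSED`) are hypothetical and are not part of the cartridge-java API; only the transitions stated in the description are modeled.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative model of the described state machine; all names are assumptions. */
enum ClientState {
    IDLE,        // initial state: nothing happens until an outgoing request arrives
    CONNECTING,  // "before connection": addresses are fetched, M*N connections are opened
    ACTIVE,      // "while connections are active"
    CLOSED;      // the client was closed normally; further requests return an error

    /** States reachable from this one according to the description. */
    Set<ClientState> next() {
        switch (this) {
            case IDLE:
                // An outgoing request starts the connection process.
                return EnumSet.of(CONNECTING);
            case CONNECTING:
                // Success leads to ACTIVE; on failure the next state is IDLE or ACTIVE,
                // depending on the nature of the failure.
                return EnumSet.of(IDLE, ACTIVE);
            case ACTIVE:
                // A closed channel sends later requests back to CONNECTING;
                // normal closing ends the life cycle.
                return EnumSet.of(CONNECTING, CLOSED);
            default:
                return EnumSet.noneOf(ClientState.class);
        }
    }

    boolean canMoveTo(ClientState target) {
        return next().contains(target);
    }
}
```

Writing the transitions down this way makes the gap in the current behavior visible: no transition leads back to `CONNECTING` once re-establishing has failed and no connection is alive.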

The scheme above has the following pitfalls:

  1. The first request, which triggers connection establishment, may fail if not all connections are established, while the next request may succeed. Either the first request should try to select the next alive connection, or all subsequent requests should fail as well until the connections are re-established.
  2. The reconnection behavior is undefined if the default connection failure listener was not triggered (in what cases?).
  3. If all hosts become unavailable (e.g. the Tarantool server restarted), the client may run out of alive connections, and if re-establishing the connections fails as well (due to a timeout error), no further reconnection attempts are made (the connection mode is not enabled). A scheduled reconnection strategy may help in this case.
  4. A request that lands on a broken connection should be handed over to an alive one (are there any cases when this is not desired?).
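One possible shape for the handover mentioned in pitfalls 1 and 4 is a selector that skips broken connections. This is a minimal sketch under assumptions: `PooledConnection` and `AliveConnectionSelector` are hypothetical names, not existing cartridge-java classes, and liveness is reduced to a single `isAlive()` check.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical minimal view of a pooled connection. */
interface PooledConnection {
    boolean isAlive();
}

/** Round-robin selector that skips connections reported as not alive. */
final class AliveConnectionSelector {
    private final AtomicInteger cursor = new AtomicInteger();

    /**
     * Makes at most pool.size() probes, so a single caller inspects every
     * connection once before giving up. Under concurrent use the cursor is
     * shared, so a caller may probe some connections twice and others not at all.
     */
    Optional<PooledConnection> select(List<? extends PooledConnection> pool) {
        int size = pool.size();
        for (int i = 0; i < size; i++) {
            int index = Math.floorMod(cursor.getAndIncrement(), size);
            PooledConnection candidate = pool.get(index);
            if (candidate.isAlive()) {
                return Optional.of(candidate);
            }
        }
        // No alive connection: the caller would fail the request or trigger reconnection.
        return Optional.empty();
    }
}
```

An empty result is the natural point at which the connection mode could be enabled, since it means the pool has no usable connection left.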

Requirements

  • All connection failure cases must be identified and handled appropriately. Tests are needed for most of the possible cases.
  • The connection mode must always be enabled when there are no alive connections.
  • Support for different reconnection strategies must be implemented (indefinite, number-of-attempts, time-based?).
  • A strategy for request handover between connections must be implemented (best-effort, number-of-attempts, time-based?).
  • Users must be able to specify their own connection failure listeners.
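To make the requirements more concrete, here is a rough sketch of how pluggable reconnection strategies and user-defined failure listeners could fit together. Every name in it (`ReconnectionStrategy`, `ConnectionFailureListener`, `Reconnector`) is hypothetical and serves only as an illustration; it is not a proposal for the final API. The loop retries immediately, whereas a real implementation would presumably schedule attempts with a delay.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Decides whether another reconnection attempt is allowed (hypothetical interface). */
interface ReconnectionStrategy {
    boolean shouldRetry(int attemptsMade, Duration elapsed);

    /** Retry forever. */
    static ReconnectionStrategy indefinite() {
        return (attempts, elapsed) -> true;
    }

    /** Give up once the given number of attempts has been made. */
    static ReconnectionStrategy maxAttempts(int limit) {
        return (attempts, elapsed) -> attempts < limit;
    }

    /** Give up once the given amount of time has passed. */
    static ReconnectionStrategy within(Duration deadline) {
        return (attempts, elapsed) -> elapsed.compareTo(deadline) < 0;
    }
}

/** User-supplied callback invoked after each failed attempt (hypothetical interface). */
interface ConnectionFailureListener {
    void onConnectionFailure(int attemptsMade);
}

final class Reconnector {
    /** Runs connectAttempt until it succeeds or the strategy refuses another try. */
    static boolean reconnect(BooleanSupplier connectAttempt,
                             ReconnectionStrategy strategy,
                             List<ConnectionFailureListener> listeners) {
        long startNanos = System.nanoTime();
        int attemptsMade = 0;
        while (true) {
            if (connectAttempt.getAsBoolean()) {
                return true;
            }
            attemptsMade++;
            for (ConnectionFailureListener listener : listeners) {
                listener.onConnectionFailure(attemptsMade);
            }
            Duration elapsed = Duration.ofNanos(System.nanoTime() - startNanos);
            if (!strategy.shouldRetry(attemptsMade, elapsed)) {
                return false;
            }
        }
    }
}
```

Keeping the strategy a single-method interface means a user could pass a lambda for a custom policy, and the same shape could be reused for the request handover strategy.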

akudiyar • Dec 02 '20 07:12

Connected to #115

sharonovd • Oct 05 '21 09:10