Transition to CLOSED state does not complete recovery future

Open vyzo opened this issue 8 years ago • 6 comments

We are hitting a problem under heavy load. The logs show the following sequence of events:

  • CopycatClient emits onStateChange with new state SUSPENDED, triggering a recovery in our code
  • CopycatClient then emits onStateChange with new state DISCONNECTED
  • the recovery initiated by the first event does not complete; the thread waiting on the completion of the recovery hangs

Unfortunately I can't produce a small reproducible case because the problem is intermittent: it can take an hour to occur, which suggests a race condition.
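Editor's sketch: until the root cause is found, one way to keep the waiting thread from hanging forever is to bound the wait on the recovery future. This is a minimal illustration using only `java.util.concurrent`. The class `BoundedRecoveryWait` and its `await` method are hypothetical names, not part of Copycat, and the future passed in stands for whatever the application's recovery code returns.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a Copycat API: waits on a recovery future with a deadline.
final class BoundedRecoveryWait {
    private BoundedRecoveryWait() {}

    /**
     * Returns true if the recovery completed normally within the timeout,
     * false if it timed out, failed, was cancelled, or the wait was interrupted.
     */
    static boolean await(CompletableFuture<?> recovery, long timeout, TimeUnit unit) {
        try {
            recovery.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // The future never completed: cancel it so any other waiters are released too.
            recovery.cancel(true);
            return false;
        } catch (ExecutionException | CancellationException e) {
            return false;
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller instead of swallowing it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A `false` result tells the caller that the recovery did not complete and that it can choose to reconnect or shut down. It does not explain why the future was never completed.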

vyzo • Jul 05 '16 09:07

Note that we are using copycat-1.1.4, which is the current release.

vyzo • Jul 05 '16 09:07

An update on further problems: we implemented a simple workaround that cancels pending recoveries when this event happens and then reconnects the client.

This works for a while, but eventually the client reaches a new terminal state: the reconnect returns successfully, but the client never transitions to the CONNECTED state.
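Editor's sketch: the workaround described in this comment (cancel the pending recovery when the terminal event arrives, then reconnect) could be structured roughly as follows. The code in question was not posted, so this is an assumption about its shape and not the actual implementation. `ClientState` is a hypothetical stand-in enum that mirrors the state names used in this thread, and `RecoveryTracker` is a hypothetical class; neither is a Copycat type.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the client's state enum, using the names from this thread.
enum ClientState { CONNECTED, SUSPENDED, DISCONNECTED }

// Hypothetical tracker, not a Copycat API: owns the single in-flight recovery future.
final class RecoveryTracker {
    private final AtomicReference<CompletableFuture<Void>> pending = new AtomicReference<>();

    /** Registers a new recovery attempt, cancelling any attempt still in flight. */
    CompletableFuture<Void> begin() {
        CompletableFuture<Void> next = new CompletableFuture<>();
        CompletableFuture<Void> previous = pending.getAndSet(next);
        if (previous != null) {
            previous.cancel(false);
        }
        return next;
    }

    /**
     * Call from the state-change listener.
     * Returns true when the caller should proceed to reconnect the client.
     */
    boolean onStateChange(ClientState state) {
        if (state != ClientState.DISCONNECTED) {
            return false;
        }
        // Release any thread blocked on the stale recovery before reconnecting.
        CompletableFuture<Void> stale = pending.getAndSet(null);
        if (stale != null) {
            stale.cancel(false);
        }
        return true;
    }
}
```

Keeping the pending future in one `AtomicReference` means a listener callback racing with a new recovery attempt cannot leave an orphaned future that nobody completes or cancels.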

vyzo • Jul 05 '16 12:07

> We implemented a simple workaround which would cancel pending recoveries when this event happened, and proceed to reconnect the client.

By default copycat has automatic client (session) recovery enabled. Any particular reason why you are doing manual client recovery?

madjam • Jul 05 '16 17:07

The historical reason we are not using the automated recovery is that it would throw an IllegalStateException and fail to shut down properly when the client was disconnected by calling close.

Not sure if this problem has been addressed in more recent releases.

vyzo • Jul 05 '16 17:07

IllegalStateException on calling close feels like a bug to me. It also looks like you are running with the latest copycat release. So this definitely is something we need to address.

Do you happen to have a stack trace for that error?

madjam • Jul 05 '16 18:07

It's been a while since we resolved to use custom recovery for clean shutdowns, so I don't have any stack traces handy. What I recall is that the recovery strategy would be invoked on an explicit close and try to recover the closed session, resulting in the exception.
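Editor's sketch: the failure mode recalled here (recovery firing on an intentional close) can be guarded against by recording the intent to close before closing, and having the recovery path check that flag. The example below is a generic illustration; `CloseAwareRecovery` and its methods are hypothetical names, and the two `Runnable` arguments stand in for whatever the application does to recover and to close the client.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper, not a Copycat API: remembers that close() was requested
// so the recovery path can decline to recover an intentionally closed session.
final class CloseAwareRecovery {
    private final AtomicBoolean closing = new AtomicBoolean(false);
    private final Runnable recoverAction;
    private final Runnable closeAction;

    CloseAwareRecovery(Runnable recoverAction, Runnable closeAction) {
        this.recoverAction = recoverAction;
        this.closeAction = closeAction;
    }

    /** Explicit shutdown: record the intent first, then close exactly once. */
    void close() {
        if (closing.compareAndSet(false, true)) {
            closeAction.run();
        }
    }

    /** Invoked when the session is lost. Returns true if recovery was attempted. */
    boolean onSessionLost() {
        if (closing.get()) {
            // The session ended because we asked it to; do not try to recover it.
            return false;
        }
        recoverAction.run();
        return true;
    }
}
```

Setting the flag before running the close action matters: a session-lost callback that fires during the close then already sees `closing == true`.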

vyzo • Jul 05 '16 18:07