Transition to CLOSED state does not complete recovery future
We are running into a problem under heavy load. Tracing the logs indicates that the following events happen:
- CopycatClient emits onStateChange with new state SUSPENDED, triggering a recovery in our code
- CopycatClient then emits onStateChange with new state DISCONNECTED
- the recovery initiated by the first event does not complete; the thread waiting on the completion of the recovery hangs
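While the root cause is open, the hang in the last step can at least be made observable by bounding the wait. The following is a minimal sketch using only `java.util.concurrent`; `BoundedWait` and `WaitOutcome` are hypothetical names introduced for illustration and are not part of Copycat.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical result type for a bounded wait on a recovery future. */
enum WaitOutcome { COMPLETED, TIMED_OUT, FAILED, INTERRUPTED }

/** Hypothetical helper: waits on a future without blocking forever. */
final class BoundedWait {
    private BoundedWait() {}

    static WaitOutcome await(CompletableFuture<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return WaitOutcome.COMPLETED;
        } catch (TimeoutException e) {
            // The future is still pending; the caller decides how to proceed.
            return WaitOutcome.TIMED_OUT;
        } catch (ExecutionException | CancellationException e) {
            // The future completed exceptionally or was cancelled.
            return WaitOutcome.FAILED;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for code further up the stack.
            Thread.currentThread().interrupt();
            return WaitOutcome.INTERRUPTED;
        }
    }
}
```

A `TIMED_OUT` result here would correspond to the symptom described above: the recovery future was never completed, neither normally nor exceptionally.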
Unfortunately, I can't produce a small reproducible case because the problem is intermittent: it can take an hour to occur, which suggests a race condition.
Note that we are using copycat-1.1.4, which is the current release.
An update about further problems: we implemented a simple workaround that cancels pending recoveries when this event happens and then reconnects the client.
This works for a while, but eventually the client ends up in a new terminal state: the reconnect returns successfully, but the client never transitions to the CONNECTED state.
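For readers following along, the workaround described above could look roughly like the sketch below. It is not our actual code and does not use Copycat's API: `ClientState` is a stand-in enum that mirrors the state names used in this report, and `RecoveryTracker`, the `recover` supplier, and the `reconnect` callback are hypothetical names assumed for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for the client states named in this report. */
enum ClientState { CONNECTED, SUSPENDED, DISCONNECTED }

/** Tracks at most one in-flight recovery and cancels it on DISCONNECTED. */
final class RecoveryTracker {
    private final AtomicReference<CompletableFuture<Void>> pending = new AtomicReference<>();
    private final Supplier<CompletableFuture<Void>> recover; // assumed: starts a recovery
    private final Runnable reconnect;                        // assumed: reconnects the client

    RecoveryTracker(Supplier<CompletableFuture<Void>> recover, Runnable reconnect) {
        this.recover = recover;
        this.reconnect = reconnect;
    }

    /** Intended to be called from the client's state-change listener. */
    void onStateChange(ClientState state) {
        switch (state) {
            case SUSPENDED: {
                // Start a recovery and replace (and cancel) any older one.
                CompletableFuture<Void> previous = pending.getAndSet(recover.get());
                if (previous != null) {
                    previous.cancel(false);
                }
                break;
            }
            case DISCONNECTED: {
                // Cancelling completes the future exceptionally, which
                // releases any thread blocked in get() or join().
                CompletableFuture<Void> stale = pending.getAndSet(null);
                if (stale != null) {
                    stale.cancel(false);
                }
                reconnect.run();
                break;
            }
            default:
                // CONNECTED: nothing left to track.
                pending.set(null);
        }
    }

    /** Returns the recovery currently being tracked, or null if none. */
    CompletableFuture<Void> pending() {
        return pending.get();
    }
}
```

Note that this only unblocks the waiting thread; it does not explain why the client later fails to reach CONNECTED after a successful reconnect.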
By default copycat has automatic client (session) recovery enabled. Any particular reason why you are doing manual client recovery?
The historical reason we are not using the automated recovery is that it would throw an IllegalStateException and fail to shut down properly when the client was disconnected by calling close.
Not sure if this problem has been addressed in more recent releases.
An IllegalStateException on calling close feels like a bug to me. It also looks like you are running with the latest copycat release. So this definitely is something we need to address.
Do you happen to have a stack trace for that error?
It's been a while since we resolved to use custom recovery for clean shutdowns, so I don't have any stack traces handy. As I recall, the recovery strategy would be invoked on an explicit close and would try to recover the closed session, resulting in the exception.
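To make the remembered failure mode concrete, here is a minimal sketch of a recovery hook that skips recovery once an explicit close has been requested. It illustrates the idea only and says nothing about how Copycat implements recovery; `CloseAwareRecovery`, `recoverAction`, and `closeAction` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical wrapper that suppresses recovery after an explicit close. */
final class CloseAwareRecovery {
    private final AtomicBoolean closing = new AtomicBoolean(false);
    private final Runnable recoverAction; // assumed: recovers the session
    private final Runnable closeAction;   // assumed: closes the client

    CloseAwareRecovery(Runnable recoverAction, Runnable closeAction) {
        this.recoverAction = recoverAction;
        this.closeAction = closeAction;
    }

    /**
     * Called when the session is lost. Returns true if recovery was
     * attempted, false if it was skipped because close was requested.
     */
    boolean onSessionLost() {
        if (closing.get()) {
            return false;
        }
        recoverAction.run();
        return true;
    }

    /** Marks the client as closing before closing it; later calls are no-ops. */
    void close() {
        if (closing.compareAndSet(false, true)) {
            closeAction.run();
        }
    }
}
```

The flag is set before the close action runs, so a session-lost callback triggered by the close itself is skipped. A close that races with a recovery already in progress is not covered by this sketch.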