The KazooClient.start is not thread safe.

Open tonyseek opened this issue 9 years ago • 5 comments

It seems that KazooClient.start is not thread safe.

Calling it from two concurrent threads or greenlets may block forever inside the _safe_close method. self._connection.stop waits for the self._connection._connection_routine thread, which was spawned outside of it, to stop, but that never happens.

Should we warn people in the documentation to be careful about calling KazooClient.start concurrently? Alternatively, we could add a semaphore lock to make KazooClient.start thread safe.
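
Until something like that lands in the library, a caller-side lock is one possible workaround. The following is only a rough sketch, not a patch to kazoo: `SerializedStarter` is a hypothetical wrapper name, and it assumes nothing more than a client object exposing `start()` and `stop()`, as KazooClient does. With greenlets, `threading.Lock` would only behave as intended if gevent's monkey patching is in effect, which is also an assumption here.

```python
import threading


class SerializedStarter:
    """Hypothetical wrapper: serializes start()/stop() calls on a client.

    Not part of kazoo. `client` is assumed to be any object with
    start() and stop() methods, for example a KazooClient instance.
    """

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()  # guards the start/stop transitions
        self._started = False

    def start(self, *args, **kwargs):
        # Only one caller at a time may run the underlying start().
        with self._lock:
            if self._started:
                return False  # another caller already started the client
            self._client.start(*args, **kwargs)
            self._started = True
            return True

    def stop(self):
        with self._lock:
            if not self._started:
                return False  # nothing to stop
            self._client.stop()
            self._started = False
            return True
```

Concurrent callers all go through `SerializedStarter.start()`; the first one performs the real start and the others return `False` once it has finished. This only helps if every caller uses the wrapper instead of calling the client directly.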

tonyseek avatar May 05 '16 15:05 tonyseek

Yeah, I had the same problem.

vinsia avatar May 31 '16 03:05 vinsia

Also seeing this, but I'm pretty sure our calls to start() or restart() are serial.

Is it just KazooClient.start() that causes this? What about trying to do things while the connection is SUSPENDED or LOST?

mskucherawy avatar Jun 11 '16 01:06 mskucherawy

I think there's an issue when stop() is called while the connection is SUSPENDED. With BLATHER level logging, I see this:

2016-06-13 15:41:55,390 kazoo WARNING: Connection dropped: socket connection broken
2016-06-13 15:41:55,390 kazoo WARNING: Transition to CONNECTING
2016-06-13 15:41:55,390 kazoo INFO: Zookeeper connection lost
<we call stop() here>
2016-06-13 15:41:56,014 kazoo WARNING: Failed connecting to Zookeeper within the connection retry policy.
2016-06-13 15:41:56,014 kazoo INFO: Zookeeper session lost, state: CLOSED

The program is unresponsive after this point.

mskucherawy avatar Jun 13 '16 23:06 mskucherawy

@mskucherawy Have you implemented some automatic re-connecting mechanism?

tonyseek avatar Jun 19 '16 17:06 tonyseek

@tonyseek: Yes, that turned out to be the case; issuing a reconnect instruction from inside a connection watcher results in a bad time.
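
For anyone who lands here with the same hang: one way to avoid reconnecting from inside the watcher is to have the watcher only enqueue the event and let a separate thread do the restart. This is a rough sketch under assumptions, not a tested fix for this issue. `DeferredRestarter` is a made-up name, the client is assumed to expose `restart()` as KazooClient does, and the `"LOST"` default for `lost_state` is an assumption about the value the listener receives (pass `KazooState.LOST` explicitly if unsure).

```python
import queue
import threading


class DeferredRestarter:
    """Hypothetical helper: moves restart() out of the connection watcher.

    `client` is assumed to be any object with a restart() method.
    `lost_state` is the state value that should trigger a restart.
    """

    def __init__(self, client, lost_state="LOST"):
        self._client = client
        self._lost_state = lost_state
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def listener(self, state):
        # Called from the client's connection handling: enqueue and return,
        # never call start()/stop()/restart() from here.
        if state == self._lost_state:
            self._events.put(state)

    def _run(self):
        # Worker thread: performs the restart outside the watcher callback.
        while True:
            item = self._events.get()
            if item is None:  # sentinel from close()
                break
            self._client.restart()

    def close(self):
        # Ask the worker to exit after draining earlier events.
        self._events.put(None)
        self._worker.join()
```

With kazoo this would presumably be wired up as `client.add_listener(restarter.listener)`. The point of the design is that the listener returns immediately, so the connection routine is never waiting on a thread that is itself waiting on the connection routine.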

mskucherawy avatar Aug 19 '16 20:08 mskucherawy