kazoo
Lock.acquire throws NoNodeError
Our stack traces show a NoNodeError being thrown quite often here: https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L225. As far as I can tell that should be impossible because of this line: https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L216, but that's what we're seeing.
I'm not sure whether this is a client bug or a server bug.
Hi,
What do you see in your ZooKeeper server logs?
Is it possible this is a result of race conditions?
Because Kazoo doesn't yet (I think) support container nodes, it's nice to clean up a lock's parent node:
lock.acquire()
lock.release()
client.delete(lock.path)  # suppressing NotEmptyError exceptions
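To make that cleanup step concrete, here is a minimal sketch that runs without a ZooKeeper server. `FakeTree` and `cleanup_lock_path` are hypothetical names made up for illustration, and the two exception classes are local stand-ins for the kazoo exceptions of the same name. Suppressing NoNodeError as well as NotEmptyError is my own assumption, to cover the case where another client already removed the parent.

```python
from contextlib import suppress


class NotEmptyError(Exception):
    """Local stand-in for the kazoo exception of the same name."""


class NoNodeError(Exception):
    """Local stand-in for the kazoo exception of the same name."""


class FakeTree:
    """Hypothetical in-memory substitute for a ZooKeeper client (not kazoo API)."""

    def __init__(self):
        self.nodes = {"/"}

    def ensure_path(self, path):
        # Create the node and every missing ancestor.
        parts = [p for p in path.split("/") if p]
        for i in range(1, len(parts) + 1):
            self.nodes.add("/" + "/".join(parts[:i]))

    def create(self, path):
        # A child can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise NoNodeError(parent)
        self.nodes.add(path)

    def delete(self, path):
        if path not in self.nodes:
            raise NoNodeError(path)
        # Refuse to delete a node that still has children (contenders).
        if any(n.startswith(path + "/") for n in self.nodes):
            raise NotEmptyError(path)
        self.nodes.remove(path)


def cleanup_lock_path(client, lock_path):
    """Best-effort removal of the lock's parent node.

    Returns True if the node was deleted, False if it still had
    contenders or was already gone.
    """
    with suppress(NotEmptyError, NoNodeError):
        client.delete(lock_path)
        return True
    return False
```

With a real client the same shape would be `lock.release()` followed by the guarded `client.delete(lock.path)`. The guard only protects the deleting client. It does nothing for a peer that is about to create its contender node under the same parent.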
But I wonder whether multiple clients contending for the same lock can run into problems when client1 cleans up the path in the middle of client2's lock acquisition. My naïve guess is that the cleanup would have to land sometime AFTER client2 has invoked
https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L216
but BEFORE
https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L225
Here's a test where I tried to simulate that and got the NoNodeError:
def test_lock_race_conditions_delete_lock_path_during_acquire(self):
    event1 = self.make_event()
    lock1 = self.client.Lock(self.lockpath, "one")
    thread1 = self.make_thread(target=self._thread_lock_acquire_til_event,
                               args=("one", lock1, event1))
    thread1.start()
    # wait for this thread to acquire the lock
    with self.condition:
        if not self.active_thread:
            self.condition.wait(5)
        eq_(self.active_thread, "one")
    client2 = self._get_client()
    client2.start()
    lock2 = client2.Lock(self.lockpath, "two")
    thread2 = self.make_thread(target=self._thread_lock_acquire_til_event,
                               args=("two", lock2, self.make_event()))
    # wait until lock1 is released
    event1.set()
    wait = self.make_wait()
    wait(lambda: not lock1.is_acquired)
    # start lock2 acquisition
    thread2.start()
    try:
        # But, delete lock2's parent BEFORE lock2 node is created
        self.client.delete(self.lockpath)
    except NoNodeError:
        # lock2.acquire fails
        pass
    thread1.join()
    thread2.join()
    client2.stop()
Could we consider catching NoNodeError here and raising ForceRetry?
https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L225
I might be totally off, though.
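For what it's worth, the catch-and-retry idea can be sketched roughly like this. It is not kazoo's actual implementation. `FlakyClient`, `acquire_attempt` and `retry` are hypothetical names, the exception classes are local stand-ins for the kazoo ones, and I'm assuming that a forced retry re-runs the path creation before trying to create the contender node again.

```python
class NoNodeError(Exception):
    """Local stand-in for the kazoo exception of the same name."""


class ForceRetry(Exception):
    """Local stand-in for the exception kazoo uses to force a retry."""


class FlakyClient:
    """Hypothetical client whose lock parent is deleted once by a peer."""

    def __init__(self):
        self.parent_exists = False
        self.peer_deletions_left = 1
        self.calls = []

    def ensure_path(self, path):
        self.calls.append("ensure_path")
        self.parent_exists = True
        # Simulate another client cleaning up the parent right after
        # we made sure it exists, but before we create our node.
        if self.peer_deletions_left:
            self.peer_deletions_left -= 1
            self.parent_exists = False

    def create(self, path):
        self.calls.append("create")
        if not self.parent_exists:
            raise NoNodeError(path)
        return path


def acquire_attempt(client, lock_path, node_name):
    """One acquisition attempt: ensure the parent, then create the contender."""
    client.ensure_path(lock_path)
    try:
        return client.create(lock_path + "/" + node_name)
    except NoNodeError:
        # The parent vanished between the two calls: ask the caller to retry.
        raise ForceRetry() from None


def retry(func, max_tries=3):
    """Tiny retry loop that only reacts to ForceRetry."""
    for _ in range(max_tries):
        try:
            return func()
        except ForceRetry:
            continue
    raise RuntimeError("retries exhausted")
```

In this simulation the first attempt hits NoNodeError, the retry recreates the parent, and the second attempt succeeds. Whether the real recipe re-ensures the path on retry would need checking against lock.py.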
The same race condition is referenced in https://github.com/python-zk/kazoo/issues/329