
Lock.acquire throws NoNodeError

Open telaviv opened this issue 6 years ago • 3 comments

Our stack traces show a NoNodeError being thrown here quite often: https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L225. As far as I can tell that should be impossible, because this line ensures the path exists first: https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L216. But that's what we're seeing.

I'm not sure whether this is a client bug or a server bug.

telaviv avatar May 29 '19 01:05 telaviv

Hi,

What do you see in your ZooKeeper server logs?

StephenSorriaux avatar May 31 '19 10:05 StephenSorriaux

Is it possible this is a result of race conditions?

Because Kazoo doesn't yet (I think) support container nodes, it's nice to clean up a lock's parent node by hand:

    lock.acquire()
    lock.release()
    try:
        client.delete(lock.path)
    except NotEmptyError:  # from kazoo.exceptions
        pass  # other contenders still have nodes under the lock path

But I wonder whether multiple clients contending for the same lock can run into trouble when client1 does its path cleanup in the middle of client2's lock acquisition. My naïve guess is that the delete would have to land AFTER client2 has invoked https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L216 but BEFORE https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L225.
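To make the suspected interleaving concrete, here is a minimal sketch that replays it step by step against a toy in-memory store. Nothing here is kazoo code: `ToyZk`, its methods, and the two exception classes are hypothetical stand-ins, and the path `/locks/job` is made up for illustration. It only models "parent must exist to create a child" and "parent must be empty to be deleted".

```python
class NoNodeError(Exception):
    """Stand-in for kazoo.exceptions.NoNodeError (assumption, not the real class)."""


class NotEmptyError(Exception):
    """Stand-in for kazoo.exceptions.NotEmptyError (assumption, not the real class)."""


class ToyZk:
    """Hypothetical in-memory model of a znode tree, one level deep."""

    def __init__(self):
        self.children = {}  # parent path -> set of child names

    def ensure_path(self, path):
        # Create the parent if it is missing; do nothing otherwise.
        self.children.setdefault(path, set())

    def create_child(self, parent, name):
        if parent not in self.children:
            raise NoNodeError(parent)
        self.children[parent].add(name)

    def delete_child(self, parent, name):
        self.children[parent].discard(name)

    def delete(self, path):
        if path not in self.children:
            raise NoNodeError(path)
        if self.children[path]:
            raise NotEmptyError(path)
        del self.children[path]


zk = ToyZk()
path = "/locks/job"  # made-up lock path

# client1: acquire, then release (its contender node comes and goes).
zk.ensure_path(path)
zk.create_child(path, "one")
zk.delete_child(path, "one")

# client2: first half of acquire, the parent is confirmed to exist.
zk.ensure_path(path)

# client1: cleanup lands here; the parent is empty, so the delete succeeds.
zk.delete(path)

# client2: second half of acquire, creating its contender node.
try:
    zk.create_child(path, "two")
    outcome = "created"
except NoNodeError:
    outcome = "NoNodeError"

print(outcome)
```

In this toy model the final create fails because the parent vanished between the two halves of client2's acquire, which matches the window described above. Whether a real ZooKeeper ensemble hits the same ordering depends on timing.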

Here's a test where I try to simulate that; it produced the NoNodeError:

    def test_lock_race_conditions_delete_lock_path_during_acquire(self):
        event1 = self.make_event()
        lock1 = self.client.Lock(self.lockpath, "one")
        thread1 = self.make_thread(target=self._thread_lock_acquire_til_event,
                                   args=("one", lock1, event1))
        thread1.start()

        # wait for this thread to acquire the lock
        with self.condition:
            if not self.active_thread:
                self.condition.wait(5)
                eq_(self.active_thread, "one")

        client2 = self._get_client()
        client2.start()

        lock2 = client2.Lock(self.lockpath, "two")
        thread2 = self.make_thread(target=self._thread_lock_acquire_til_event,
                                   args=("two", lock2, self.make_event()))

        # wait until lock1 is released
        event1.set()
        wait = self.make_wait()
        wait(lambda: not lock1.is_acquired)

        # start lock2 acquisition
        thread2.start()
        try:
            # But, delete lock2's parent BEFORE lock2 node is created
            self.client.delete(self.lockpath)
        except NoNodeError:
            # lock2.acquire fails
            pass

        thread1.join()
        thread2.join()
        client2.stop()

We could consider catching NoNodeError here and raising ForceRetry: https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/recipe/lock.py#L225
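A rough sketch of that idea, again on a toy stand-in rather than the real recipe. `ToyStore`, `create_contender`, `with_retry`, and the `before_create` hook are all hypothetical names introduced for illustration; the local `NoNodeError` and `ForceRetry` classes only mimic the kazoo exceptions of the same name. The important assumption is that each retry re-runs the "ensure the parent exists" step, so a real patch would also need to make sure the lock does not skip that step on the second attempt.

```python
class NoNodeError(Exception):
    """Stand-in for kazoo.exceptions.NoNodeError (assumption)."""


class ForceRetry(Exception):
    """Stand-in for kazoo.retry.ForceRetry (assumption)."""


class ToyStore:
    """Hypothetical in-memory znode store; it does not model NotEmptyError."""

    def __init__(self):
        self.nodes = set()
        self.seq = 0

    def ensure_path(self, path):
        self.nodes.add(path)

    def delete(self, path):
        if path not in self.nodes:
            raise NoNodeError(path)
        self.nodes.remove(path)

    def create(self, prefix):
        # Sequence-style create: the parent must exist, a counter is appended.
        parent = prefix.rsplit("/", 1)[0]
        if parent not in self.nodes:
            raise NoNodeError(parent)
        self.seq += 1
        node = "%s%010d" % (prefix, self.seq)
        self.nodes.add(node)
        return node


def create_contender(store, lock_path, before_create=None):
    """One acquire attempt: ensure the parent, then create the contender node."""
    store.ensure_path(lock_path)
    if before_create is not None:
        before_create()  # test hook that injects a concurrent delete
    try:
        return store.create(lock_path + "/lock-")
    except NoNodeError:
        # The parent disappeared under us: ask the caller to start over.
        raise ForceRetry()


def with_retry(func, max_tries=3):
    """Tiny retry loop standing in for a retry helper that honors ForceRetry."""
    for _ in range(max_tries):
        try:
            return func()
        except ForceRetry:
            continue
    raise RuntimeError("gave up after %d tries" % max_tries)


store = ToyStore()
# The racing cleanup fires only during the first attempt.
hooks = [lambda: store.delete("/locks/job")]


def attempt():
    hook = hooks.pop() if hooks else None
    return create_contender(store, "/locks/job", before_create=hook)


node = with_retry(attempt)
print(node)
```

Here the first attempt loses the race and raises `ForceRetry`, and the second attempt recreates the parent and succeeds. The retry cap keeps a client from spinning forever if another client keeps deleting the path.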

I might be totally off tho.

teeeg avatar May 31 '19 22:05 teeeg

The same race condition is referenced in https://github.com/python-zk/kazoo/issues/329

teeeg avatar Jun 07 '19 18:06 teeeg