kazoo
Kazoo locks are not getting released in the zookeeper server
I have created a root znode 'zookeeper' on the ZooKeeper server. It has a child 'master', which in turn has many children: worker-1, worker-2, ..., worker-n. To synchronize operations, I use the Kazoo lock recipe to take a lock on the node /zookeeper/master and then run some logic (distributing tasks) to the workers. Normally a worker first acquires the lock, does its task, and then releases the lock so that other workers can acquire it. The problem occurs when worker-1 releases the lock and crashes immediately afterwards: the other workers can then no longer acquire the lock on /zookeeper/master. They wait endlessly trying to acquire it, which results in a deadlock. Can you please let me know if there is a fix for this?
In the logs below, I set the lock timeout to 10 seconds, but even within those 10 seconds the other workers are not able to acquire the lock.
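A minimal sketch of the locking pattern described above, for reference only. It assumes a started `kazoo.client.KazooClient` is passed in as `client`; the function name `distribute_with_lock` and the `distribute_tasks` callback are hypothetical placeholders and not the actual worker code. It uses the lock path and the 10-second timeout mentioned in this report.

```python
try:
    from kazoo.exceptions import LockTimeout
except ImportError:
    # Stand-in so the sketch can be read and imported without kazoo installed.
    class LockTimeout(Exception):
        pass

LOCK_PATH = "/zookeeper/master"  # lock node named in this report
LOCK_TIMEOUT = 10                # seconds, as in the logs


def distribute_with_lock(client, worker_id, distribute_tasks):
    """Acquire the lock, run the task, and always release the lock.

    `client` is assumed to be a started kazoo.client.KazooClient.
    Returns True if the task ran, False if the lock could not be
    acquired within LOCK_TIMEOUT seconds.
    """
    # client.Lock(path, identifier) builds the Kazoo lock recipe object.
    lock = client.Lock(LOCK_PATH, worker_id)
    try:
        # A blocking acquire raises LockTimeout once the timeout expires.
        acquired = lock.acquire(timeout=LOCK_TIMEOUT)
    except LockTimeout:
        return False
    if not acquired:
        return False
    try:
        distribute_tasks()
    finally:
        # Release even if the task raises, so other workers are not blocked.
        lock.release()
    return True
```

With a real client this would be called as `distribute_with_lock(zk, "worker-1", my_task)` after `zk.start()`, where `zk` and `my_task` are again placeholder names.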
Here are the logs:
(11/07/2016 15:20:29 957 Thread-8 ) DEBUG [kazoo.client:_submit@282] - Sending request(xid=1): Exists(path='/zookeeper/master', watcher=None)
(11/07/2016 15:20:29 960 Thread-8 ) DEBUG [kazoo.client:_read_response@370] - Received response(xid=1): ZnodeStat(czxid=4294967304L, mzxid=4294967304L, ctime=1478516289486L, mtime=1478516289486L, version=0, cversion=53, aversion=0, ephemeralOwner=0, dataLength=4, numChildren=1, pzxid=34359738371L)
(11/07/2016 15:20:29 963 Thread-8 ) DEBUG [kazoo.client:_submit@282] - Sending request(xid=2): Create(path='/zookeeper/master/fc3eceb1bef3438a92acd85e4c64ac07__lock', data='', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=3)
(11/07/2016 15:20:29 970 Thread-8 ) DEBUG [kazoo.client:_read_response@370] - Received response(xid=2): u'/zookeeper/master/fc3eceb1bef3438a92acd85e4c64ac07__lock__0000000027'
(11/07/2016 15:20:29 973 Thread-8 ) DEBUG [kazoo.client:_submit@282] - Sending request(xid=3): GetChildren(path='/zookeeper/master', watcher=None)
(11/07/2016 15:20:29 976 Thread-8 ) DEBUG [kazoo.client:_read_response@370] - Received response(xid=3): [u'worker-0', u'fc3eceb1bef3438a92acd85e4c64ac07__lock__0000000027']
(11/07/2016 15:20:29 977 Thread-8 ) DEBUG [kazoo.client:_submit@282] - Sending request(xid=4): Exists(path=u'/zookeeper/master/worker-0', watcher=<bound method Lock._watch_predecessor of <kazoo.recipe.lock.Lock object at 0x0000000003576DD8>>)
(11/07/2016 15:20:29 980 Thread-8 ) DEBUG [kazoo.client:_read_response@370] - Received response(xid=4): ZnodeStat(czxid=17179869227L, mzxid=17179869227L, ctime=1478522581726L, mtime=1478522581726L, version=0, cversion=10, aversion=0, ephemeralOwner=0, dataLength=29, numChildren=2, pzxid=25769803786L)
(11/07/2016 15:20:39 982 Thread-8 ) DEBUG [kazoo.client:_submit@282] - Sending request(xid=5): Delete(path=u'/zookeeper/master/fc3eceb1bef3438a92acd85e4c64ac07__lock__0000000027', version=-1)
(11/07/2016 15:20:39 997 Thread-8 ) DEBUG [kazoo.client:_read_response@370] - Received response(xid=5): True