commons
commons copied to clipboard
Unexpected DistributedLockImpl behavior?
I was trying to use com.twitter.common.zookeeper.DistributedLockImpl and discovered that, after a successful lock acquisition, a disconnect from the server or a session timeout would never release the lock (the internal holdsLock will always stay true). Those events are only detected at lock acquisition time. Is this the expected behavior?
And, while we are on this issue, why does the lock convert a ZooKeeper exception into a runtime/unchecked exception in cleanup() - https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/DistributedLockImpl.java#L206 ? Makes it complicated to properly unlock if a disconnection happens.
Another unexpected thing. tryLock(long, TimeUnit)
does not actually timeout as expected if no ZK server is found and instead seems to block forever. I believe this is because of prepare()
which calls ZooKeeperUtils.ensurePath()
. ensurePath()
can block forever because it calls ZooKeeperClient.get()
.
Sorry - just noticed this issue. There seem to be a few bugs here: 1.) tryLock(long, TimeUnit) clocking forever is clearly one bug. 2.) cleanup() / the ublic methods should throw documented unchecked exceptions at the very least
The 1st issue you mention can't be fully right. A session timeout implies the lock ephemeral node dies which release the lock. A disconnect is expected to happen and should not affect holdig of the lock, only session expiry or explict release should relinquish the lock.
About the first issue, lets say that a client creates a lock on a particular path. It manages to grab the lock but is disconnected at some future point in time. The ephemeral node gets destroyed and some other client can therefore grab the lock. However, the first client still assumes it has it and the lock will not get any notification.
With regards to the other bugs, I also found a few more issues. Would you be willing to accept a patch like https://github.com/ntolia/commons/commit/89280c50413f3b92ea142cf17e1c917dcbb119a3 ? I have a similar patch working in a private repository but cannot seem to get pants to work for me. After fixing some python incompatibility issues, it can't seem to find the ' spy#memcached;2.4.2' dependency.
Bump.
Just wanted to ACK that this is on the radar. Expect some useful feedback by end of business week at the worst.
Hi - just got around looking at this. Thanks for your input.
For the issue that you're reporting - would you mind updating the unit test to reflect the issue you are seeing? That way, I'll be able to verify that your patch works.
Thanks! Florian
Florian, can you give me an idea of the unit tests you are looking for? Most of my patch has to do with correctness and cleanups. The only thing I can think of right now is a test that checks that tryLock(long timeout, TimeUnit unit) does not block forever if no server is present.