commons icon indicating copy to clipboard operation
commons copied to clipboard

Unexpected DistributedLockImpl behavior?

Open ntolia opened this issue 13 years ago • 8 comments

I was trying to use com.twitter.common.zookeeper.DistributedLockImpl and discovered that, after a successful lock acquisition, a disconnect from the server or a session timeout would never release the lock (the internal holdsLock will always stay true). Those events are only detected at lock acquisition time. Is this the expected behavior?

ntolia avatar Aug 05 '11 22:08 ntolia

And, while we are on this issue, why does the lock convert a ZooKeeper exception into a runtime/unchecked exception in cleanup() - https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/DistributedLockImpl.java#L206 ? Makes it complicated to properly unlock if a disconnection happens.

ntolia avatar Aug 05 '11 23:08 ntolia

Another unexpected thing. tryLock(long, TimeUnit) does not actually timeout as expected if no ZK server is found and instead seems to block forever. I believe this is because of prepare() which calls ZooKeeperUtils.ensurePath(). ensurePath() can block forever because it calls ZooKeeperClient.get().

ntolia avatar Aug 06 '11 19:08 ntolia

Sorry - just noticed this issue. There seem to be a few bugs here: 1.) tryLock(long, TimeUnit) clocking forever is clearly one bug. 2.) cleanup() / the ublic methods should throw documented unchecked exceptions at the very least

The 1st issue you mention can't be fully right. A session timeout implies the lock ephemeral node dies which release the lock. A disconnect is expected to happen and should not affect holdig of the lock, only session expiry or explict release should relinquish the lock.

jsirois avatar Aug 16 '11 02:08 jsirois

About the first issue, lets say that a client creates a lock on a particular path. It manages to grab the lock but is disconnected at some future point in time. The ephemeral node gets destroyed and some other client can therefore grab the lock. However, the first client still assumes it has it and the lock will not get any notification.

With regards to the other bugs, I also found a few more issues. Would you be willing to accept a patch like https://github.com/ntolia/commons/commit/89280c50413f3b92ea142cf17e1c917dcbb119a3 ? I have a similar patch working in a private repository but cannot seem to get pants to work for me. After fixing some python incompatibility issues, it can't seem to find the ' spy#memcached;2.4.2' dependency.

ntolia avatar Aug 16 '11 04:08 ntolia

Bump.

ntolia avatar Sep 22 '11 23:09 ntolia

Just wanted to ACK that this is on the radar. Expect some useful feedback by end of business week at the worst.

wfarner avatar Sep 27 '11 06:09 wfarner

Hi - just got around looking at this. Thanks for your input.

For the issue that you're reporting - would you mind updating the unit test to reflect the issue you are seeing? That way, I'll be able to verify that your patch works.

Thanks! Florian

florianleibert avatar Sep 28 '11 17:09 florianleibert

Florian, can you give me an idea of the unit tests you are looking for? Most of my patch has to do with correctness and cleanups. The only thing I can think of right now is a test that checks that tryLock(long timeout, TimeUnit unit) does not block forever if no server is present.

ntolia avatar Oct 21 '11 18:10 ntolia