Add configurable TTL for every Lock
We're using the LockRegistryLeaderInitiator mechanism to control leader election in the cluster backed with Hazelcast. I see that it uses tryLock(long time, TimeUnit unit) method without specifying the maximum lease time. In fact, it's possible that without graceful shutdown the instance won't release the lock and then no other leader will be elected. All instances then are logging
Acquiring the lock for LockContext{role=my-app-leader, id={{app-uuid}}, isLeader=false}
Without manually forcing unlock the whole cluster is stuck. I know that the java.util.concurrent.locks.Lock doesn't offer locking for a given time, but it's supported for example with the com.hazelcast.core.ILock.tryLock(long time, TimeUnit unit, long leaseTime, TimeUnit leaseUnit) method.
Maybe simple possibility to override LeaderSelector.call lock acquisition part could be extracted so users can override it with and provide max lease time for example? It could be also done for example in the spring-integration-hazelcast module.
One workaround would be to provide a custom implementation of HazelcastLockRegistry that returns a LockWrapper (implements Lock - there are only a handful of methods).
Have all the methods delegate to the underlying Lock except tryLock(time, unit) which can call the extended tryLock() method.
I think I would prefer that approach over changing the leader initiator.
I wouldn't object to making such a change directly in the HazelcastLockRegistry itself (optionally). Feel free to make a contribution.
I missed the option with the custom LockRegistry and it's definitely the simplest one. I will discuss the timings with the team and will try to submit a PR.
I've just looked at the code, and the problem is that LockRegistry returns the java.util.concurrent.locks.Lock object. So to make this walkaround useful we'd need to proxy the ILock object with tryLock method delegated to the one with leaseTime set to some default. It will work but won't be a clean solution. And the only clean solution I see there is to extract custom Lock interface or extend LockRegistry with a tryLock method. WDYT?
Right - that's what I meant by LockWrapper. Something like this...
public class HazelcastLockRegistry implements LockRegistry {
private final HazelcastInstance client;
private final long leaseTime;
private final TimeUnit leaseTimeUnit;
public HazelcastLockRegistry(HazelcastInstance hazelcastInstance) {
this(hazelcastInstance, 0, null);
}
public HazelcastLockRegistry(HazelcastInstance hazelcastInstance, long leaseTime, TimeUnit leaseTimeUnit) {
Assert.notNull(hazelcastInstance, "'hazelcastInstance' must not be null");
this.client = hazelcastInstance;
this.leaseTime = leaseTime;
this.leaseTimeUnit = leaseTimeUnit;
}
@Override
public Lock obtain(Object lockKey) {
Assert.isInstanceOf(String.class, lockKey);
ILock lock = this.client.getLock((String) lockKey);
if (this.leaseTimeUnit == null) {
return lock;
}
else {
return new LockWrapper(lock, this.leaseTime, this.leaseTimeUnit);
}
}
private static final class LockWrapper implements Lock {
private final ILock lock;
private final long leaseTime;
private final TimeUnit leaseTimeUnit;
LockWrapper(ILock lock, long leaseTime, TimeUnit leaseTimeUnit) {
this.lock = lock;
this.leaseTime = leaseTime;
this.leaseTimeUnit = leaseTimeUnit;
}
@Override
public void lock() {
this.lock.lock();
}
@Override
public void lockInterruptibly() throws InterruptedException {
this.lock.lockInterruptibly();
}
@Override
public boolean tryLock() {
return this.lock.tryLock();
}
@Override
public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
return this.lock.tryLock(time, unit, this.leaseTime, this.leaseTimeUnit);
}
@Override
public void unlock() {
this.lock.unlock();
}
@Override
public Condition newCondition() {
return this.lock.newCondition();
}
}
}
I think it's time to change a LockRegistry.obtain() contract with such a Duration leaseTime. Most of the implementations for distributed locks really provide that functionality to avoid the described situation with dead lock.
I agree that it may require some internal changes for every single implementation, but we need to align with protocol requirements somehow anyway...
Yes, we can start with Hazelcast, but that is already a matter of the Spring Integration Extensions project: https://github.com/spring-projects/spring-integration-extensions/tree/master/spring-integration-hazelcast
obtain() only gets a lock from the registry (based on the key); it doesn't actually acquire the lock.
We'd need an integration lock extension like hazelcast's ILock - although I see that is deprecated now (in 3.12) in favor of FencedLock.
From my perspective LockRegistry.obtain() has a clear responsibility of acquiring the lock. How and for how long can be the responsibility of the underlying infrastructure. I've implemented the solution proposed by @garyrussell and it works like a charm on production.
OK. If you both agree that common private final long leaseTime; value as a configuration option is enough, we can go ahead into that direction everywhere needed.
What I see in the RedisLockRegistry is called expireAfter and used in the Redis script like: redis.call('PEXPIRE', KEYS[1], ARGV[2]).
The JDBC solution provides for us a ttl option on the DefaultLockRepository. Every call to the acquire() initiates a deleteExpired().
The Gemfire one is based on the Cache.getLockLease().
At the same time doesn't look like Zookeeper + Curator Framework provide some option for leasing...
So, I guess we are good in this project for all the implementation. Since the original concern is about a Hazelcast implementation, moving this issue into an appropriate project.
Will continue discussion over there.
Thanks for understanding!
Hm... I don't see any big problem with casting a LockRegistry.obtain() result into the ILock and perform many useful Hazelcast features on it, including desired tryLock(long time, TimeUnit unit, long leaseTime, TimeUnit leaseUnit) and lock(long leaseTime, TimeUnit timeUnit);. With the LockWrapper we need to keep ILock contract for possible casting in the end-user. But if that is the fact, we don't need any additional options, since, essentially, we just going to break ILock contract implementation:
@Override
public boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException {
return tryLock(timeout, unit, Long.MAX_VALUE, null);
}
So, knowing that you deal with Hazelcast, just be sure to cast into ILock and go ahead with leaseTime requirements!
I treat this as Works as Designed and will keep it open only until your feedback.
Thank you for understanding!
Moved back to Spring integration since it looks like we need to provide a DistributedLock contract in other LockRegistry implementations.
Move info from the https://jira.spring.io/browse/INT-4399