spring-integration
spring-integration copied to clipboard
Custom TTL per LOCK in LockRegistry
Current default (and only) implementation of LockRepository based on JDBC (DefaultLockRepository) allows to set static TTL for all locks, controlled by this repository.
But not all locks are equal. Some locks require bigger TTL than others. It would be good to support custom TTL per lock.
From DB point of view, I think this is easy task. Currently DB stores CREATED_DATE field. If new EXPIRATION_DATE field will be added to the table, it would be easy to expire records based on this date instead of date of creation + TTL.
From repository point of view it is not so easy. It requires to change the interface or create a new interface that extends current LockRepository and adds new method:
boolean acquire(String lock, Duration ttl)
From registry point of view it is not so easy too. The only method of LockRegistry allows to obtain a lock only using lockKey (Lock obtain(Object lockKey)). But to support custom TTL per Lock we need to add another method, like:
Lock obtain(Object lockKey, Duration ttl)
that would set an explicit TTL on JdbcLock. And JdbcLock then will pass this TTL into it's LockRepository from doLock() method.
So, for JDBC implementation it looks totally possible to implement.
But I didn't investigate other implementations of LockRepository and LockRegistry (not based on JDBC). I'm not familiar with Redis or Zookeeper, so it would be good if someone first evaluate the idea of this feature request.
Context
Not all business processes are the same. Some usually finish in 2 secs. Other in 20 mins. So, having a static TTL for all locks of repository is impractical. Setting it to intentionally big value is impractical too, as this means that fast business process, that is killed before lock release, will not be restarted until this big TTL ends.
The workaround to this problem is to use multiple instances of LockRepository (and wrapping LockRegistry) with multiple different INT_LOCK tables. In this case each instance of LockRepository can be configured with explicit TTL. This is only partial workaround as it's hard to predict all possible TTLs for each possible business process. But it cloud be generalized to, say, 3 tables (with TTL=10 secs, 10 mins and 10 hours).
The other workaround, I think, is to periodically call lock() methods again on currently held lock. As I see in the code, it should update the CREATED_DATE field in the table to current datetime, so expiration date will be prolonged too. I actually think, this is the best approach, as it allows to not set too long TTL and at the same time to not mistakenly acquire the lock, that other process still holds.
But it's not always easy to implement this in business logic. For example, if some external service is called with big timeout (and if timeout must be big for this service for some reason) and this external service usually responds in 2 secs, but sometimes in 2 mins. In this case, calling thread is blocked until external service respond or timeout is reached, so no lock() method could be called again in this timeframe.
See more info here: https://github.com/spring-projects/spring-integration/issues/3095.
We probably really talk about the same matter - lease, TTL - should lend at the same functionality in the target lock implementation.
Not sure if we can do it for the current 5.5, but definitely worth to revise in 6.0 even if it is going to be a big breaking change.
WDYT?
@artembilan Thanks. Yes, I think we are on the same page! I call it TTL here because it is named TTL in the code of current implementation of org.springframework.integration.jdbc.lock.DefaultLockRepository.
/**
* Specify the time (in milliseconds) to expire dead locks.
* @param timeToLive the time to expire dead locks.
*/
public void setTimeToLive(int timeToLive) {
this.ttl = timeToLive;
}
What I noticed is that in the mentioned issue you decide to introduce a new method tryLock with support of TTL (lease) into the Lock instance. I'm not sure how would you do it, as JDK's Lock doesn't support it. This will require client-code to cast Lock to something else and that's imply that client-code know what implementation of Lock it uses.
My suggestion here is to modify (or, more specifically, overload) LockRegistry.obtain() method. When you obtain the lock with specified TTL, this TTL will be just remembered inside the Lock implementation. And it will be applied only after you call some variant of lock() method on the given Lock instance.
public Lock obtain(Object lockKey, Duration ttl) {
String path = pathFor((String) lockKey);
return this.locks.computeIfAbsent(path, (key) -> new JdbcLock(this.client, this.idleBetweenTries, key, ttl)); // note new TTL argument here
}
And because TTL is not absolute instant of time, but relative value, it should work even if you trying to lock() long time after you called obtain(key, ttl).
I also discovered #3272 just now and see that my second "workaround" is already implemented in RenewableLockRegistry (in v5.4, while I use 5.3 so wasn't aware). Cool! :)
By the way, if this feature will be implemented, it would be better to change "renew" implementation to update new EXPIRATION_DATE field instead of CREATED_DATE. CREATED_DATE must never be changed after Lock creation then.
Actually, this not only desirable, but required, because new "expiration query" will have to delete expired locks based on EXPIRATION_DATE field instead of CREATED_DATE. Because 2 locks could be created at the same time but with different expiration dates.
Looks, like CREATED_DATE field will be redundant then. Could only be used for manual overview (how long this Lock is alive).
I think we applied the renewal logic on the CREATED_DATE to avoid breaking changes in the minor release.
For version 6.0 we definitely can consider to to have a new EXPIRATION_DATE column with all the mentioned lease or TTL logic.
My point against obtain(Object lockKey, Duration ttl) is because different instances may try to use different values for ttl when it is still the same lock instance. The ttl is really not an object property, but rather state. Therefore my point about new tryLock() API when any instance may decide how long it would like to keep the lock. In that linked issue I suggest to introduce a DistributedLock contract which is going to be returned from existing obtain() API. However not all LockRegistry are about distributed locks or just cannot implement DistributedLock contract (ZookeeperLockRegistry ?). So, we probably just need to introduce a generic argument on the LockRegistry to let the target impl to dictate us what the Lock extension we are going to deal with. This way a new tryLock(ttl) API will make it into end-user code for any peculiar decisions.
Does it make sense?
Yes, I think it does make sense.
You are talking about the situation when obtained Lock instance is shared between some bean instances/threads (of the same application instance) etc. And each bean/thread may want to set different TTL when trying to acquire that lock.
I just didn't thought about it this way. In my mind the relationship between obtain() and lock() methods was 1-to-1, which is not true. In reality it's 1-to-many. You can call obtain() one time and then call lock() multiple times on the same Lock (or DistributedLock) instance (potentially, with different TTL). There is also caching of obtained Lock instances in the registry implementations. So, you are right.
But then, it looks like renew() method should be part of DistributedLock contract too instead of be part of LockRegistry contract. What do you think?
Yes, that makes sense. Thanks.
We have renewal contract on the registry at the moment because of to big breaking change with a new DistributedLock abstraction. We will revise it the next version.
Feel free to fork the project and try something yourself! we also can review small PRs even if the feature is not complete yet or can't make it into the current version.
Any news on this enhancement?
See my previous comment:
because of to big breaking change with a new
DistributedLockabstraction. We will revise it the next version.
So, we are going to tackle this in the next 6.0 version.
@artembilan Any news on this issue ? version version 6.0 is about to be release right ?
No, we didn't have a chance to look into this.
So, it is unlikely this will make it into the current 6.0 RC1 in two weeks.
Feel free to contribute some if you'd like.
Thank you for understanding!