kube
Add low-level distributed lock
Signed-off-by: Mikail Bagishov [email protected]
Motivation
Leader election was requested in #485. It is widely used when implementing controllers (if two instances try to run the same controller simultaneously, they create excessive load on the apiserver and a huge number of Conflict HTTP errors).
There were several attempts to implement this feature, but they seem to have stalled. It looks like there are open questions about what the API should look like.
Solution
This PR (based on #843) adds a low-level distributed lock on top of Kubernetes Leases.
End users are not expected to interact with this API directly. It may become private in the future.
On the other hand, RawLock should be a sound basis for a user-facing API (i.e. all that remains is to periodically call try_acquire and react when acquisition fails). I hope this separation will help us find a simple and safe interface for leader election in kube.
The PR includes a simple example. Suggestions for better ways to test correctness are appreciated :)
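For illustration, here is a rough sketch of the "all that remains" part described above. Only the names `RawLock` and `try_acquire` come from this PR; the struct shape, error type, `leadership_loop`, and its parameters are assumptions for the sketch, not the actual implementation.

```rust
use std::time::Duration;
use tokio::time::sleep;

// Hypothetical stand-in for the lock added in this PR; the real RawLock is
// built on top of a Kubernetes Lease and its exact signatures may differ.
pub struct RawLock;

#[derive(Debug)]
pub struct LockError;

impl RawLock {
    /// Try to acquire (or renew) the underlying Lease.
    /// Ok(true) means we currently hold the lock.
    pub async fn try_acquire(&self) -> Result<bool, LockError> {
        unimplemented!("provided by the kube-runtime lock module in the PR")
    }
}

/// Sketch of a higher-level primitive: periodically call try_acquire
/// and do something when we can't hold the lock.
pub async fn leadership_loop(lock: RawLock, renew_every: Duration, on_lost: impl Fn()) {
    loop {
        match lock.try_acquire().await {
            Ok(true) => { /* still the leader; keep running controllers */ }
            Ok(false) | Err(_) => {
                // Lost (or never had) the lock: let the application react,
                // e.g. by pausing its controllers, then keep retrying.
                on_lost();
            }
        }
        sleep(renew_every).await;
    }
}
```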
Future extensions
- Detecting that `identity` is not unique. This can be added to `RawLock` later without major changes.
- Live reconfiguration (renaming the lease / splitting a lock into several finer-grained locks / merging several locks into one coarser-grained lock). This should all be possible on top of the proposed API.
Codecov Report
Merging #996 (c3df337) into master (57d97ce) will decrease coverage by 1.35%. The diff coverage is 0.00%.
@@ Coverage Diff @@
## master #996 +/- ##
==========================================
- Coverage 72.58% 71.22% -1.36%
==========================================
Files 64 65 +1
Lines 4505 4591 +86
==========================================
Hits 3270 3270
- Misses 1235 1321 +86
| Impacted Files | Coverage Δ |
|---|---|
| kube-runtime/src/lock/raw.rs | 0.00% <0.00%> (ø) |
Hey, good to see someone else take a stab at this! Sadly, this doesn't (seem to) detect when the Lease is "stolen" until the next renewal attempt, and the API isn't really conducive to adding a check for this after the fact.
Obviously we're never going to be able to break physics and avoid any kind of race in that case, but I believe we more or less have to at least try to detect it for any kind of leadership election module to be viable.
> detect when the Lease is "stolen" until the next renewal attempt, and the API isn't really conducive to adding a check for this after the fact.
Well, in normal circumstances events go like this:
- The leader election primitive fails to renew the lock for a configured period
- The leader election primitive notifies the application
- The timeout on the lock expires
- The lock is stolen
In such a case, there is no need to check whether the Lease is stolen (by the time it is, we are dead :))
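To make the ordering above concrete, here is a hedged sketch of the timing invariant it relies on. All names and values are illustrative assumptions, not part of the PR.

```rust
use std::time::Duration;

/// Illustrative timing parameters behind the sequence above.
struct LockTiming {
    renew_interval: Duration, // how often the primitive tries to renew
    give_up_after: Duration,  // failed renewals for this long => notify the app
    lease_duration: Duration, // after this, other actors may take the Lease
}

impl LockTiming {
    /// The invariant: we step down (steps 1-2) strictly before the Lease can
    /// be stolen (steps 3-4), with some slack for clock skew and scheduling.
    fn is_safe(&self) -> bool {
        self.give_up_after + self.renew_interval < self.lease_duration
    }
}

fn main() {
    let timing = LockTiming {
        renew_interval: Duration::from_secs(2),
        give_up_after: Duration::from_secs(8),
        lease_duration: Duration::from_secs(15),
    };
    assert!(timing.is_safe());
}
```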
Correct me if I'm wrong, but the lock can only be stolen "unsafely" if something is seriously broken:
- The lock-renewing task has starved (but in this case we can't do anything anyway)
- The cluster has enormous clock skew and another actor saw the lock as stale
- The user deleted or modified the Lease spec
- and so on
TBH I don't really think fast detection of these degenerate situations is very useful (there are many issues and misconfigurations we cannot detect anyway, like the user specifying a different Lease name for different actors). But if we want it, I see only two ways to do this:
- Periodically `get` the Lease from the apiserver and verify it is owned by us (see the sketch after this list). But we may then call `try_acquire` as well.
- Open a `watch` against the Lease and react to its events. This logic is completely independent of `RawLock` itself, so it can be implemented separately.
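A hedged sketch of the first option (periodic `get` plus ownership check), using kube's typed `Api` for the coordination/v1 `Lease`. The function name, its parameters (e.g. `our_identity`), and the polling loop are assumptions for illustration, not code from this PR.

```rust
use std::time::Duration;

use k8s_openapi::api::coordination::v1::Lease;
use kube::{Api, Client};

/// Poll the Lease and return once holder_identity no longer matches us.
/// This runs independently of RawLock itself.
async fn detect_stolen_lease(
    client: Client,
    namespace: &str,
    lease_name: &str,
    our_identity: &str, // assumed identifier for this replica
    poll_every: Duration,
) -> kube::Result<()> {
    let leases: Api<Lease> = Api::namespaced(client, namespace);
    loop {
        let lease = leases.get(lease_name).await?;
        let holder = lease
            .spec
            .as_ref()
            .and_then(|spec| spec.holder_identity.as_deref());
        if holder != Some(our_identity) {
            // The Lease no longer names us: leadership was lost or stolen.
            eprintln!("lease {lease_name} is now held by {holder:?}");
            return Ok(());
        }
        tokio::time::sleep(poll_every).await;
    }
}
```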
To conclude: I think the proposed RawLock design does not complicate detection of the lock being stolen.
UPD: just to clarify: in your PR, there is code that runs concurrently with the user-provided future and keeps the lock renewed. I agree that such code must live in kube-rs rather than in the client application. However, I propose that this background work be managed not by RawLock, but by a higher-level primitive.
> But we may then call `try_acquire` as well.
Thinking about it more, this is wrong because `try_acquire` is less efficient (it sometimes makes two requests, and updates are also heavier than reads).
I added a `refresh` method which only queries the latest Lease. It may further be used as a more efficient basis for stolen-lock detection.
@MikailBag PR https://github.com/kube-rs/kube/pull/1056 is done, is being used in production, and has worked seamlessly for quite some time now (~6 months at least [since before that PR was opened]).
The implementation in https://github.com/kube-rs/kube/pull/1056 matches the one used by the Go core controllers. It does NOT attempt to enforce "locking" or fencing, as that is not something which can actually be reliably implemented on top of such an abstraction. You need something like Raft for actual fencing (when dealing with distributed systems).
Closing this along with all other leader election PRs for now as per #485.