kube
Add low-level distributed lock
Signed-off-by: Mikail Bagishov [email protected]
Motivation
Leader election was requested in #485. It is widely used when implementing controllers (if two instances try to run the same controller simultaneously, they create excessive load on the apiserver and a huge number of Conflict HTTP errors).
There were several attempts to implement this feature, but they seem to have stalled. It looks like there are open questions about what the API should look like.
Solution
This PR (based on #843) adds a low-level distributed lock on top of Kubernetes Leases.
End users are not expected to interact with this API directly. It may become private in the future.
On the other hand, RawLock should be a sound basis for a user-facing API (i.e. all that remains is to periodically call try_acquire and react when acquisition fails). I hope this separation will help us find a simple and safe interface for leader election in kube.
The PR includes a simple example. Suggestions for better ways to test correctness are appreciated :)
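For illustration, here is a rough sketch of the "all that remains" part described above. Only the names `RawLock` and `try_acquire` come from this PR; the struct shape, error type, `leadership_loop`, and its parameters are assumptions for the sketch, not the actual implementation.

```rust
use std::time::Duration;
use tokio::time::sleep;

// Hypothetical stand-in for the lock added in this PR; the real RawLock is
// built on top of a Kubernetes Lease and its exact signatures may differ.
pub struct RawLock;

#[derive(Debug)]
pub struct LockError;

impl RawLock {
    /// Try to acquire (or renew) the underlying Lease.
    /// Ok(true) means we currently hold the lock.
    pub async fn try_acquire(&self) -> Result<bool, LockError> {
        unimplemented!("provided by the kube-runtime lock module in the PR")
    }
}

/// Sketch of a higher-level primitive: periodically call try_acquire
/// and do something when we can't hold the lock.
pub async fn leadership_loop(lock: RawLock, renew_every: Duration, on_lost: impl Fn()) {
    loop {
        match lock.try_acquire().await {
            Ok(true) => { /* still the leader; keep running controllers */ }
            Ok(false) | Err(_) => {
                // Lost (or never had) the lock: let the application react,
                // e.g. by pausing its controllers, then keep retrying.
                on_lost();
            }
        }
        sleep(renew_every).await;
    }
}
```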
Future extensions
- Detecting that `identity` is not unique. This can be added to `RawLock` later without major changes.
- Live reconfiguration (renaming the lease / splitting a lock into several finer-grained locks / merging several locks into one coarser-grained lock). This should all be possible on top of the proposed API.
Codecov Report
Merging #996 (c3df337) into master (57d97ce) will decrease coverage by 1.35%. The diff coverage is 0.00%.
@@ Coverage Diff @@
## master #996 +/- ##
==========================================
- Coverage 72.58% 71.22% -1.36%
==========================================
Files 64 65 +1
Lines 4505 4591 +86
==========================================
Hits 3270 3270
- Misses 1235 1321 +86
| Impacted Files | Coverage Δ |
|---|---|
| kube-runtime/src/lock/raw.rs | 0.00% <0.00%> (ø) |
Hey, good to see someone else take a stab at this! Sadly, this doesn't (seem to) detect when the Lease is "stolen" until the next renewal attempt, and the API isn't really conducive to adding a check for this after the fact.
Obviously we're never going to be able to break physics and avoid any kind of race in that case, but I believe we more or less have to at least try to detect it for any kind of leadership election module to be viable.
> detect when the Lease is "stolen" until the next renewal attempt, and the API isn't really conducive to adding a check for this after the fact.
Well, in normal circumstances events go like this:
- The leader election primitive fails to renew the lock for a configured period
- The leader election primitive notifies the application
- The timeout on the lock expires
- The lock is stolen
In such a case, there is no need to check whether the Lease is stolen (by the time it is, we are dead :))
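To make the ordering above concrete, here is a hedged sketch of the timing invariant it relies on. All names and values are illustrative assumptions, not part of the PR.

```rust
use std::time::Duration;

/// Illustrative timing parameters behind the sequence above.
struct LockTiming {
    renew_interval: Duration, // how often the primitive tries to renew
    give_up_after: Duration,  // failed renewals for this long => notify the app
    lease_duration: Duration, // after this, other actors may take the Lease
}

impl LockTiming {
    /// The invariant: we step down (steps 1-2) strictly before the Lease can
    /// be stolen (steps 3-4), with some slack for clock skew and scheduling.
    fn is_safe(&self) -> bool {
        self.give_up_after + self.renew_interval < self.lease_duration
    }
}

fn main() {
    let timing = LockTiming {
        renew_interval: Duration::from_secs(2),
        give_up_after: Duration::from_secs(8),
        lease_duration: Duration::from_secs(15),
    };
    assert!(timing.is_safe());
}
```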
Correct me if I'm wrong, but the lock can only be stolen "unsafely" if something is seriously broken:
- The lock-renewing task has starved (but in this case we can't do anything anyway)
- The cluster has enormous clock skew and another actor saw the lock as stale
- The user deleted or modified the Lease spec
- and so on
TBH I don't really think fast detection of these degenerate situations is very useful (there are many issues and misconfigurations we cannot detect anyway, like the user specifying a different Lease name for different actors). But if we want it, I see only two ways to do this:
- Periodically `get` the Lease from the apiserver and verify it is owned by us (see the sketch after this list). But we may then call `try_acquire` as well.
- Open a `watch` against the Lease and react to its events. This logic is completely independent of `RawLock` itself, so it can be implemented separately.
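A hedged sketch of the first option (periodic `get` plus ownership check), using kube's typed `Api` for the coordination/v1 `Lease`. The function name, its parameters (e.g. `our_identity`), and the polling loop are assumptions for illustration, not code from this PR.

```rust
use std::time::Duration;

use k8s_openapi::api::coordination::v1::Lease;
use kube::{Api, Client};

/// Poll the Lease and return once holder_identity no longer matches us.
/// This runs independently of RawLock itself.
async fn detect_stolen_lease(
    client: Client,
    namespace: &str,
    lease_name: &str,
    our_identity: &str, // assumed identifier for this replica
    poll_every: Duration,
) -> kube::Result<()> {
    let leases: Api<Lease> = Api::namespaced(client, namespace);
    loop {
        let lease = leases.get(lease_name).await?;
        let holder = lease
            .spec
            .as_ref()
            .and_then(|spec| spec.holder_identity.as_deref());
        if holder != Some(our_identity) {
            // The Lease no longer names us: leadership was lost or stolen.
            eprintln!("lease {lease_name} is now held by {holder:?}");
            return Ok(());
        }
        tokio::time::sleep(poll_every).await;
    }
}
```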
To conclude: I think the proposed RawLock design does not complicate detection of the lock being stolen.
UPD: just to clarify: in your PR, there is code that runs concurrently with the user-provided future and keeps the lock renewed. I agree that such code must live in kube-rs rather than in the client application. However, I propose that this background work be managed not by RawLock, but by a higher-level primitive.
> But we may then call `try_acquire` as well.
Thinking about it more, this is wrong because `try_acquire` is less efficient (it sometimes makes two requests, and updates are also heavier than reads).
I added a `refresh` method which only queries the latest Lease. It may further be used as a more efficient basis for stolen-lock detection.
@MikailBag PR https://github.com/kube-rs/kube/pull/1056 is done, is being used in production, and has worked seamlessly for quite some time now (~6 months at least [since before that PR was opened]).
The implementation in https://github.com/kube-rs/kube/pull/1056 matches the one used by the Go core controllers. It does NOT attempt to enforce "locking" or fencing, as that is not something which can actually be reliably implemented on top of such an abstraction. You need something like Raft for actual fencing (when dealing with distributed systems).
Closing this along with all other leader election PRs for now as per #485.