dotnet-operator-sdk [bug]: LeaderAwareResourceWatcher does not regain leadership after network issues

Open · PSanetra opened this issue 7 months ago · 1 comment

Describe the bug

I have observed that the LeaderAwareResourceWatcher loses leadership after network issues and never regains it.

To reproduce

1. Call LeaderAwareResourceWatcher.StartAsync().
2. Wait for the LeaderAwareResourceWatcher to connect.
3. Create a network issue (e.g. a reset_peer toxic with toxiproxy).
4. "This instance stopped leading, stopping watcher." is logged.
5. Resolve the network issue.
6. "This instance started leading, starting watcher." is never logged again.

Expected behavior

"This instance started leading, starting watcher." should be logged again once the network issue is resolved.

Screenshots

No response

Additional Context

Version: 8.0.0-pre.29

Dec 04 '23 17:12 PSanetra

I think the LeaderElector.RunAsync() API is quite confusing and undocumented, but it seems that the OnStoppedLeading event is only raised in the finally clause of RunAsync: https://github.com/kubernetes-client/csharp/blob/15ad5bdfc451debbca2e0d23821cef4393885525/src/KubernetesClient/LeaderElection/LeaderElector.cs#L104-L108

If so, RunAsync returns once leadership is lost and does not try to reacquire the lease on its own, so I guess it has to be called in a loop until the LeaderAwareResourceWatcher is stopped.
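A minimal sketch of the suggested loop, written as a small Python/asyncio model rather than the SDK's C# code. HypotheticalElector is a hypothetical stand-in whose run() mimics the behavior described above for LeaderElector.RunAsync(): it signals "started", keeps renewing until a renewal fails (a simulated network issue), signals "stopped" from a finally block, and returns. run_elector_until_stopped and the stopped event are likewise hypothetical names; the only point being illustrated is that run() is called again every time it returns, until the watcher itself is stopped.

```python
import asyncio
from typing import Callable, List


class HypotheticalElector:
    """Hypothetical model of the RunAsync() behavior described in this issue."""

    def __init__(self, renewals_per_term: List[int]) -> None:
        # renewals_per_term[i]: renewals that succeed in term i before a
        # simulated network issue makes the next renewal fail
        self._terms = list(renewals_per_term)
        self.on_started: Callable[[], None] = lambda: None
        self.on_stopped: Callable[[], None] = lambda: None

    def has_more_terms(self) -> bool:
        return bool(self._terms)

    async def run(self) -> None:
        await asyncio.sleep(0)  # stand-in for acquiring the lease
        try:
            self.on_started()
            renewals = self._terms.pop(0) if self._terms else 0
            for _ in range(renewals):
                await asyncio.sleep(0)  # stand-in for one successful renewal
            # a renewal failed: fall through and give up leadership
        finally:
            self.on_stopped()  # "stopped" is only signalled here, then run() returns


async def run_elector_until_stopped(
    elector: HypotheticalElector, stopped: asyncio.Event, retry_delay: float = 0.0
) -> None:
    """Call run() again after every lost term until the watcher is stopped."""
    while not stopped.is_set():
        try:
            await elector.run()
        except Exception:
            # defensive: an error in one attempt should not end the loop
            # (asyncio.CancelledError is not an Exception, so cancellation still works)
            pass
        if not stopped.is_set():
            await asyncio.sleep(retry_delay)


async def demo() -> List[str]:
    log: List[str] = []
    stopped = asyncio.Event()
    # three terms of leadership, each ended by a simulated network issue
    elector = HypotheticalElector([3, 2, 5])

    def started() -> None:
        log.append("started")

    def stopped_leading() -> None:
        log.append("stopped")
        if not elector.has_more_terms():
            stopped.set()  # pretend the watcher is being shut down at this point

    elector.on_started = started
    elector.on_stopped = stopped_leading
    await run_elector_until_stopped(elector, stopped)
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the loop, leadership is regained after every simulated outage, so the log alternates "started"/"stopped" three times. Awaiting elector.run() once instead produces a single started/stopped pair and nothing afterwards, which matches the behavior reported here. In the C# watcher the equivalent would presumably be a loop around RunAsync() bounded by the watcher's own cancellation token, but that is an assumption, not the SDK's actual fix.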
