Enable additional options for Leader Election Callback
The current implementation of controller-runtime supports setting up an OnStartedLeading callback, but only in a somewhat indirect way. (I am not complaining about this at all, BTW.)
https://github.com/kubernetes-sigs/controller-runtime/blob/c7a98aa706379c4e5c79ea675c7f333192677971/pkg/manager/internal.go#L615-L621
This requires you to implement a Runnable:
https://github.com/kubernetes-sigs/controller-runtime/blob/c7a98aa706379c4e5c79ea675c7f333192677971/pkg/manager/manager.go#L293-L298
And then https://github.com/kubernetes-sigs/controller-runtime/blob/c7a98aa706379c4e5c79ea675c7f333192677971/pkg/manager/manager.go#L311-L315.
Then you can register the runnable under the runnable group defined here: https://github.com/kubernetes-sigs/controller-runtime/blob/b9219528d95974cb4f5b06f86c9b1c9b7d3045a5/pkg/manager/runnable_group.go#L53-L69.
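For reference, this is roughly what the end-to-end wiring looks like today (a minimal sketch; `cleanupRunner`, the lock ID, and the leader-only work are placeholders):

```go
package example

import (
	"context"

	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// cleanupRunner is a placeholder Runnable that should only run on the elected leader.
type cleanupRunner struct{}

// Start is called by the manager once this instance becomes the leader
// (effectively from OnStartedLeading) and blocks until the context is cancelled.
func (r *cleanupRunner) Start(ctx context.Context) error {
	// ... leader-only work here ...
	<-ctx.Done()
	return nil
}

// NeedLeaderElection marks the runnable as leader-election gated, so the
// manager places it in the leader election runnable group.
func (r *cleanupRunner) NeedLeaderElection() bool { return true }

func setup(cfg *rest.Config) (manager.Manager, error) {
	mgr, err := manager.New(cfg, manager.Options{
		LeaderElection:   true,
		LeaderElectionID: "example-lock", // placeholder lock name
	})
	if err != nil {
		return nil, err
	}
	// Register the runnable; it is only started after leadership is acquired.
	if err := mgr.Add(&cleanupRunner{}); err != nil {
		return nil, err
	}
	return mgr, nil
}
```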
This works perfectly. However, it lacks a few of the other options provided by the leader election config:
```go
type LeaderCallbacks struct {
	OnStartedLeading func(context.Context)
	OnStoppedLeading func()
	OnNewLeader      func(identity string)
}
```
The handler for OnStoppedLeading exists, but it is not extensible enough to run a custom set of operations:
https://github.com/kubernetes-sigs/controller-runtime/blob/196828e54e4210497438671b2b449522c004db5c/pkg/manager/internal.go#L622-L633
It would be very useful if controller-runtime also provided a way to hook into these events, similar to what it already provides for OnStartedLeading. For example, this would allow cleaning up state that was set up when leadership started, so that the next controller instance can take over and process things cleanly.
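To make the ask concrete, this is roughly the kind of surface I have in mind. None of these fields exist in manager.Options today; OnStoppedLeading and OnNewLeader below are purely hypothetical names for the proposed hooks:

```go
// Hypothetical sketch -- OnStoppedLeading / OnNewLeader are NOT real
// manager.Options fields today; this only illustrates the proposal.
mgr, err := manager.New(cfg, manager.Options{
	LeaderElection:   true,
	LeaderElectionID: "example-lock",

	// Proposed: run cleanup when this instance stops leading, so the next
	// leader can take over cleanly.
	OnStoppedLeading: func() {
		// release resources, flush in-flight state, ...
	},

	// Proposed: observe leadership changes, mirroring LeaderCallbacks.OnNewLeader.
	OnNewLeader: func(identity string) {
		// record or log the new leader's identity
	},
})
```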
If this is an acceptable thing to add, please let me know and I will be more than happy to open a PR with the changes.
xref: Extension of the ask in #1875
> The handler for OnStoppedLeading exists, but it is not extensible enough to run a custom set of operations.
@harshanarayana Yeah, currently cm.onStoppedLeading is only used for tests, but each custom runnable will see the ctx passed to its Start(context.Context) error get cancelled.
For example, if you have implemented a custom runnable with NeedLeaderElection and added it to the manager, you can do some cleanup this way:
```go
// Start is only called once this instance has started leading
// (i.e. effectively from OnStartedLeading).
func (r *myRunnable) Start(ctx context.Context) error {
	// do something ...

	// Block until the manager's context is cancelled, then clean up.
	<-ctx.Done()
	// do some cleanup
	return nil
}
```
But note that ctx.Done() will be triggered whenever the manager is exiting, which may be caused by OnStoppedLeading or by other reasons such as the process receiving a TERM/KILL signal.
@FillZpp This is exactly what I have been doing so far. However, it has not been easy to detect whether the trigger was the leader stopping or something like SIGTERM/SIGKILL.
```go
func (s *SecretMonitorRunner) Start(ctx context.Context) error {
	s.StartOnce.Do(func() {
		s.Started = true
		go SecretEventMonitor(s.StopChan)
	})

	// Block until the manager's context is cancelled.
	<-ctx.Done()
	// Check the lease to see if the leader is different or the lease has
	// expired; only then process the cleanup, otherwise ignore.
	return nil
}
```
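For reference, the lease check hinted at in that comment looks roughly like this. This is only a sketch: hasLostLeadership, the lease name/namespace, and the identity string are placeholders, and it needs a fresh (non-cancelled) context, since the manager's ctx is already done at that point:

```go
package example

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// hasLostLeadership reads the coordination Lease used for leader election and
// reports whether another identity now holds it or the lease has expired.
// Pass a fresh context here -- the manager's ctx is already cancelled by the
// time <-ctx.Done() returns in Start.
func hasLostLeadership(ctx context.Context, c client.Client, myID string) (bool, error) {
	lease := &coordinationv1.Lease{}
	// "example-lock"/"example-ns" are placeholders for the real lease name and namespace.
	key := types.NamespacedName{Name: "example-lock", Namespace: "example-ns"}
	if err := c.Get(ctx, key, lease); err != nil {
		return false, err
	}

	holder := ""
	if lease.Spec.HolderIdentity != nil {
		holder = *lease.Spec.HolderIdentity
	}

	// Expired if the last renewal plus the lease duration is already in the past.
	expired := false
	if lease.Spec.RenewTime != nil && lease.Spec.LeaseDurationSeconds != nil {
		deadline := lease.Spec.RenewTime.Add(time.Duration(*lease.Spec.LeaseDurationSeconds) * time.Second)
		expired = deadline.Before(time.Now())
	}

	return holder != myID || expired, nil
}
```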
This gets a bit painful to manage. Would it be acceptable to add a configurable callback that gets triggered directly on OnStoppedLeading?
@harshanarayana Aha, understood. I think it is acceptable to add an OnStoppedLeading to manager.Options, in which people could define what they want to do when leader election is lost.
WDYT @alvaroaleman @vincepri
How about OnNewLeader? I will open a draft PR and we can get it modified accordingly.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.