
Support running code in leader election before starting any controllers

Open alvaroaleman opened this issue 4 years ago • 18 comments

We have a couple of situations where we have functionality that needs to run only on an elected leader, but before any of the controllers is started. One example is part of our configuration, which used to be passed in as a file and is being migrated to a CRD. The migration code for this must run on an elected leader, but before any of our controllers gets started.

Is this a use case worth considering for c-r?
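
To make the gap concrete, here is a rough sketch of what we can already express with the manager today (written against the current API; `configMigrator` and `migrateConfig` are illustrative names, not part of our code):

```go
package app

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/manager/signals"
)

// migrateConfig is a placeholder for the file-to-CRD migration logic.
func migrateConfig(ctx context.Context, c client.Client) error {
	// ... read the legacy config file, write the CRD-based config ...
	return nil
}

// configMigrator is a Runnable that should run only on the elected leader.
type configMigrator struct {
	client client.Client
}

func (m *configMigrator) Start(ctx context.Context) error {
	return migrateConfig(ctx, m.client)
}

// NeedLeaderElection (manager.LeaderElectionRunnable) makes the manager
// run this only after winning the leader lock.
func (m *configMigrator) NeedLeaderElection() bool { return true }

func run(mgr manager.Manager) error {
	if err := mgr.Add(&configMigrator{client: mgr.GetClient()}); err != nil {
		return err
	}
	// Caveat: the manager starts this runnable alongside the controllers,
	// not strictly before them, which is exactly the gap this issue is about.
	return mgr.Start(signals.SetupSignalHandler())
}
```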

/cc @xrstf

alvaroaleman avatar Sep 17 '19 21:09 alvaroaleman

Maybe? It's probably not terrible to support. To be clear on the use case:

Does this need to run every time something gets the leader, or is it a "only one thing needs to run this, and it needs to be run before all the things start?"

Running under the normal leader election before controllers would work fine for both use cases, but I'm trying to figure out how terrible it is to just run separately before anything else starts.

DirectXMan12 avatar Sep 17 '19 23:09 DirectXMan12

Does this need to run every time something gets the leader, or is it a "only one thing needs to run this, and it needs to be run before all the things start?"

Not completely sure I properly understand the question. It should always run after acquiring the leader lock and before starting anything else.

Running under the normal leader election before controllers would work fine for both use cases, but I'm trying to figure out how terrible it is to just run separately before anything else starts.

I guess that would work. However, last time I checked there was no stepping down for elected leaders, which means the controller has to wait out the $leaderLockLease time after the migrations have run before it can start the controllers.

alvaroaleman avatar Sep 18 '19 07:09 alvaroaleman

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Dec 17 '19 07:12 fejta-bot

/assign danderson

gerred avatar Jan 10 '20 02:01 gerred

@gerred: GitHub didn't allow me to assign the following users: danderson.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign danderson

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 10 '20 02:01 k8s-ci-robot

/remove-lifecycle stale

gerred avatar Jan 10 '20 02:01 gerred

Adding a +1 to this, as discussed on Slack just now.

My use case is within MetalLB. Before MetalLB can start reconciling objects, it needs to build up some cluster-wide internal state (for example, which LoadBalancer IPs are currently assigned to various services).

The current implementation uses client-go and an internal wrapper library. In c-r terms: on startup, we generate reconcile callbacks for every object we're interested in, but the controller knows that the cache is not synced yet, so it just updates internal state and doesn't reconcile. When all the caches reach synced state, we fire a synthetic "synced" callback, which tells the rest of MetalLB to switch from "build internal state only" to "actively reconcile objects", and then we re-fire all the reconcile callbacks one more time to actually reconcile everything.

With c-r currently, the API guarantees that all caches are synced before the controller receives any reconcile callbacks. The API does not offer any place where I can do some preliminary work before entering the steady state of reconciling objects.

It's possible to implement what MetalLB needs by keeping an internal initialized flag in the controller. The first time a reconcile callback is invoked, the reconcile code can see that initialization isn't done, dump all objects of interest out of the cache, and initialize at that point. This is fine, but it's a little bit ugly and makes the code less clear: there's no separation in code between one-time initialization and steady state.
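
For illustration, a minimal sketch of that flag-based workaround, written against the current Reconciler signature (all names here are illustrative):

```go
package app

import (
	"context"
	"sync"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// rebuildInternalState is a placeholder for building the cluster-wide
// view (e.g. which LoadBalancer IPs are assigned to which services).
func rebuildInternalState(svcs *corev1.ServiceList) {}

type serviceReconciler struct {
	client client.Client

	mu          sync.Mutex
	initialized bool
}

func (r *serviceReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// One-time initialization, awkwardly interleaved with steady state:
	// the first callback to arrive dumps all objects of interest out of
	// the (already synced) cache before any real reconciling happens.
	r.mu.Lock()
	if !r.initialized {
		var svcs corev1.ServiceList
		if err := r.client.List(ctx, &svcs); err != nil {
			r.mu.Unlock()
			return reconcile.Result{}, err
		}
		rebuildInternalState(&svcs)
		r.initialized = true
	}
	r.mu.Unlock()

	// ... steady-state reconciliation of req ...
	return reconcile.Result{}, nil
}
```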

It would be nice to have an optional Init callback on controllers, which gets invoked after all caches are synced, but before the controller is started. This would provide a clear point in the code where one-time initialization should happen, and keep that logic out of the reconcile function.
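
Something like the following hypothetical interface (sketch only, not part of controller-runtime):

```go
package app

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Initializer is a hypothetical optional interface a Reconciler could
// implement. The controller would call Init exactly once, after all
// caches have synced and before the first Reconcile callback is delivered.
type Initializer interface {
	Init(ctx context.Context, reader client.Reader) error
}
```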

danderson avatar Jan 10 '20 02:01 danderson

I'm :+1: to the callback route instead of a flag.

Some questions I have:

  • Is Init() intended to be idempotent on a controller restart?
  • Is Init() called again after the cache re-sync period? Relates to idempotency, but is subtly different.
  • What is the failure mode for these callbacks at the CR level? How does CR let operators/controllers manage callback errors?

gerred avatar Jan 10 '20 03:01 gerred

  • Is Init() intended to be idempotent on a controller restart?

Assuming no change to the cluster state, I would say yes. However, I'm not sure there's value in requiring it? Maybe I don't understand what you mean by controller restart, do you mean something other than a whole-process restart?

  • Is Init() called again after the cache re-sync period? Relates to idempotency, but is subtly different.

I don't think so, Init is used to go from "I have no idea about the world" to "I have some mostly-updated idea about the world". From there, re-syncs (and any reconciles triggered by it) are just part of keeping up with changes to the world.

  • What is the failure mode for these callbacks at the CR level? How does CR let operators/controllers manage callback errors?

I'd be equally happy with either of two options: error != nil from Init is a fatal error (the whole controller errors out), or behavior similar to reconcile, where the controller can request retries. I think the latter would be preferable, in that it allows the controller to fail due to transient errors external to k8s (e.g. an S3 controller failing because S3 is having an outage, but CR keeps retrying Init and eventually S3 is happy again).

danderson avatar Jan 10 '20 04:01 danderson

Assuming no change to the cluster state, I would say yes. However, I'm not sure there's value in requiring it? Maybe I don't understand what you mean by controller restart, do you mean something other than a whole-process restart?

I mean a whole-process restart - e.g., is there enough state stored in CRDs that we would expect idempotency? I guess that's up to the operator developer at that point, and not the business of the callback mechanism itself.

gerred avatar Jan 10 '20 04:01 gerred

For my particular use case right now, I'd expect idempotency. However, I can imagine controllers where that wouldn't be the case, since this initialization may rely on external systems.

As a concrete example: a common request I get is to make MetalLB integrate with DHCP servers, to dynamically obtain/release IPs. If I were to support that, during Init I'd be poking the DHCP server for each IP listed as assigned in k8s, to check that the DHCP lease is still good. If we're talking "restart after kill -9", then the reconstructed state would probably end up being the same as the state we just lost... But it's dependent on the good behavior of a non-k8s system.

danderson avatar Jan 10 '20 04:01 danderson

@danderson So your issue is only about being notified about a synced cache before any reconciliation starts? If I understand the issue correctly, this also implies that this "Init code" has to finish executing before reconciliation is attempted?

If the above is correct, would it be possible to use the APIReader within the Add function of your controller?
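
For example (sketch only; `buildInitialState` is a placeholder), the uncached reader can be used while wiring up the controller, before the manager starts. Note, though, that this runs before leader election, not after it:

```go
package app

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// buildInitialState is a placeholder for the one-time setup work.
func buildInitialState(svcs *corev1.ServiceList) {}

func add(mgr manager.Manager) error {
	// GetAPIReader returns a client.Reader that talks directly to the
	// API server, so it can be used here, before the manager's cache
	// has been started or synced.
	var svcs corev1.ServiceList
	if err := mgr.GetAPIReader().List(context.Background(), &svcs); err != nil {
		return err
	}
	buildInitialState(&svcs)

	// ... then create the controller and its watches as usual ...
	return nil
}
```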

alvaroaleman avatar Jan 26 '20 14:01 alvaroaleman

/kind design /milestone Next /help

vincepri avatar Feb 21 '20 17:02 vincepri

@vincepri: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/kind design /milestone Next /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 21 '20 17:02 k8s-ci-robot

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar May 21 '20 17:05 fejta-bot

/lifecycle frozen

vincepri avatar May 21 '20 18:05 vincepri

I have a similar use case where I need to build up an in-memory allocation map after being nominated as leader, but before running the reconcile loop. Is there an interest in adding a hook to the internal controller to support this use case?

terinjokes avatar Sep 26 '22 22:09 terinjokes

I have a use case where reconcilers for different resources depend on some common state which must be computed after leader election, but before reconciliation begins. I'm currently handling it by following the suggestion in https://github.com/kubernetes-sigs/controller-runtime/pull/2044#issuecomment-1316095051, and using NewUnmanaged to wrap each Start method with a call to initialize the common state if it hasn't already happened. It's working well enough, but landing #2044 would reduce the complexity.
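
For reference, a rough sketch of that wrapping pattern (using the v0.14-era controller.NewUnmanaged signature; `initThenStart`, `myReconciler`, and `computeSharedState` are illustrative):

```go
package app

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// computeSharedState is a placeholder for the common state computation.
func computeSharedState(ctx context.Context) error { return nil }

var initOnce sync.Once

// initThenStart wraps an unmanaged controller so that the shared
// initialization runs at most once across all wrapped controllers,
// before any of them starts processing its queue.
type initThenStart struct {
	ctrl controller.Controller
	init func(ctx context.Context) error
}

func (s *initThenStart) Start(ctx context.Context) error {
	var err error
	initOnce.Do(func() { err = s.init(ctx) })
	if err != nil {
		// The manager shuts everything down on the returned error.
		return err
	}
	return s.ctrl.Start(ctx)
}

// Run on the elected leader only, like the controllers it wraps.
func (s *initThenStart) NeedLeaderElection() bool { return true }

type myReconciler struct{}

func (r *myReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	return reconcile.Result{}, nil
}

func setup(mgr manager.Manager) error {
	c, err := controller.NewUnmanaged("my-controller", mgr, controller.Options{
		Reconciler: &myReconciler{},
	})
	if err != nil {
		return err
	}
	// Watches would be registered on c here; then the wrapper, not the
	// bare controller, is handed to the manager.
	return mgr.Add(&initThenStart{ctrl: c, init: computeSharedState})
}
```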

cbroglie avatar Feb 22 '23 03:02 cbroglie