controller-runtime Add and remove watches at runtime

Opening a new issue to raise visibility of my question here.

We have a very similar use-case as that linked issue, and we'd like to know how one can actually remove watches at runtime.

May 03 '22 22:05 jcanseco

@jcanseco Could you please describe your use-case that needs to remove watches(informers) dynamically?

May 05 '22 07:05 FillZpp

@FillZpp sure.

Suppose you have a Resource A that depends on Resource B. We'd like to immediately reconcile Resource A once Resource B is Ready.

We were hoping to do that by having the Resource A Controller create a Watch on Resource B, so that we can enqueue Resource A for immediate reconciliation once Resource B is ready. Once done, we'd like to remove the Watch since it's no longer needed.

Please let us know if there is a better way to resolve this problem as well.

May 06 '22 00:05 jcanseco

Are A and B the resource kinds or the object names?

I suppose they are the resource kinds. So you firstly watch two kinds, but you only care about ONE object of the kind B. After watching the only object of kind B, your controller triggers to reconcile kind A and no longer needed to watch kind B.

Do I understand correctly?

May 06 '22 02:05 FillZpp

Ah yes apologies for not being precise enough. That is correct.

To be particularly precise: Resource A and B are two particular objects (i.e. uniquely identified by the combination of their GVK, Namespace, and Name).

Typically in our case, the GVKs of Resource A and B are different.

May 06 '22 04:05 jcanseco

Understand. Another question is, is there only one particular object of Resource B? Or there are multiple objects of Resource B, but you only need to watch the particular one of them?

How about this:

If there is only one object of Resource B, I think you don't have to stop and remove its informer.
If there are multiple objects, you could use SelectorByObject to make sure your cache only watch the particular object and then don't have to stop and remove it.

May 06 '22 05:05 FillZpp

By Resource B, are you referring to just the GVK? If so, yes, there can be multiple objects with GVK B, but we only need to watch one particular object (i.e. a particular GVK + Namespace + Name combination).

I am not sure I understand the suggestion -- can you elaborate?

Perhaps another important info to add is: We have a controller manager with multiple controllers. Each controller manages all the objects with particular GVK. Our controller manager manages both GVK A and GVK B. All the controllers, as I understand, share the same cache.

May 06 '22 05:05 jcanseco

We have a controller manager with multiple controllers. Each controller manages all the objects with particular GVK. Our controller manager manages both GVK A and GVK B. All the controllers, as I understand, share the same cache.

Oh, so you have multiple controllers in your operator and different controllers may care about different objects of Resource B. In this way, you are not going to stop or remove the whole informer of B, but just want to remove a watch for a controller?

However, there is no way to remove listeners in informer, which you can not remove the watch. The only thing you can do is to ignore all events in the watch handler after it has got the particular object.

May 06 '22 05:05 FillZpp

We have multiple controllers in our operator, and different controllers care about different Resources (GVKs). So there is one controller managing all objects of Resource A, one controller managing all objects of Resource B, etc.

We want Controller A to create a temporary Watch on Object B in order to know when to enqueue Object A for immediate reconciliation.

However, there is no way to remove listeners in informer, which you can not remove the watch. The only thing you can do is to ignore all events in the watch handler after it has got the particular object.

This is the same conclusion I've come to, but wouldn't the suggested solution be problematic since it would effectively result in leaked memory? Note that we'll have to create a lot of these temporary watches.

What do you think of this idea:

Have Controller A Watch Object B through a custom Source.
The custom Source works a lot like source.Kind, but it creates its own new SharedInformer, and it contains a stop channel that can be signalled from outside the Source.
Once Object B is Ready, the event handler signals the Source to stop. The Source then stops its SharedInformer.

Is this feasible? I am not sure if you can create a brand new SharedInformer completely separate from the ones used by the controllers, and I am not sure if it's a good idea to potentially create a lot of independent SharedInformers even if only temporarily.

May 06 '22 19:05 jcanseco

@jcanseco If so, you may just have to use the channel source with generic event channel to trigger Controller A from Controller B. There is no need to create a signal informer.

// a global channel
var triggerChan = make(chan event.GenericEvent, 1)

// Controller A, watches the channel source and enqueue Resource A in the handler.
// Note that the event will be generic event and should be handled by Generic() of handler.
...Watches(&source.Channel{Source: triggerChan}, &handler)

// Controller B, once Object B is ready, put it into channel with Once
var once sync.Once
once.Do(func() { triggerChan <- event.GenericEvent{Object: ...} })

May 07 '22 03:05 FillZpp

Ok, let me try to make sure I understand. Are you suggesting for us to have all controllers watch one channel (triggerChan), and then use that channel to allow Controller B to notify all the other controllers (e.g. Controller A) who are waiting for Object B to be ready?

If so, how should we make sure Controller A only handles events for Object B temporarily (i.e. when Object A is waiting for Object B) -- I assume we'll need to maintain a data structure for Controller A that lists the objects it's waiting for currently, and have the event handler filter on those -- would this be the approach you'd take, or is there a better way?

May 07 '22 03:05 jcanseco

Also, a different question:

I know you said we should consider looking into using source.Channel instead of creating a new SharedInformer for each temporary watch -- we will definitely consider this.

As a backup option though, I am still curious if it is possible and sensible to create new SharedInformers every time we want to create a temporary watch as described in my previous comment? We already kind of have a prototype that does this, so I am wondering if we can use this a potential fallback option.

May 07 '22 03:05 jcanseco

Ok, let me try to make sure I understand. Are you suggesting for us to have all controllers watch one channel (triggerChan), and then use that channel to allow Controller B to notify all the other controllers (e.g. Controller A) who are waiting for Object B to be ready?

Emm... So there is only Controller A or multiple controllers that all wait for Resource B to be Ready?

If there are multiple controllers, you can wait for the temporary channel to be closed in each controller, which means Resource B has been Ready.

// There is a global channel which will be closed when Resource B is Ready.
var tempChan = make(chan struct{})

// Controller B, close the tempChan only once when Resource B is Ready.
once.Do(func() { close(tempChan) })

// In each controller that wait for B Ready, use func channel and wait for tempChan to be closed
Watches(source.Func(func(ctx context.Context, _ handler.EventHandler, q workqueue.RateLimitingInterface, _ ...predicate.Predicate) error {
    go func() {
        select {
        case <-tempChan:  // wait for channel to be closed
        case <-ctx.Done():
            // cache has been canceled or closed
            return
        }
        q.Add(...)  // enqueue Resource A or any other resources for other controllers
    }()
    return nil
}), &handler.EnqueueRequestForObject{})

I don't really recommend you to create a new SharedInformer for each temporary watch, which will bring pressure on both apiserver and operator. But it depends on your scenario, I don't know much about your controllers and the relationship between the resources.

May 07 '22 08:05 FillZpp

But it depends on your scenario, I don't know much about your controllers and the relationship between the resources.

Apologies for the confusion. Perhaps it might help to make this more concrete. We are Config Connector -- an operator for Google Cloud resources. We have multiple resource kinds -- each one is managed by a controller. So for example, there is one controller for PubSubSubscription objects, another for PubSubTopic objects.

Objects can reference other objects, forming a DAG. For example, one PubSubSusbcription can reference multiple PubSubTopics, and multiple PubSubSubscriptions can reference the same PubSubTopic.

Take this example: the PubSubSubscription needs to wait for both PubSubTopics to be ready. But there can be another PubSubSubcription that need to wait for one or both of those PubSubTopics as well.

Going back to the suggested solution: I don't think this will work for us since our controllers (e.g. controller A) will need to be able to temporarily watch various objects of various resource kinds throughout the controller's lifecycle (e.g. various objects of resource kind B), so one global channel that is intended to be closed once won't work. We need a more generic solution.

Actually, thinking about it now, using channels to coordinate across controllers by itself won't work since we allow users to create multiple instances of our controller managers running in different pods, intended to manage different namespaces, so we need to be able to coordinate across process boundaries too.

I came up with a working prototype where:

Every controller watches a source.Channel.
When the controller encounters a non-Ready dependency, it creates a temporary Watch (by creating a new dynamic client instead of a new SharedInformer).
Then, once the dependency is ready, the goroutine monitoring the Watch adds a GenericEvent to the source.Channel to trigger reconciliation of the referencing resource, then stops the Watch.

This works. The concern is that we'll have to create temporary watches whenever a dependency is not ready -- I think this might be OK since our watches should be short-lived and are only really needed when users create new resources. We'll also end up saving on a lot of rapid, repeated Get requests which is what we do right now every time a dependency is not ready (due to how failed reconciliations are requeued with exponential backoff).

What do you think though? Are Watches that expensive to be concerned about? Ideally we'd want to rely on using the controller manager's SharedInformer to share watches (at least within the same process), but as you said, there is no way to remove event handlers right now.

May 07 '22 20:05 jcanseco

Actually I have similar need for metacontroller - it allows to dynamically add / remove kinds which users want to watch. Not it is using some internal implementation of SharedInformers, I would be happy to switch to something from common libraries (client-go or controller runtime) if this functionality will be added

May 21 '22 12:05 grzesuav

Actually I have similar need for metacontroller - it allows to dynamically add / remove kinds which users want to watch. Not it is using some internal implementation of SharedInformers, I would be happy to switch to something from common libraries (client-go or controller runtime) if this functionality will be added

@grzesuav If I understand correctly, what you want is to add/remove informers of some GVKs dynamically, but @jcanseco wants to keep the informers but only remove some registered event handlers from the informers.

May 23 '22 02:05 FillZpp

Actually (I typed from my phone) I want the same, in the metacontroller RemoveEventHandler is the only method added to interface - https://github.com/metacontroller/metacontroller/blob/master/pkg/dynamic/informer/informer.go#L51

May 23 '22 09:05 grzesuav

Actually (I typed from my phone) I want the same, in the metacontroller RemoveEventHandler is the only method added to interface - https://github.com/metacontroller/metacontroller/blob/master/pkg/dynamic/informer/informer.go#L51

Okey, I understand. That's a little different that RemoveEventHandlers() from metacontroller only remove your own handlers, while others handlers in the underlying shared informer won't be removed. But in controller-runtime, whose informer is only shared by controllers in the same process, one controller can not remove all handlers registered by multiple controllers. And how could a controller declare to remove its own handlers, that's a problem if we are going to support it in client-go and controller-runtime.

May 24 '22 03:05 FillZpp

I have a similar requirement where Controller-A watches for certain events and dynamically creates additional controllers like this ctrlruntime.NewControllerManagedBy(manager). Named("test"). For(&Unstructured{}). Complete(reconciler) How do I delete this controller so that I don't receive reconciles for the object anymore. I am looking for a function like ctrlruntime.RemoveControllerManagedBy(manager, controller)

Any help is appreciated

Jun 05 '22 16:06 rnsv

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Sep 03 '22 16:09 k8s-triage-robot

PR to allow individual event handlers to be removed from SharedIndexInformer merged here:

https://github.com/kubernetes/kubernetes/pull/111122

cc @FillZpp

Sep 07 '22 22:09 alexzielenski

@alexzielenski That's great, thanks! I'm sorry I was going to help out looking at that PR, but I'm too busy these months...

So we just have to wait for a new release tag of client-go, and then it can be used in c-r.

/remove-lifecycle stale

Sep 08 '22 02:09 FillZpp

@FillZpp v0.25.2 should contain remove method

Oct 12 '22 12:10 p53

@FillZpp v0.25.2 should contain remove method

Hey @FillZpp 👋 thanks for helping with this issue! Now the new client-go release is out have the change been made to c-r? My usecase is the same as @rnsv, where we dynamically run ctrlruntime.NewControllerManagedBy(manager). ..... Complete(reconciler) and would love to be able to then remove the reconciler from the manager when we are done with it. Thanks!

Nov 09 '22 12:11 aclevername

@FillZpp

So we just have to wait for a new release tag of client-go, and then it can be used in c-r.

Yeah. I also has a problem with it. And I am waiting for this feature too.

Nov 09 '22 14:11 jnan806

@FillZpp any update on this 😄 ? I'd be happy to give it a go at implementing this if you can point me in the right places to work on 😄

Dec 12 '22 16:12 aclevername

@aclevername I have posted #2099 , but it still needs some more tests, I will add them later.

Guys @jcanseco @grzesuav @rnsv @jnan806 , K8s 1.26 has released and we could finally achieve this. You can also help out to review the PR and see if it can satisfy your needs.

Dec 14 '22 03:12 FillZpp

Great !

Dec 14 '22 06:12 jnan806

@FillZpp thanks! From controller perspective it look ok. In case of Source, an example how it can be used would be useful, I am not as familiar with controller runtime sourcecode to asses it.

Dec 14 '22 18:12 grzesuav

@FillZpp thanks for this, I am looking forward the PR to get merged. If you think I might be able to help with it, please reach out.

Jan 04 '23 08:01 odacremolbap

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Apr 04 '23 08:04 k8s-triage-robot