cluster-api icon indicating copy to clipboard operation
cluster-api copied to clipboard

Extend the ability for CAPI to filter what CRs it reconciles

Open richardcase opened this issue 2 years ago • 24 comments

User Story

As an operator I would like to be able to run multiple instances of CAPI (and its providers) in the management cluster for operational reasons like multi-tenancy.

As an operator I would like to be able to have more control over which CRs instances of CAPI controllers watch & reconcile for operational reasons like multi-tenancy.

Detailed Description

As a result of #4119 (issue #4004) it's possible to tell instances of CAPI to only reconcile CRs that have a specific value for the cluster.x-k8s.io/watch-label label. This means when filtering that every instance of CAPI must correspond to a specific value for this filter.

It is not currently possible to say instance 1 of CAPI reconciles CRs with a label value val1 and instance 2 of CAPI reconciles CRs where the label value is not val1.

It would be good to have the ability to provide a wider range of "filters". For my scenario, i'm interested in having support for == and != but it could easily be || or &&.

Anything else you would like to add:

The current filtering based on the label (and the WatchFilterValue) is limited by the allowed characters for the label value (i.e. ! = && are not supported) and that its a single label.

One option to support != is to look for a prefix in the label value like: cluster.x-k8s.io/watch-label: not system. But this seems less than ideal for a number of reasons.

Another option is that we support label selectors (or some other mechanism) in the CAPI controllers.

/kind feature

richardcase avatar Dec 19 '22 10:12 richardcase

Sounds okay to me.

@richardcase just a question about a related topic out of curiosity. do you know how webhooks are used in those scenarios? (i.e. are all provider instances called or only one of them)

This question shouldn't block this issue as the issue is about extending an existing functionality

sbueringer avatar Dec 21 '22 09:12 sbueringer

Good question @sbueringer, I hadn't overly considered that.....but it's something of interest now you've mentioned it 🤣 Will have a think and get back to you.

richardcase avatar Dec 21 '22 16:12 richardcase

/triage accepted If the problem is multi-tenancy, in the sense of infrastructure tenant, our recommended way is to achieve this by using a single instance of the controller while leveraging different credentials.

Based on my experience, running many instances of the same controllers will lead to more problems than benefits, because in Kubernetes today we have a unique type space (CRD and webhooks) and this conflicts with managing the lifecycle of many instances. One day, when https://github.com/kcp-dev/kcp will be upstream this will be much simpler

However, we already committed to make this possible despite well know issues, and this seems an iteration of this. If we have to improve the semantics for filtering, let me throw a crazy idea on the table. What about deprecating the current watch-label and introducing something like watch-label-selector (a serialized label selector), so we give full power to users while leveraging core Kubernetes capabilities/concepts?

fabriziopandini avatar Dec 21 '22 18:12 fabriziopandini

What are the limitations that prevent this use case from being supported by leveraging --namespace to watch the each capi controller? Also can we elaborate further the use case that motivates this need?

enxebre avatar Jan 02 '23 11:01 enxebre

The --namespace approach has a similar limitation to the cluster.x-k8s.io/watch-label label in that it's a 1:1 mapping from the filter value to the CAPI instance. I'm after something a bit more flexible, as an example:

  • CAPI instance 1 reconciles cluster definitions with label value XYZ
  • CAPI instance 2 reconciles cluster definitions without label value XYZ

richardcase avatar Jan 03 '23 10:01 richardcase

What we need is not only filter on labels. In kurator, we also need to filter by the spec, likely the kubeconfigControlPlane's reference.

hzxuzhonghu avatar Jan 31 '23 02:01 hzxuzhonghu

So i propose passing a filter function to the Reconfiler, and every reconcile can has its own customization filter.

Today it is very friendly, because it requires all resources to have the same label.

hzxuzhonghu avatar Jan 31 '23 02:01 hzxuzhonghu

Regarding the discussion from here: https://github.com/kubernetes-sigs/cluster-api/pull/8016/files#r1090501708

Filters should be applied before actually entering in the reconcile loop (e.g. in while setting up controllers)

But then how can we filter out resources that are triggered by another kind of resource?

I think @hzxuzhonghu is right.

All predicates that we can pass into the controller builder or the controller when using For / Owns / Watches / Watch etc. only filter the incoming events. The "pipeline" for watches looks something like this:

  • Watch adds an eventhandler (usually pkg/source/internal.EventHandler)
    • internal.EventHandler wraps an handler.EventHandler (usually a enqueueRequestsFromMapFunc)
  • internal.EventHandler receives an event
  • internal.EventHandler iterates through predicates and drops the event if one predicate returns false
  • Otherwise it calls the Create/Update/.. func on the handler.EventHandler (which is usually a enqueueRequestsFromMapFunc)
  • enqueueRequestsFromMapFunc maps the incoming object to a reconcile.Request and adds it to the queue
  • Eventually the reconcile.Request is taken from the queue and Reconcile on our reconciler is called

tl;dr all the predicates we add are only applied to the events. They are not applied to the final objects we reconcile.

I'm not aware of a mechanism in CR today which allows us to filter based on the reconcile.Requests and especially not on the full objects (@vincepri please correct me if I'm wrong).

Some options:

  1. Add the Filter parameter to all our reconcilers and use them during Reconcile after we got the reconciled object (your PR: https://github.com/kubernetes-sigs/cluster-api/pull/8016)
  2. Implement a "wrapper" Reconciler around all our reconcilers which basically does the same (get the object and filter) (could work with generics)
  3. Implement something in controller-runtime (not sure how exactly)

cc @vincepri WDYT?

sbueringer avatar Jan 31 '23 13:01 sbueringer

@sbueringer @vincepri any conclusion

hzxuzhonghu avatar Feb 08 '23 03:02 hzxuzhonghu

I posted my take, can't answer for others.

sbueringer avatar Feb 08 '23 05:02 sbueringer

cc @fabriziopandini If you are ok with https://github.com/kubernetes-sigs/cluster-api/issues/7775#issuecomment-1410393884, can we let https://github.com/kubernetes-sigs/cluster-api/pull/8016 go

hzxuzhonghu avatar Feb 08 '23 07:02 hzxuzhonghu

@sbueringer - great write-up and suggested options 🥇

When i created this issue, i had in mind a way of filtering events on any attribute of client.Object, so limited to things like namespace, labels, annotations etc and not on anything in spec/status. And then support operators like == and!=. The idea being that i can say "only reconcile Clusters where the watch label value != val1"

Something like this could ultimately translate down to predicates (like the WatchFilterValue does) unless i'm mistaken.

richardcase avatar Feb 08 '23 13:02 richardcase

@fabriziopandini @richardcase watch-label-selector seems like a good approach for our use case, how do we structure it? We can possibly reuse https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L1203 and all the logic that comes with parsing it, but it might be too complicated for an annotation.

alexander-demicev avatar Jul 04 '23 09:07 alexander-demicev

Going back to the original use case:

I would like to be able to run multiple instances of CAPI (and its providers) in the management cluster for operational reasons like multi-tenancy.

Controller runtime has some mechanics today to filter events, although if the goal is to have hard tenancy requirements, data should probably be logically separated in different namespaces first, and then caches should pre-filter on those namespaces.

There is some missing flexibility in all of this, for example adding or removing namespaces being watched at runtime isn't something controller runtime supports; but there are other ways to get around it maybe with the dynamic reloading on RBAC (https://github.com/kubernetes-sigs/controller-runtime/issues/2176) if that gets implemented.

In general, I'd start by looking at the problem from controller-runtime lenses and list the Cluster API use case as a primary goal.

vincepri avatar Jul 06 '23 14:07 vincepri

Could #8982 be a way to solve it? It allows to add event filtering for all discussed use-cases, with a different level of complexity.

Danil-Grigorev avatar Jul 10 '23 11:07 Danil-Grigorev

Maybe? We should start from a small proposal in controller runtime first before implementing something very custom in Cluster API; the client-side predicates don't solve any of the hard tenancy requirements though. For example: cache would still hold objects that a specific manager shouldn't reconcile, they're just filtered before being passed along.

vincepri avatar Jul 10 '23 13:07 vincepri

Is it possible to somehow filter objects based on a logical expression? This way the solution to the problem can be more flexible. Caches won’t hold the object if the expression evaluates to true.

alexander-demicev avatar Jul 10 '23 20:07 alexander-demicev

The cache can be filtered by namespace, labels, or fields. Although these are very simple equality parameters; supporting dynamic behaviors more than that would require changes in client-go and potentially api-server as well. Ultimately we'd want the internal cache to follow RBAC rules; results returned from list/watch calls should be pre-filtered based on those parameters.

vincepri avatar Jul 10 '23 23:07 vincepri

I think that to provide full flexibility to solving this the expression selector should be added all the way down to apimachinery ListOptions, then implemented it in client-go and proxy this in controller-runtime leveraged with ByObject option. It is a lot of work. https://github.com/kubernetes/kubernetes/issues/53459 is talking about the need to achieve some more generic filtering mechanism, and they suggest to do this on local controller caches anyway, so it might be the way, though a complicated one.

RBAC approach sounds interesting but I'm not sure it provides such granularity the issue describes. You can't do logical expressions as well as support != on some label key, and not always you can modify something you want to ignore to apply to existing rule set.

Danil-Grigorev avatar Jul 11 '23 09:07 Danil-Grigorev

Are we trying to solve a hard tenancy problem, or a sharding one?

vincepri avatar Jul 11 '23 16:07 vincepri

Are we trying to solve a hard tenancy problem, or a sharding one?

For our use case, it's more a case of sharding. We'd like to have multiple instances of the controllers, each reconciling a subset of cluster definitions, where each instance isn't limited to a single watch filter or namespace.

richardcase avatar Jul 12 '23 07:07 richardcase

That's a bit easier to think through, although I'd start by opening an issue in Controller Runtime first. We'd need a small proposal: we can make an experimental features that allows controllers to shard data automatically based on a subset of parameters. We'd still be limited to what the api-server can offer if we care about caching only specific data, although that use case becomes an optimization more than anything for this goal.

vincepri avatar Jul 12 '23 15:07 vincepri

/priority backlog

fabriziopandini avatar Apr 12 '24 13:04 fabriziopandini

We should merge this discussion with #11192 #11193

sbueringer avatar Oct 02 '24 12:10 sbueringer

I think we can parse some string option (e.g. --watch-filter="metadata.namespace!=kube-public,metadata.namespace!=kube-system") value by: fields.ParseSelectorOrDie e.g.:

Cache: cache.Options{                                                                                                                                        
                        DefaultFieldSelector: fields.ParseSelectorOrDie(watchFilter)
...

There is:

  • == , !=
  • '&&' (it's ',' here)

But (IMHO) there is no option for:

  • '||'

marek-veber avatar Nov 11 '24 12:11 marek-veber