SyncPeriod does not work if controller is created WithEventFilter(predicate.GenerationChangedPredicate{})
Basically this just reopens this old issue which hasn't been addressed yet: https://github.com/kubernetes-sigs/controller-runtime/issues/1990
@patrickshan should we just reopen the other issue instead?
/kind bug support
This bug was filed against v0.12.3, did this get fixed with v0.15.0?
The behavior described is expected and not a bug.
Every SyncPeriod the underlying informers will emit events to trigger reconciliation of all objects. But if there are predicates configured that filter out those events, accordingly the Reconcile func won't be executed.
Potentially we could improve our documentation there, but the behavior itself is intended
@sbueringer is there a way to filter for the sync trigger? The fact that it's undocumented makes it surprising for many relying on SyncPeriod. I don't think it's unreasonable that some would want their controller to reconcile on a regular schedule in addition to responding immediately to certain CRD changes.
I don't think it's unreasonable that some would want their controller to reconcile on a regular schedule in addition to responding immediately to certain CRD changes.
It's not unreasonable, but if the controller is configured to filter out events if the generation is not changed it will do exactly that.
As far as I'm aware there is no way to filter based on the "sync trigger". Basically this information is not propagated to where the predicate is called. I think this information is not available outside of the informer which is a component from client-go that we use.
I think without bigger refactorings across client-go and controller-runtime it is simply impossible to expose what triggered a reconcile. It also doesn't really fit with the current design e.g. because we de-duplicate objects in the reconcile queue. I.e. if you have multiple update events ~ at the same time on the same object it won't get reconciled for every single event.
If I would want to implement this I would implement a regular controller for the regular case and then just simply start a go routine with a ticker to periodically list and reconcile all objects.
@sbueringer you mentioned the documentation is sparse on this topic; is there any I can go to for an idea on how specifically controller-runtime handles this now? It would help us in our implementation as well as give us a point to add to them based on information gaps.
To be honest, not sure if we have any documentation about it apart from godocs across the code base
I can look through the code then. Any pointers to which section (I'm obviously new to it lol)
A good starting point is probably here: https://github.com/kubernetes-sigs/controller-runtime/blob/e644e50f0d1f2475f544543b9517bc8f0356326f/pkg/builder/controller.go#L227
There you can see how a controller is wired together.
Another interesting point is what happens on manager start: https://github.com/kubernetes-sigs/controller-runtime/blob/e59161ee8f41b2f24953419533dca84d30262c21/pkg/manager/internal.go#L318
From there it goes pretty deep down the rabbit hole until client-go primitives like the shared index informer.
It might be easies to step through there with a debugger to see how everything works at runtime.
Sounds good, thanks!
Okay, I'm back with a working solution. For full context, my team makes extensive use of the status property to set and manage the state of CRs. For that reason we implemented predicates to filter events in which the status property was modified, and as mentioned above this also filters out events triggered by the SyncPeriod.
One property we set in the status is LastSyncTime, a timestamp indicating when the CR was last reconciled. With this info we were able to implement a custom predicate function that compares the LastSyncTime with the current time of a triggering event. If the delta between the two is larger than the SyncPeriod, then we return true to enqueue the object for reconciliation. Example below. I'd be interested to know if this is a recommended alternative if developers need to support both filtered events and a global sync ticker.
// resync returns true if the specified resource has a status indicating that
// it has not been reconciled for at least the global constant SyncPeriod.
func resync(object client.Object) bool {
objectType := reflect.TypeOf(object)
// our custom CR with extended status property
objectWithStatusInterface := reflect.TypeOf((*commonv1.ObjectWithStatus)(nil)).Elem()
if !objectType.Implements(objectWithStatusInterface) {
return false
}
objectWithStatus := object.(commonv1.ObjectWithStatus)
lastSyncTime := test.ExtractLastSyncTime(objectWithStatus)
syncPeriod := int64(SyncPeriod.Seconds())
now := time.Now().Unix()
return lastSyncTime >= 0 && now-lastSyncTime >= syncPeriod
}
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Mark this issue as fresh with
/remove-lifecycle stale - Close this issue with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@jjchambl sounds reasonable to me. An alternative could be to return requeueafter
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Mark this issue as fresh with
/remove-lifecycle rotten - Close this issue with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Reopen this issue with
/reopen - Mark this issue as fresh with
/remove-lifecycle rotten - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied- After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied- After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closedYou can:
- Reopen this issue with
/reopen- Mark this issue as fresh with
/remove-lifecycle rotten- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The behavior described is expected and not a bug.
Every SyncPeriod the underlying informers will emit events to trigger reconciliation of all objects. But if there are predicates configured that filter out those events, accordingly the Reconcile func won't be executed.
If nothing else, I'd say this is a documentation bug. I'd expect a "generic" event to be produced for each item in the cache when a periodic resync happens; I wouldn't expect an "update" event... and that's the only way this could happen, since GenerationChangedPredicate only overrides the Update func, and the default for the other funcs is to always return true (meaning the event should pass through the filter into the reconciler's work queue). Looks like the client-go code calls handler.OnUpdate() in the case of a Sync event, since the ResourceEventHandler interface they define only contains methods to handle add, update, and delete (not sync):
https://github.com/kubernetes/client-go/blob/4536e5a391f80fabd85f6425e508e34d9c1b853a/tools/cache/controller.go#L553-L558
Since this is such a common predicate for people to use, and there have been at least two GitHub issues opened for this surprising behavior, I think this warrants a small note in its docstring. Maybe something like, "Note that this predicate will filter out events produced as the result of a periodic resync."
Sounds good. Feel free to open a PR to improve the godoc
As far as I'm aware there is no way to filter based on the "sync trigger".
True - there's not an explicit attribute that holds this info. However, I sat down and looked at the "update" events that come in when these batches of resync-triggered events happen, and I compared those to events from actual update operations - I noticed that the resource version changes for every real update, but doesn't for the resync-triggered events (as you'd expect). So, to detect these cases while also making use of the generation change check, all I ended up doing was writing a custom predicate that is a logical OR of checking:
- if the generation changed between the old and new object
- if the resource version remained the same between the old and new object
type MyKindCustomPredicate struct {
// Returns true by default for all events; see overrides below.
predicate.Funcs
}
func (p MyKindCustomPredicate) Update(e event.UpdateEvent) bool {
if e.ObjectOld == nil || e.ObjectNew == nil {
return false
}
// Process the event if the generation changed (which happens e.g. if the spec
// was updated, but not if the status or metadata fields were updated).
if e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration() {
return true
}
// If the generation is unchanged, we generally don't want to process the event,
// with the exception of events triggered by periodic resyncs (which are, for some
// reason, categorized as updates). We identify such events by checking if the old
// and new resource versions are equal.
return e.ObjectOld.GetResourceVersion() == e.ObjectNew.GetResourceVersion()
}
Assuming there are no other edge cases where the resource version remains the same across an update event, I'm guessing this is how most folks would expect GenerationChangedPredicate to behave for updates, i.e. to not filter out the resync events.
@sbueringer what are your thoughts on potentially changing GenerationChangedPredicate to include this check?
Hm. If I would be using the GenerationChangedPredicate I would expect that I only get events where the generation actually changed to be honest.
Fair point, but I'd also expect it to not have the side effect of filtering out periodic sync events, which is why this issue and #1990 were filed. I personally think this surprising behavior, paired with the lack of documentation, affects enough people without them realizing it that it's worth fixing.
I think this becomes especially confusing when you notice that ResourceVersionChangedPredicate exists - folks assume that its sole purpose is to filter out the periodic sync events (e.g. this comment in #1740), which seems to imply that other predicates don't filter out those events... but they do.
If we don't want to introduce a behavioral change by default, it might be worth adding a field to the other exported predicates aside from ResourceVersionChangedPredicate (i.e. GenerationChangedPredicate, AnnotationChangedPredicate, and also LabelChangedPredicate even though it looks like someone forgot to include it in the group interface compliance checks at the top there). This field could allow users to specify that they want to not filter out the periodic sync events, e.g. the lines here would be changed to:
type TypedGenerationChangedPredicate[object metav1.Object] struct {
TypedFuncs[object]
+
+ AllowSyncEvents bool
}
// Update implements default UpdateEvent filter for validating generation change.
- func (TypedGenerationChangedPredicate[object]) Update(e event.TypedUpdateEvent[object]) bool {
+ func (p TypedGenerationChangedPredicate[object]) Update(e event.TypedUpdateEvent[object]) bool {
if isNil(e.ObjectOld) {
log.Error(nil, "Update event has no old object to update", "event", e)
return false
}
if isNil(e.ObjectNew) {
log.Error(nil, "Update event has no new object for update", "event", e)
return false
}
- return e.ObjectNew.GetGeneration() != e.ObjectOld.GetGeneration()
+ if e.ObjectNew.GetGeneration() != e.ObjectOld.GetGeneration() {
+ return true
+ }
+
+ if p.AllowSyncEvents && e.ObjectNew.GetResourceVersion() == e.ObjectOld.GetResourceVersion() {
+ return true
+ }
+ return false
}
This would be backward-compatible with the existing predicate call sites, as the default value for that field would be false. WDYT?
In general I would prefer "composition" of predicates over adding behaviors like this to (all?) existing predicates.
Those utils don't exist in controller-runtimet today. But in CAPI we have All ("and") and Any ("or") predicates: https://github.com/kubernetes-sigs/cluster-api/blob/3df14955d48120d2783f58c5f063c8aa4834ed02/util/predicates/generic_predicates.go
@alvaroaleman @vincepri opinions?
Those utils don't exist in controller-runtimet today. But in CAPI we have All ("and") and Any ("or") predicates:
We actually do have And and Or.
Happy to add documentation for this but I agree with @sbueringer that we shouldn't do this inside the GenerationChangedPredicate.
I initially implemented my solution above as GenerationChangedPredicate combined via predicate.Or() with a custom predicate that only did the ResourceVersion check. However, a colleague brought up something I thought was interesting regarding logically And/Oring several predicates together -- at the given level of granularity (i.e. the entire predicate struct), you can't specify which event type from each predicate you want to Or together into the resulting predicate. As an example: if you add a predicate to that Or'd set in the future, and it performs filtering on any other event type (Create, Delete, or Generic), you'd lose that predicate's filtering logic because the other predicates in the set always return true for those event types. So yes, for the given use cases, Oring them together is less work. But, I ended up refactoring my solution to my snippet above to make the code easier to reason about and potentially change the behavior of.
In general, I do agree with preferring a composition-based approach. My solution above could be rewritten to type-embed GenerationChangedPredicate, e.g. something like below (I didn't include generics templating, for brevity), but I opted to just copy the existing logic because... it's a single field equality check.
type MyKindPredicate struct {
GenerationChangedPredicate
}
func (p MyKindPredicate) Update(e event.UpdateEvent) bool {
// [...] do old/new nil checks here
if p.GenerationChangedPredicate.Update(e) {
return true
}
if e.ObjectNew.GetResourceVersion() == e.ObjectOld.GetResourceVersion() {
return true
}
return false
}
Regardless, either way, you currently still have to write your own custom predicate struct to accomplish this behavior that most folks would expect to be the default.
If we don't want to alter the existing predicate structs, maybe a different compromise would be for this package to offer an additional predicate named something like ResourceVersionUnchangedPredicate (as it's essentially the opposite of the existing ResourceVersionChangedPredicate), and document that it can be used as a way to not filter out periodic sync events, since an object's resource version is the same in both the old and new object for such "update" events. Then, people could get the desired behavior of not filtering out periodic sync events by combining this new predicate with GenerationChangedPredicate (or whatever they're wanting to use) using predicate.Or(), rather than having to write their own custom predicate structs:
GenerationChangedOrSyncEventPredicate := predicate.Or(
predicate.GenerationChangedPredicate,
predicate.ResourceVersionUnchangedPredicate,
)
Thoughts?
Gentle ping on the comment above.
I've been seeing more and more instances in my team's codebase (and another occurrence from the issue in the mention above this comment) where people weren't aware that this predicate would prevent processing events from periodic resyncs. If someone doesn't understand how the underlying resync events are generated (i.e. categorized as updates), which most people shouldn't have to know, then they unknowingly filter out those events. This behavior is almost never what you want.
Just saw another example in the wild today of this coming back to bite someone:
// ...
For(&mypackage.MyType{}, builder.WithPredicates(
predicate.GenerationChangedPredicate{},
predicate.Funcs{CreateFunc: func(event.CreateEvent) bool { return true }},
predicate.Funcs{UpdateFunc: func(event.UpdateEvent) bool { return true }},
)).
// ...
Some (understandably) mistaken person was trying to ensure that all updates get processed by the reconciler... they didn't realize that the GenerationChangedPredicate was returning false for events (unintuitively categorized as updates) triggered by periodic resyncs; so even if they added extra predicates to the list to try to return true for those resync events, the false result from GenerationChangedPredicate would be included in the AND operation of all the predicates' results and filter those events out.
The lesson I've learned from all this is that trying to combine entire predicates via AND/OR logic is prone to bugs, and devs are better off writing and using a single custom predicate object and defining its filtering logic for each operation type (e.g. the snippet in my comment above). But, if you're going to offer users predefined predicates, you should fix unexpected behavior/problems with those predicates that the users discover.
Please fix this. I've suggested two backward compatible ways to both fix this and make it more obvious what the default behavior of GenerationChangedPredicate is. My more recent suggestion, offering a new "GenerationChangedOrSyncEventPredicate" predicate, seems to avoid your concerns with my first suggestion. PTAL.