controller-runtime

Logging the reason for a reconcile

Open austince opened this issue 2 years ago • 5 comments

Hey there, is there any mechanism to log which event caused a reconcile request, or generally some other tooling to figure this out?

We have a controller that owns some resources directly, references others it does not own, and has a custom source for external triggers. With all these different sources, it is difficult to understand why a given reconcile request was triggered, especially when trying to debug over-active reconcilers.

austince avatar Sep 08 '22 19:09 austince

Not really. The problem is that the reconcile.Request gets put into a workqueue that does de-duplication, so custom information like that cannot be preserved (or at least not with the current workqueue implementation).

The best you can do is add logging to your eventsources and/or handlers.
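
To illustrate why the trigger is lost, here is a minimal sketch of a de-duplicating queue. It is a toy model and not the client-go workqueue: `Request`, `DedupQueue`, and the `cause` strings are hypothetical names made up for this example. The point is only that a request carries an object's identity and nothing else, so the one place the cause can still be logged is where the request is added.

```go
package main

import "fmt"

// Request is a simplified stand-in for reconcile.Request: it carries only
// the identity of the object to reconcile, not the reason it was queued.
type Request struct {
	Namespace, Name string
}

// DedupQueue is a toy model of a de-duplicating work queue. It is NOT the
// client-go workqueue; it only mirrors the behaviour described above.
type DedupQueue struct {
	pending map[Request]struct{} // requests waiting to be processed
	order   []Request            // FIFO order of the pending requests
}

// NewDedupQueue returns an empty queue.
func NewDedupQueue() *DedupQueue {
	return &DedupQueue{pending: make(map[Request]struct{})}
}

// Add enqueues req unless an equal request is already pending. The cause is
// logged here because it cannot travel with the request itself.
func (q *DedupQueue) Add(req Request, cause string) bool {
	if _, ok := q.pending[req]; ok {
		fmt.Printf("dropped duplicate %v (cause: %s)\n", req, cause)
		return false
	}
	q.pending[req] = struct{}{}
	q.order = append(q.order, req)
	fmt.Printf("enqueued %v (cause: %s)\n", req, cause)
	return true
}

// Len reports how many distinct requests are pending.
func (q *DedupQueue) Len() int { return len(q.order) }

func main() {
	q := NewDedupQueue()
	req := Request{Namespace: "default", Name: "app"}
	q.Add(req, "owned Deployment updated")
	q.Add(req, "external trigger") // same key, so it is merged away
	fmt.Println("pending:", q.Len())
}
```

Both events collapse into one pending request, so by the time the reconciler runs there is no way to tell which of the two causes (or both) led to it.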

alvaroaleman avatar Sep 08 '22 21:09 alvaroaleman

That makes sense, thanks for the reply. I think adding logging to the handlers seems like a good approach.

One complication I could see is that the builder doesn't expose ways to construct the handlers unless you use the lower-level Watches(..) method (i.e., not Owns(..) or For(..)).

For https://github.com/kubernetes-sigs/controller-runtime/blob/b93b5f92794b9383427995678d22ddac396dba13/pkg/builder/controller.go#L222-L227

Owns https://github.com/kubernetes-sigs/controller-runtime/blob/b93b5f92794b9383427995678d22ddac396dba13/pkg/builder/controller.go#L240-L243

What do you think about adding an option to modify the handler that is used?

I could see something like:

func (b *Builder) WithHandlerWrapper(func(handler.EventHandler) handler.EventHandler) *Builder {
	// ...
}

var _ ForOption = &HandlerWrapper{}
var _ OwnsOption = &HandlerWrapper{}

Then we could wrap the handler with a handler that just logs and delegates.
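
A rough sketch of the "log and delegate" idea, using only the standard library. `Event`, `EventHandler`, `HandlerFunc`, and `WithLogging` are hypothetical, simplified stand-ins: the real handler.EventHandler has one method per event kind and also receives the work queue, and its exact signature differs between controller-runtime versions. The same decorator shape should carry over to it.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for controller-runtime's
// event types (CreateEvent, UpdateEvent, ...).
type Event struct {
	Kind   string // e.g. "Create", "Update", "Delete"
	Object string // namespace/name of the object that changed
}

// EventHandler is a simplified stand-in for handler.EventHandler.
type EventHandler interface {
	Handle(e Event)
}

// HandlerFunc adapts a plain function to the EventHandler interface.
type HandlerFunc func(Event)

// Handle calls the underlying function.
func (f HandlerFunc) Handle(e Event) { f(e) }

// loggingHandler logs every event, then delegates to the wrapped handler.
type loggingHandler struct {
	source string
	log    func(msg string)
	next   EventHandler
}

// Handle records which source saw the event before passing it on unchanged.
func (h loggingHandler) Handle(e Event) {
	h.log(fmt.Sprintf("source=%s kind=%s object=%s", h.source, e.Kind, e.Object))
	h.next.Handle(e)
}

// WithLogging wraps next so that each event is logged with a source label.
func WithLogging(source string, log func(msg string), next EventHandler) EventHandler {
	return loggingHandler{source: source, log: log, next: next}
}

func main() {
	// The inner handler stands in for whatever would enqueue the request.
	enqueue := HandlerFunc(func(e Event) {
		fmt.Println("enqueue request for", e.Object)
	})
	h := WithLogging("owned-deployments", func(msg string) { fmt.Println(msg) }, enqueue)
	h.Handle(Event{Kind: "Update", Object: "default/app"})
}
```

Giving each watch its own source label means the log line identifies the trigger even though the enqueued request does not.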

What do you think?

austince avatar Sep 08 '22 22:09 austince

Hey @alvaroaleman, hope you don't mind the ping. Just wondering if you have thoughts on this proposal or would be able to review a PR in that direction.

austince avatar Oct 12 '22 15:10 austince

Just use Watches and wrap the handler there; there is no reason to change the builder.

alvaroaleman avatar Oct 13 '22 12:10 alvaroaleman

I think that's a decent workaround, but it forces us to reimplement the special setup for Owns and For. Do you have any concerns other than that it's more to maintain?

austince avatar Oct 13 '22 18:10 austince

Hey @austince, have you gotten a chance to look more into this? We are currently in the same boat and want to see why some of our reconciliation loops start in the first place, due to some odd inconsistencies.

If you have any insight I would be glad to take it :)

anderssonw avatar Dec 21 '22 09:12 anderssonw

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 21 '23 18:03 k8s-triage-robot

Bump @austince :)

anderssonw avatar Mar 22 '23 09:03 anderssonw

/remove-lifecycle stale

anderssonw avatar Mar 22 '23 09:03 anderssonw

This is the recommended approach: https://github.com/kubernetes-sigs/controller-runtime/issues/1997#issuecomment-1277540770

austince avatar Mar 22 '23 14:03 austince

Hmm, I'll take a look once I get time to prioritise it :)

anderssonw avatar Mar 23 '23 07:03 anderssonw

/lifecycle stale

k8s-triage-robot avatar Jun 21 '23 08:06 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Jul 21 '23 08:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Feb 18 '24 17:02 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot avatar Feb 18 '24 17:02 k8s-ci-robot