controller-runtime icon indicating copy to clipboard operation
controller-runtime copied to clipboard

🌱 Do not deduplicate warnings by default

Open dlipovetsky opened this issue 1 year ago • 12 comments

Controllers are long-running processes, and deduplication, as implemented, increases memory use.

With this change, duplicate warnings will appear in the log by default. However, this is safe, because Kubernetes rotates container logs by default.

If a specific controller sees many duplicate warnings, it can configure the handler to deduplicate them.

dlipovetsky avatar Sep 18 '24 16:09 dlipovetsky

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot avatar Sep 18 '24 16:09 k8s-ci-robot

Absolutely in favor of this change.

The current default is a memory leak and I think the behavior is also questionable

/lgtm (in general, pending un-draft + potentially unit test changes/fixes)

/assign @alvaroaleman @vincepri

sbueringer avatar Sep 19 '24 10:09 sbueringer

LGTM label has been added.

Git tree hash: 9bc8e9a4c90d0b860976c2fffeff13e19c7e9c43

k8s-ci-robot avatar Sep 19 '24 10:09 k8s-ci-robot

The current default is a memory leak

It takes some memory but how is it a leak? The number of unique warnings should be pretty finite or not?

alvaroaleman avatar Sep 19 '24 13:09 alvaroaleman

The current default is a memory leak

It takes some memory but how is it a leak? The number of unique warnings should be pretty finite or not?

Good point. Is the number of unique warnings finite? (i.e. is there a guarantee that they never contain variable parts like object names?)

(IIRC every webhook is able to return warnings, so we have more than just the warnings built into the kube-apiserver to consider)

But independent of that part, I think it's surprising behavior that we log every unique warning only once.

I mean I wonder in general who notices these warnings in controller logs, but only logging them once doesn't make it better :)

sbueringer avatar Sep 19 '24 15:09 sbueringer

I mean I wonder in general who notices these warnings in controller logs, but only logging them once doesn't make it better :)

Yeah I am a bit thorn on this. On the one hand if this was net new I would keep them on because as you said, its easy to miss them if they are only logged once. On the other you can make the argument that this is a silently breaking change and ppl might complain about now getting too many warnings that they don't care about logged.

alvaroaleman avatar Sep 19 '24 15:09 alvaroaleman

Yup. We can also consider defaulting to no warning handler. Then people can add it if they want

sbueringer avatar Sep 19 '24 16:09 sbueringer

This came out a discussion in a cluster-api PR: https://github.com/kubernetes-sigs/cluster-api/pull/11179#discussion_r1763420636

I think deduplication can make it harder to spot warnings. This is what I said there:

If the client makes two identical API calls hours apart, and both return the same warning, I would expect to find both in the log.

Even worse is that, with the current implementation, if the warning is logged, and that part of the log is overwritten, then evidence of the warning would disappear, because the warning would never be logged again, for as long as the process runs.

dlipovetsky avatar Sep 19 '24 18:09 dlipovetsky

Good point. Is the number of unique warnings finite? (i.e. is there a guarantee that they never contain variable parts like object names?)

~The "message" is the deduplication key here. And the message does include the object name, at least in the warnings I have seen.~ For example: https://github.com/kubernetes/kubernetes/blob/432e8937cfb1965f863b8f841af61a8cad7db965/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L89

(The above was incorrect)

dlipovetsky avatar Sep 19 '24 18:09 dlipovetsky

https://github.com/kubernetes/kubernetes/blob/432e8937cfb1965f863b8f841af61a8cad7db965/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L89

From what I can see there it only includes the fieldpath and the value but not object name or am I missing it?

alvaroaleman avatar Sep 19 '24 18:09 alvaroaleman

https://github.com/kubernetes/kubernetes/blob/432e8937cfb1965f863b8f841af61a8cad7db965/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L89

From what I can see there it only includes the fieldpath and the value but not object name or am I missing it?

My mistake! You're right, the object name is not part of the message.

dlipovetsky avatar Sep 19 '24 22:09 dlipovetsky

To recap, a warning looks like this:

[KubeAPIWarningLogger] metadata.finalizers: "cluster.cluster.x-k8s.io": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers

In this warning, the field-path is metadata.finalizers, and "cluster.cluster.x-k8s.io" is the group-kind (no version).

Currently, warnings are de-duplicated by default, so only onewarning per field-path/group-kind combination shows up in the log for the lifetime of the manager process.

Reasons to stop de-duplicating by default:

  • Easy to miss warnings if they are logged only once.

Reasons to continue de-duplication by default:

  • Fewer warnings in the log, i.e. a less "noisy" log.

If there's no clear winner, then we can keep the existing behavior, and I'll close the PR.

dlipovetsky avatar Oct 29 '24 21:10 dlipovetsky

I don't like the deduplication per default because one webhook in your cluster which returns non-deterministic warnings on the wrong resource is enough to ensure (probably) all CR-based comtrollers in the cluster go OOM.

As the benefit of deduplication is not that great, I don't think we should do it per default.

sbueringer avatar Oct 31 '24 06:10 sbueringer

Sounds good.

We can try to decrease the impact of the change. Some ideas:

  1. Write a user-friendly release note that explains how to enable de-duplication when creating the client.
  2. Announce the change in the notes of the next minor release, but only merge the change afterward, so that the default changes in two minor releases from now.

dlipovetsky avatar Oct 31 '24 17:10 dlipovetsky

I would be fine with only 1. I think it's not a problematic change to disable the deduplication. I would be really surprised if a significant number of CR users a) know that we log warnings and that we de-duplicate them b) depend on this behavior in any way.

But let's see what @vincepri and @alvaroaleman think.

sbueringer avatar Nov 01 '24 14:11 sbueringer

But let's see what @vincepri and @alvaroaleman think.

Yeah, I am ok with changing it, too. Lets maybe mark this change as breaking since it changes behavior so that it shows up more prominently on the release notes, but other than that, I am fine with this.

alvaroaleman avatar Nov 05 '24 16:11 alvaroaleman

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, dlipovetsky

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Nov 05 '24 16:11 k8s-ci-robot