cert-manager icon indicating copy to clipboard operation
cert-manager copied to clipboard

`revisionHistoryLimit` default of `nil` should be changed to ...

Open hawksight opened this issue 2 years ago • 9 comments

Is your feature request related to a problem? Please describe.

By default cert-manager will keep all CertificateRequest resources, which can add up on long lived and / or busy clusters. Whilst this can be changed per Certificate resource, it is often skipped or missed. So in multi tenanted environments, you often have to set defaults using other tooling or add policy with approver-policy to enforce a reasonable value. This requires more work from platform teams.

Another impact is that of metrics. If adding say CertificateRequest name as labels to metrics, the number of metrics could explode and continue to grow if the default nil is left in place. Relates to discussions around https://github.com/cert-manager/cert-manager/pull/6415.

There is also an impact on ETCD which has to store all this information indefinitely with the default value.

Describe the solution you'd like

I think we should have a new default going forwards to limit the number of resource we leave in cluster.

Potentially I would have this configurable at installation time to allow a platform operator to decide a sensible value. We could also change our default value to be sensible in ~90% of use cases.

Describe alternatives you've considered

  • Setting on every Certificate resource: Costly in maintenance of all the YAML.
  • Setting a new value with a Webhook type tool: May require additional tooling and or expertise to implement

Additional context

Environment details (remove if not applicable):

  • Kubernetes version: ALL
  • Cloud-provider/provisioner: ALL
  • cert-manager version: ALL
  • Install method: ALL

/kind feature

/cc @maelvls

hawksight avatar Oct 17 '23 10:10 hawksight

I would appreciate this as well. It would be great to set a custom default limit and still be able to override the default on Certificate level.

robasp avatar Oct 26 '23 06:10 robasp

This was discussed in open-source standup relating to this work on metrics, as the default nil implication means that default metrics could grow indefinitely.

Also for reference the slack discussion

hawksight avatar Nov 03 '23 13:11 hawksight

I had a quick search for issues, and this issue at least duplicates https://github.com/cert-manager/cert-manager/issues/3958 and https://github.com/cert-manager/cert-manager/issues/4071, but maybe more. I think it's evident that a sane default would be appreciated by cert-manager users. 👨‍🎤

erikgb avatar Nov 16 '23 19:11 erikgb

Yikes I did not search for existing issues, which is my bad :facepalm: The fact there are two other issues for the same thing does give us some positive reinforcement, that there are others who would benefit from making this change.

After talking to @maelvls - we decided that we want to make this change. In order to check we haven't missed anything I am going to post into the #cert-manager slack to see if anyone has an actual breaking use case if this were to become 1 instead of the current nil. We can leave that for some time, a week or two.. and direct people here to detail issues. I'll do that shortly.

hawksight avatar Nov 23 '23 13:11 hawksight

Original message in slack: here

I popped a reminder in today, but currently 3 thumbs up and no objections -> so unless there's a major blocker comes up I'd say we are good to implement this.

@maelvls In the new year (I have hols incoming) I might have an actual stab at implementing this change. But I might need some guidance :-).

hawksight avatar Dec 06 '23 14:12 hawksight

in case you still feel a risk breaking some peoples setup, set the default to a sane value but make the default configurable in a global manner so that someone can change it to 3 or nil or anything he wants in one central place while still being able to set it on each certificate/ingress to override that default one.

savar avatar Jan 09 '24 15:01 savar

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale

jetstack-bot avatar Apr 08 '24 15:04 jetstack-bot

@hawksight Is this implemented now - so we can close this issue?

/remove-lifecycle stale

erikgb avatar Apr 08 '24 15:04 erikgb

It is not yet implemented, at least not by me :)

hawksight avatar Apr 12 '24 13:04 hawksight

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale

cert-manager-bot avatar Jul 11 '24 14:07 cert-manager-bot

/remove-lifecycle stale

hawksight avatar Jul 15 '24 10:07 hawksight

/priority important-longterm

ThatsMrTalbot avatar Aug 29 '24 10:08 ThatsMrTalbot