`revisionHistoryLimit` default of `nil` should be changed to ...
Is your feature request related to a problem? Please describe.
By default cert-manager will keep all CertificateRequest resources, which can add up on long lived and / or busy clusters.
Whilst this can be changed per Certificate resource, it is often skipped or missed. So in multi tenanted environments, you often have to set defaults using other tooling or add policy with approver-policy to enforce a reasonable value. This requires more work
from platform teams.
Another impact is that of metrics. If adding say CertificateRequest name as labels to metrics, the number of metrics could explode and continue to grow if the default nil is left in place. Relates to discussions around https://github.com/cert-manager/cert-manager/pull/6415.
There is also an impact on ETCD which has to store all this information indefinitely with the default value.
Describe the solution you'd like
I think we should have a new default going forwards to limit the number of resource we leave in cluster.
Potentially I would have this configurable at installation time to allow a platform operator to decide a sensible value. We could also change our default value to be sensible in ~90% of use cases.
Describe alternatives you've considered
- Setting on every
Certificateresource: Costly in maintenance of all the YAML. - Setting a new value with a Webhook type tool: May require additional tooling and or expertise to implement
Additional context
Environment details (remove if not applicable):
- Kubernetes version: ALL
- Cloud-provider/provisioner: ALL
- cert-manager version: ALL
- Install method: ALL
/kind feature
/cc @maelvls
I would appreciate this as well. It would be great to set a custom default limit and still be able to override the default on Certificate level.
This was discussed in open-source standup relating to this work on metrics, as the default nil implication means that default metrics could grow indefinitely.
Also for reference the slack discussion
I had a quick search for issues, and this issue at least duplicates https://github.com/cert-manager/cert-manager/issues/3958 and https://github.com/cert-manager/cert-manager/issues/4071, but maybe more. I think it's evident that a sane default would be appreciated by cert-manager users. 👨🎤
Yikes I did not search for existing issues, which is my bad :facepalm: The fact there are two other issues for the same thing does give us some positive reinforcement, that there are others who would benefit from making this change.
After talking to @maelvls - we decided that we want to make this change. In order to check we haven't missed anything I am going to post into the #cert-manager slack to see if anyone has an actual breaking use case if this were to become 1 instead of the current nil. We can leave that for some time, a week or two.. and direct people here to detail issues. I'll do that shortly.
Original message in slack: here
I popped a reminder in today, but currently 3 thumbs up and no objections -> so unless there's a major blocker comes up I'd say we are good to implement this.
@maelvls In the new year (I have hols incoming) I might have an actual stab at implementing this change. But I might need some guidance :-).
in case you still feel a risk breaking some peoples setup, set the default to a sane value but make the default configurable in a global manner so that someone can change it to 3 or nil or anything he wants in one central place while still being able to set it on each certificate/ingress to override that default one.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
@hawksight Is this implemented now - so we can close this issue?
/remove-lifecycle stale
It is not yet implemented, at least not by me :)
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
/priority important-longterm