cert-manager
Mismatch between Certificate and Secret can DoS Kubernetes
Describe the bug:
If you change the issuer for a Certificate while the existing Secret is otherwise valid, cert-manager gets stuck in a loop of checking the existing Secret and creating a new CertificateRequest. Each CertificateRequest completes successfully, but the Secret is never updated.
The effect of this is severe:
- in one case, a cluster had reached 15K and cert-manager was being OOM-killed
- in another case, where cert-manager had more memory, 48K CSRs were created, bringing Kubernetes to its knees
When this condition occurs, the only way to clear it is to delete the Certificate and the Secret and, in the OOM-kill case, manually delete the CSRs.
Expected behaviour:
Changing the issuer should either be rejected (i.e. the field treated as immutable) or cause the Secret to be replaced.
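For illustration, the change that triggers this is a simple edit to spec.issuerRef on the Certificate. The name/namespace below match the logs above, but the dnsNames and issuer names are placeholders, not taken from the affected cluster:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hostcluster-certs
  namespace: system
spec:
  secretName: hostcluster-certs
  dnsNames:
    - example.internal   # placeholder
  issuerRef:
    # Changing this name from the issuer that originally signed the
    # Secret to a different one triggers the "IncorrectIssuer"
    # re-issuance loop described above.
    name: new-issuer     # placeholder; was e.g. old-issuer
    kind: Issuer
    group: cert-manager.io
```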
Anything else we need to know?:
I0202 20:04:28.615870 1 secret_manager.go:94] "cert-manager/certificates-issuing: applying Secret data" key="system/hostcluster-certs" resource_name="hostcluster-certs" resource_namespace="system" resource_kind="Certificate" resource_version="v1" secret="hostcluster-certs" message="missing base label controller.cert-manager.io/fao"
I0202 20:04:28.618415 1 trigger_controller.go:194] "cert-manager/certificates-trigger: Certificate must be re-issued" key="system/hostcluster-certs" reason="IncorrectIssuer" message="Issuing certificate as Secret was previously issued by Issuer.cert-manager.io/<redacted>"
Environment details:
- Kubernetes version: 1.26
- Cloud-provider/provisioner: GKE
- cert-manager version: 1.13
- Install method: helm
/kind bug
@darkmuggle would it be possible to provide a minimal reproducible example?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@cert-manager-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.