Mismatch between certificate and secret can DoS Kubernetes

Open darkmuggle opened this issue 1 year ago • 2 comments

Describe the bug:

If you change the issuer for a certificate and the secret is otherwise valid, cert-manager gets stuck in a loop of checking the existing secret and creating a new CertificateRequest. Each CertificateRequest completes successfully, but the secret is never updated.
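The loop described above can be illustrated with a toy model. This is only a sketch of the reported behaviour, not cert-manager's actual controller code: `ToyCluster`, `reconcile`, and `run` are hypothetical names, and the issuer strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy stand-in for the objects involved; not cert-manager's real types."""
    cert_issuer: str    # issuer named on the Certificate
    secret_issuer: str  # issuer recorded on the existing Secret
    requests: list = field(default_factory=list)  # CertificateRequests so far


def reconcile(cluster: ToyCluster, secret_is_updated: bool) -> bool:
    """Run one pass; return True if another pass will be needed."""
    if cluster.secret_issuer == cluster.cert_issuer:
        return False  # Secret matches the Certificate, nothing to do
    # Mismatch: a new request is created and completes successfully.
    cluster.requests.append(f"request-{len(cluster.requests) + 1}")
    if secret_is_updated:
        cluster.secret_issuer = cluster.cert_issuer
    return True


def run(secret_is_updated: bool, max_passes: int = 1000) -> int:
    """Reconcile until stable (or max_passes) and count the requests created."""
    cluster = ToyCluster(cert_issuer="new-issuer", secret_issuer="old-issuer")
    passes = 0
    while passes < max_passes and reconcile(cluster, secret_is_updated):
        passes += 1
    return len(cluster.requests)
```

In this model `run(True)` settles after a single request, while `run(False)` creates one request per pass until the artificial `max_passes` cap stops it. A real cluster has no such cap, which is consistent with the request counts reported in this issue.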

The effect of this is interesting:

  • in one case, where a cluster had 15K, cert-manager was being OOM-killed
  • in another case, where cert-manager had more memory, 48K CSRs were created, bringing Kubernetes to its knees.

When this condition happens, the only way to clear the problem is to delete the cert and the secret and, in the case of the OOM kill, to manually delete the CSRs.
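The cleanup above could be scripted roughly as follows. This is a hedged sketch that only prints `kubectl` commands rather than running them. It assumes the "CSRs" are cert-manager CertificateRequest resources, that they carry an `ownerReferences` entry pointing at the Certificate, and that the input has the shape of `kubectl get certificaterequests -n <namespace> -o json`. The function names and the sample request names are hypothetical; the namespace and resource names are taken from the log lines in this issue.

```python
def requests_owned_by(cr_list: dict, cert_name: str) -> list:
    """Names of CertificateRequests whose ownerReferences name the Certificate."""
    names = []
    for item in cr_list.get("items", []):
        meta = item.get("metadata", {})
        for owner in meta.get("ownerReferences", []):
            if owner.get("kind") == "Certificate" and owner.get("name") == cert_name:
                names.append(meta["name"])
                break
    return names


def cleanup_commands(cr_list: dict, cert_name: str, secret_name: str,
                     namespace: str) -> list:
    """Build (but do not run) the kubectl commands for the manual cleanup."""
    cmds = [
        # Delete the Certificate first so that no new requests are created.
        f"kubectl delete certificate {cert_name} -n {namespace}",
        f"kubectl delete secret {secret_name} -n {namespace}",
    ]
    for name in requests_owned_by(cr_list, cert_name):
        cmds.append(f"kubectl delete certificaterequest {name} -n {namespace}")
    return cmds


# Hypothetical sample input; real data would come from kubectl's JSON output.
sample = {"items": [
    {"metadata": {"name": "hostcluster-certs-1", "ownerReferences": [
        {"kind": "Certificate", "name": "hostcluster-certs"}]}},
    {"metadata": {"name": "other-1", "ownerReferences": [
        {"kind": "Certificate", "name": "other"}]}},
]}

for cmd in cleanup_commands(sample, "hostcluster-certs", "hostcluster-certs", "system"):
    print(cmd)
```

Printing the commands first makes it possible to review them before deleting anything, which matters when tens of thousands of requests are involved.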

Expected behaviour:

Either the issuer should be immutable, or changing it should cause the secret to be replaced.

Anything else we need to know?:

I0202 20:04:28.615870       1 secret_manager.go:94] "cert-manager/certificates-issuing: applying Secret data" key="system/hostcluster-certs" resource_name="hostcluster-certs" resource_namespace="system" resource_kind="Certificate" resource_version="v1" secret="hostcluster-certs" message="missing base label controller.cert-manager.io/fao"
I0202 20:04:28.618415       1 trigger_controller.go:194] "cert-manager/certificates-trigger: Certificate must be re-issued" key="system/hostcluster-certs" reason="IncorrectIssuer" message="Issuing certificate as Secret was previously issued by Issuer.cert-manager.io/<redacted>"
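The `IncorrectIssuer` reason in the second log line suggests a comparison between the issuer recorded on the Secret and the issuer referenced by the Certificate. A minimal sketch of such a check follows. It assumes the Secret carries the `cert-manager.io/issuer-name`, `cert-manager.io/issuer-kind`, and `cert-manager.io/issuer-group` annotations, and that a missing kind or group defaults to `Issuer` and `cert-manager.io`. The function `issuer_mismatch` is hypothetical and is not cert-manager's implementation.

```python
DEFAULT_KIND = "Issuer"            # assumed default when kind is unset
DEFAULT_GROUP = "cert-manager.io"  # assumed default when group is unset


def issuer_mismatch(secret_annotations: dict, issuer_ref: dict):
    """Return a message if the Secret was issued by a different issuer, else None."""
    recorded = (
        secret_annotations.get("cert-manager.io/issuer-kind") or DEFAULT_KIND,
        secret_annotations.get("cert-manager.io/issuer-group") or DEFAULT_GROUP,
        secret_annotations.get("cert-manager.io/issuer-name", ""),
    )
    wanted = (
        issuer_ref.get("kind") or DEFAULT_KIND,
        issuer_ref.get("group") or DEFAULT_GROUP,
        issuer_ref.get("name", ""),
    )
    if recorded == wanted:
        return None
    kind, group, name = recorded
    return f"Secret was previously issued by {kind}.{group}/{name}"
```

A check like this reports the mismatch on every pass for as long as the Secret keeps its old annotations, which matches the repeated re-issuance described in this report.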

Environment details:

  • Kubernetes version: 1.26
  • Cloud-provider/provisioner: GKE
  • cert-manager version: 1.13
  • Install method: helm

/kind bug

darkmuggle avatar Feb 02 '24 20:02 darkmuggle

@darkmuggle would it be possible to provide a minimal reproducible example?

inteon avatar Feb 03 '24 09:02 inteon

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale

cert-manager-bot avatar May 03 '24 10:05 cert-manager-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. /lifecycle rotten /remove-lifecycle stale

cert-manager-bot avatar Jun 02 '24 10:06 cert-manager-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. /close

cert-manager-bot avatar Jul 02 '24 11:07 cert-manager-bot

@cert-manager-bot: Closing this issue.

cert-manager-prow[bot] avatar Jul 02 '24 11:07 cert-manager-prow[bot]