Wildcard certificates not being resolved correctly.

Open jaxxstorm opened this issue 4 years ago • 16 comments

I am trying to generate a wildcard certificate using DigitalOcean. The zone lookup is splitting the entry incorrectly.

Describe the bug:

I have a certificate like so:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert   # metadata not shown in the original report; name assumed from secretName
spec:
  dnsNames:
  - '*.example.com'
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt
  secretName: wildcard-cert

Checking the logs, I can see that the call to the DigitalOcean domains API is trying to look up the com domain rather than example.com:

E0405 04:13:24.562351       1 controller.go:158] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"15075962-3845-4d2d-84a4-c463a5604d36\") Resource not found" "key"="platform/wildcard-cert-99rvk-1536210814-2979517128"
I0405 04:13:24.562844       1 wait.go:329] Returning cached zone record "com." for fqdn "_acme-challenge.example.com."

It looks like just a string-splitting issue.
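For context, the "cached zone record com." message suggests the zone lookup walked up the FQDN and settled on the TLD, which the DigitalOcean provider then used verbatim in the /v2/domains/{zone}/records call. Below is a rough Go sketch of that kind of SOA walk; it is illustrative only, not cert-manager's actual implementation, and the findZone helper and the nameserver address are assumptions:

package main

import (
	"fmt"
	"strings"

	"github.com/miekg/dns"
)

// findZone walks _acme-challenge.example.com. -> example.com. -> com.
// and returns the first name for which the resolver answers with an SOA
// record. If the resolver mishandles the intermediate names (for example
// a rewrite swallowing example.com.), the walk can fall through to the
// TLD, which then gets cached and fed to the provider API.
func findZone(fqdn, nameserver string) (string, error) {
	labels := dns.SplitDomainName(fqdn)
	client := new(dns.Client)
	for i := range labels {
		candidate := dns.Fqdn(strings.Join(labels[i:], "."))
		msg := new(dns.Msg)
		msg.SetQuestion(candidate, dns.TypeSOA)
		resp, _, err := client.Exchange(msg, nameserver)
		if err != nil {
			return "", err
		}
		for _, rr := range resp.Answer {
			if soa, ok := rr.(*dns.SOA); ok {
				return soa.Hdr.Name, nil
			}
		}
	}
	return "", fmt.Errorf("no zone found for %s", fqdn)
}

func main() {
	zone, err := findZone("_acme-challenge.example.com.", "8.8.8.8:53")
	fmt.Println(zone, err)
}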

Expected behaviour: A certificate is issued

Steps to reproduce the bug:

Anything else we need to know?:

Environment details:

  • Kubernetes version: 1.20.0
  • Cloud-provider/provisioner: DigitalOcean
  • cert-manager version: v1.2.0
  • Install method: Pulumi + Helm chart

/kind bug

jaxxstorm avatar Apr 05 '21 04:04 jaxxstorm

/priority important-soon

irbekrm avatar Apr 15 '21 12:04 irbekrm

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

jetstack-bot avatar Sep 16 '21 05:09 jetstack-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

jetstack-bot avatar Oct 16 '21 05:10 jetstack-bot

I am seeing the same issue. However, the end result is even more problematic than suggested above, because cert-manager keeps retrying the same API call every ~300 ms without stopping (even after receiving an HTTP 429 response), up to the point where the API server starts blocking the source IP and forwarding the requests to CloudFlare:

I1110 19:31:11.626023       1 setup.go:202] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-prod-private-key" "related_resource_namespace"="kube-system" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-prod" "resource_namespace"="" "resource_version"="v1"
E1110 19:31:12.092289       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"2e32a835-58f9-47e2-9d6f-5fadd5b09477\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.299203       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"dfd79895-a9bb-4ae9-84a8-33234effeff1\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.579537       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"6ea429a5-c373-4a45-94de-67ae4b44c8ed\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.774892       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"6fd05b1d-29c1-4139-846a-507b7d11458d\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.932960       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"5c0bbbf1-f9c4-4169-ac42-a10eff166886\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:13.060258       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"35311189-97e6-4b72-a309-71b9dc1ec1ed\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:13.138893       1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 429 (request \"1eb0a2ac-bcc4-417e-8ca3-219f31d86602\") Too many requests" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
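For illustration, a client-side retry with exponential backoff that respects 429 responses would avoid this kind of retry storm. The sketch below is not cert-manager's actual retry logic; the callAPI helper and the attempt limits are hypothetical:

package main

import (
	"errors"
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// callAPI stands in for the DigitalOcean records lookup; it returns the
// HTTP status of the attempt. Hypothetical helper for this sketch.
func callAPI() (int, error) {
	return http.StatusTooManyRequests, errors.New("too many requests")
}

// withBackoff retries callAPI, doubling the delay after every rate-limited
// or server-side failure instead of re-queuing ~300 ms apart as in the
// logs above.
func withBackoff(maxAttempts int) error {
	delay := time.Second
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err := callAPI()
		if err == nil {
			return nil
		}
		if status != http.StatusTooManyRequests && status < 500 {
			return err // non-retryable, e.g. a plain 404
		}
		jitter := time.Duration(rand.Int63n(int64(delay / 2)))
		fmt.Printf("attempt %d failed (%d), backing off for %s\n", attempt, status, delay+jitter)
		time.Sleep(delay + jitter)
		delay *= 2
	}
	return errors.New("giving up after max attempts")
}

func main() {
	_ = withBackoff(5)
}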

sdudley avatar Nov 10 '21 21:11 sdudley

/remove-lifecycle rotten

sdudley avatar Nov 10 '21 21:11 sdudley

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

jetstack-bot avatar Feb 08 '22 22:02 jetstack-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

jetstack-bot avatar Mar 10 '22 22:03 jetstack-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to jetstack. /close

jetstack-bot avatar Apr 09 '22 23:04 jetstack-bot

@jetstack-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to jetstack. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jetstack-bot avatar Apr 09 '22 23:04 jetstack-bot

Issue is still present /reopen

jfcoz avatar Oct 23 '25 10:10 jfcoz

@jfcoz: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

Issue is still present /reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

cert-manager-prow[bot] avatar Oct 23 '25 10:10 cert-manager-prow[bot]

/reopen

@jfcoz, this issue was reported on an old version of cert-manager. Are you able to provide some updated information?

erikgb avatar Oct 23 '25 10:10 erikgb

@erikgb: Reopened this issue.

In response to this:

/reopen

@jfcoz, this issue was reported on an old version of cert-manager. Are you able to provide some updated information?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

cert-manager-prow[bot] avatar Oct 23 '25 10:10 cert-manager-prow[bot]

@erikgb, yes, on 1.19.1.

Same problem: only the TLD appears in the DigitalOcean API query:

E1023 09:38:12.407582       1 controller.go:157] "re-queuing item due to error processing" err="GET https://api.digitalocean.com/v2/domains/rocks/records?type=TXT: 404 (request \"2fe5952c-1ed9-4ec7-830b-fc16fc654531\") Resource not found" logger="cert-manager.controller"

jfcoz avatar Oct 23 '25 12:10 jfcoz

I’ve added these options from https://github.com/cert-manager/cert-manager/issues/5917#issue-1655621950 and it now works correctly:

        - --dns01-recursive-nameservers-only
        - --dns01-recursive-nameservers=8.8.8.8:53

We suspect that:

  • because the DigitalOcean LoadBalancer ipMode is not implemented yet (https://github.com/digitalocean/digitalocean-cloud-controller-manager/issues/811),
  • we have used hairpin-proxy, which adds rewrites to the CoreDNS config pointing each ingress domain at a local service,
  • and this is causing an issue with DNS challenges:
    • https://github.com/compumike/hairpin-proxy/issues/10
    • https://github.com/cert-manager/cert-manager/discussions/3749

I will ask DigitalOcean again to implement the load balancer ipMode; this will allow us to remove all these workarounds: hairpin-proxy and the dns01 recursive options.

jfcoz avatar Oct 24 '25 07:10 jfcoz

📢 A new pre-release is available which is related to this issue. It fixes the unregulated retrying of DigitalOcean API errors.

  • https://github.com/cert-manager/cert-manager/releases/tag/v1.20.0-alpha.0

Please test and report back.

wallrj-cyberark avatar Nov 04 '25 16:11 wallrj-cyberark