Wildcard certificates not being resolved correctly.
I am trying to generate a wildcard certificate using DigitalOcean. The zone lookup is splitting the domain name incorrectly.
Describe the bug:
I have a certificate like so:
apiVersion: cert-manager.io/v1
kind: Certificate
spec:
  dnsNames:
  - '*.example.com'
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt
  secretName: wildcard-cert
Checking the logs, I can see that the call to the DigitalOcean domains API looks up the com. zone rather than example.com:
E0405 04:13:24.562351 1 controller.go:158] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"15075962-3845-4d2d-84a4-c463a5604d36\") Resource not found" "key"="platform/wildcard-cert-99rvk-1536210814-2979517128"
I0405 04:13:24.562844 1 wait.go:329] Returning cached zone record "com." for fqdn "_acme-challenge.example.com."
It looks like a simple string-splitting issue.
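To make the symptom concrete, here is a minimal sketch of a zone lookup that walks the FQDN from the most specific name to the least specific and returns the first name that reports an SOA record. This is my understanding of the general approach, not cert-manager's actual code: `candidate_zones`, `find_zone`, and the `healthy`/`broken` sets are hypothetical names, and the sets stand in for real DNS queries. If the resolver never reports an SOA for example.com., a lookup like this falls through to com., which matches the cached zone in the log above.

```python
from typing import Callable, Iterator, Optional


def candidate_zones(fqdn: str) -> Iterator[str]:
    """Yield every suffix of fqdn as a candidate zone, most specific first."""
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:]) + "."


def find_zone(fqdn: str, has_soa: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate for which has_soa reports an SOA record."""
    for zone in candidate_zones(fqdn):
        if has_soa(zone):
            return zone
    return None


# Hypothetical resolver views (assumptions, not real DNS queries).
healthy = {"example.com.", "com."}
broken = {"com."}  # this resolver never reports an SOA for example.com.

print(find_zone("_acme-challenge.example.com.", healthy.__contains__))
print(find_zone("_acme-challenge.example.com.", broken.__contains__))
```

With the `healthy` view the result is example.com.; with the `broken` view it is com., so the wrong zone can come from what the resolver answers rather than from how the string is split.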
Expected behaviour: A certificate is issued
Steps to reproduce the bug:
Anything else we need to know?:
Environment details:
- Kubernetes version: 1.20.0
- Cloud-provider/provisioner: DigitalOcean
- cert-manager version: v1.2.0
- Install method: Pulumi + Helm Chart
/kind bug
/priority important-soon
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale
I am seeing the same issue. However, the end result is even more problematic than described above: cert-manager keeps retrying the same API call every ~300 ms without stopping, even after receiving an HTTP 429 response, until the API server starts blocking the source IP and forwards the requests to Cloudflare:
I1110 19:31:11.626023 1 setup.go:202] cert-manager/controller/clusterissuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="letsencrypt-prod-private-key" "related_resource_namespace"="kube-system" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-prod" "resource_namespace"="" "resource_version"="v1"
E1110 19:31:12.092289 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"2e32a835-58f9-47e2-9d6f-5fadd5b09477\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.299203 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"dfd79895-a9bb-4ae9-84a8-33234effeff1\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.579537 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"6ea429a5-c373-4a45-94de-67ae4b44c8ed\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.774892 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"6fd05b1d-29c1-4139-846a-507b7d11458d\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:12.932960 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"5c0bbbf1-f9c4-4169-ac42-a10eff166886\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:13.060258 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 404 (request \"35311189-97e6-4b72-a309-71b9dc1ec1ed\") Resource not found" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
E1110 19:31:13.138893 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="GET https://api.digitalocean.com/v2/domains/com/records: 429 (request \"1eb0a2ac-bcc4-417e-8ca3-219f31d86602\") Too many requests" "key"="mynamespace/company-wildcard-tls-prod-wqwsz-434997518-4173896995"
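For comparison, here is a minimal sketch of the capped exponential backoff one would normally expect between retries of a failing call. It is plain Python and not cert-manager's code; the function name `backoff_delays` and the parameter values (1 s base, factor 2, 60 s cap) are assumptions chosen only for illustration.

```python
from typing import List


def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> List[float]:
    """Return the wait (in seconds) before each retry, growing by factor up to cap."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


# Illustrative values only (assumptions): 1 s base, doubling, capped at 60 s.
delays = backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=8)
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With a schedule like this, eight retries are spread over about three minutes instead of the roughly one second seen in the log above.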
/remove-lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close
@jetstack-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issue is still present /reopen
@jfcoz: You can't reopen an issue/PR unless you authored it or you are a collaborator.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/reopen
@jfcoz, this issue was reported on an old version of cert-manager. Are you able to provide some updated information?
@erikgb: Reopened this issue.
@erikgb, yes, on 1.19.1.
Same problem: only the TLD is used in the DigitalOcean API query:
E1023 09:38:12.407582 1 controller.go:157] "re-queuing item due to error processing" err="GET https://api.digitalocean.com/v2/domains/rocks/records?type=TXT: 404 (request \"2fe5952c-1ed9-4ec7-830b-fc16fc654531\") Resource not found" logger="cert-manager.controller"
I've added these options from https://github.com/cert-manager/cert-manager/issues/5917#issue-1655621950 and it now works correctly:
- --dns01-recursive-nameservers-only
- --dns01-recursive-nameservers=8.8.8.8:53
We suspect the following:
- As DigitalOcean LoadBalancer ipMode is not implemented yet ( https://github.com/digitalocean/digitalocean-cloud-controller-manager/issues/811 ),
- we have been using hairpin-proxy, which adds rewrites to the CoreDNS config that point each ingress domain at a local service.
- This causes an issue with DNS challenges:
- https://github.com/compumike/hairpin-proxy/issues/10
- and https://github.com/cert-manager/cert-manager/discussions/3749
I will ask DigitalOcean again to implement the load balancer ipMode. That would allow us to remove all these workarounds: hairpin-proxy and the dns01 recursive nameserver options.
📢 A new pre-release is available which is related to this issue. It fixes the unregulated DigitalOcean retries for errors.
- https://github.com/cert-manager/cert-manager/releases/tag/v1.20.0-alpha.0
Please test and report back.