
Unable to pass DNS challenge with Caddy 2.8+

Open ozapotichnyi opened this issue 1 year ago • 17 comments

Wildcard DNS challenge stopped working after update to Caddy 2.8.

The minimum reproducible setup:

Caddy config:

{
  storage consul {
    prefix "caddytls"
  }
  admin :2019

  debug

  email [email protected]
}

*.example.com {
  log {
    format json
  }

  tls {
    dns route53
  }
}

Dockerfile:

FROM --platform=linux/amd64 caddy:2-builder-alpine@sha256:cdf3364f8cb02338b857728fdc0a9b8875b343996db347300bf2361db3da9094 AS builder

RUN xcaddy build \
    --with github.com/pteich/caddy-tlsconsul \
    --with github.com/caddy-dns/route53

FROM --platform=linux/amd64 caddy:2-alpine@sha256:a48e22edad925dc216fd27aa4f04ec49ebdad9b64c9e5a3f1826d0595ef2993c

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

Logs:

{"level":"info","ts":1717682068.1877885,"logger":"tls.obtain","msg":"lock acquired","identifier":"*.example.com"}
{"level":"info","ts":1717682068.1907144,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"*.example.com"}
{"level":"debug","ts":1717682068.1908574,"logger":"events","msg":"event","name":"cert_obtaining","id":"60de8b42-ab04-4b13-9920-03713277aa4a","origin":"tls","data":{"identifier":"*.example.com"}}
{"level":"debug","ts":1717682068.1911874,"logger":"tls.obtain","msg":"trying issuer 1/1","issuer":"acme-v02.api.letsencrypt.org-directory"}
{"level":"debug","ts":1717682068.191264,"logger":"caddy.storage.consul","msg":"loading data from Consul for acme/acme-v02.api.letsencrypt.org-directory/users/[email protected]/caddy.json"}
{"level":"debug","ts":1717682068.1937697,"logger":"caddy.storage.consul","msg":"loading data from Consul for acme/acme-v02.api.letsencrypt.org-directory/users/[email protected]/caddy.key"}
{"level":"info","ts":1717682068.1980238,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["*.example.com"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"[email protected]"}
{"level":"info","ts":1717682068.198052,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["*.example.com"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"[email protected]"}
{"level":"info","ts":1717682068.1981454,"logger":"tls.issuance.acme","msg":"using ACME account","account_id":"https://acme-v02.api.letsencrypt.org/acme/acct/1763210887","account_contact":["mailto:[email protected]"]}
{"level":"debug","ts":1717682068.400449,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"GET","url":"https://acme-v02.api.letsencrypt.org/directory","headers":{"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["746"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"debug","ts":1717682068.400676,"logger":"tls.issuance.acme.acme_client","msg":"creating order","account":"https://acme-v02.api.letsencrypt.org/acme/acct/1763210887","identifiers":["*.example.com"]}
{"level":"debug","ts":1717682068.4561968,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"HEAD","url":"https://acme-v02.api.letsencrypt.org/acme/new-nonce","headers":{"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Cache-Control":["public, max-age=0, no-cache"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["su1caOmbBxQwQu9hLgYH8tMvuXSY0yd8jUjEqWyqAihX7TMZGos"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"debug","ts":1717682068.5403905,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"POST","url":"https://acme-v02.api.letsencrypt.org/acme/new-order","headers":{"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Boulder-Requester":["1763210887"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["345"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Location":["https://acme-v02.api.letsencrypt.org/acme/order/1763210887/275980118617"],"Replay-Nonce":["su1caOmb2AuTy7-eFJ7SHv1wOCyVgybSNdoJKeGjNcwOLeTGn7k"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":201}
{"level":"debug","ts":1717682068.600306,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"POST","url":"https://acme-v02.api.letsencrypt.org/acme/authz-v3/360439255817","headers":{"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Boulder-Requester":["1763210887"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["391"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["su1caOmbKx5cQpgNcP62Uc4bXmQr1rpUrDLGB9LmzmTeSj7AokU"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"info","ts":1717682068.6005263,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"*.example.com","challenge_type":"dns-01","ca":"https://acme-v02.api.letsencrypt.org/directory"}
{"level":"error","ts":1717682068.6307743,"logger":"tls.issuance.acme.acme_client","msg":"cleaning up solver","identifier":"*.example.com","challenge_type":"dns-01","error":"no memory of presenting a DNS record for \"_acme-challenge.example.com\" (usually OK if presenting also failed)"}
{"level":"debug","ts":1717682068.6949975,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"POST","url":"https://acme-v02.api.letsencrypt.org/acme/authz-v3/360439255817","headers":{"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Boulder-Requester":["1763210887"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["395"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["su1caOmbzRAm8TrBKvAcq-Lm4Xi-o3g5q22uZzpGo6jRk7hundE"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"error","ts":1717682068.696349,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"*.example.com","issuer":"acme-v02.api.letsencrypt.org-directory","error":"[*.example.com] solving challenges: presenting for challenge: adding temporary record for zone \"example.com.\": not found, ResolveEndpointV2 (order=https://acme-v02.api.letsencrypt.org/acme/order/1763210887/275980118617) (ca=https://acme-v02.api.letsencrypt.org/directory)"}
{"level":"debug","ts":1717682068.6964366,"logger":"events","msg":"event","name":"cert_failed","id":"8bf8efb3-0aa5-4e63-8478-33cf4bb9906a","origin":"tls","data":{"error":{},"identifier":"*.example.com","issuers":["acme-v02.api.letsencrypt.org-directory"],"renewal":false}}
{"level":"error","ts":1717682068.6964548,"logger":"tls.obtain","msg":"will retry","error":"[*.example.com] Obtain: [*.example.com] solving challenges: presenting for challenge: adding temporary record for zone \"example.com.\": not found, ResolveEndpointV2 (order=https://acme-v02.api.letsencrypt.org/acme/order/1763210887/275980118617) (ca=https://acme-v02.api.letsencrypt.org/directory)","attempt":1,"retrying_in":60,"elapsed":0.508639999,"max_duration":2592000}

Everything passes fine with Caddy 2.7.6.

Any suggestions are appreciated.

ozapotichnyi avatar Jun 06 '24 14:06 ozapotichnyi

Same issue here. I tried re-issuing my AWS keys, but AWS is reporting that they are "not used". I think for some reason it is not presenting the auth.

ryantiger658 avatar Jun 06 '24 16:06 ryantiger658

I am wondering if we just need to bump the Caddy version in go.mod, since there were so many breaking changes:

https://github.com/caddy-dns/route53/blob/8e49e7546771bf6846e1531dcaff4925af5ddcde/go.mod#L6

ryantiger658 avatar Jun 06 '24 17:06 ryantiger658

It looks like it is related to this issue: https://github.com/libdns/route53/issues/235#issue-2212746183

Which is related to this issue: https://github.com/aws/aws-sdk-go-v2/issues/2370#issuecomment-1953308268

ryantiger658 avatar Jun 06 '24 19:06 ryantiger658

Ran into the same issue with a single individual domain, not a wildcard. The fix mentioned here, which ryantiger658 linked, worked for me. Looks like the PRs in that repository need to get merged to fix this officially.

Edit: Just tested wildcard and that's working with this fix as well.

kdevan avatar Jun 14 '24 17:06 kdevan

Just ran into this as well after upgrading Caddy to v2.8.4.

eth-limo avatar Jun 18 '24 14:06 eth-limo

Could you test this with the latest version and wait_for_propagation enabled?

{
  "module": "acme",
  "challenges": {
    "dns": {
      "provider": {
        "name": "route53",
        "wait_for_propagation": true
      }
    }
  }
}

aymanbagabas avatar Jun 24 '24 18:06 aymanbagabas

FWIW, I'm using a Dockerfile to build https://github.com/lucaslorentz/caddy-docker-proxy with this plugin, and simply rebuilding the container with the latest release of this plugin and Caddy 2.8.4 was enough to solve the DNS challenge problem described in this thread, although I am not using a wildcard domain. I did not need to use the wait_for_propagation parameter.

checkerbomb avatar Jun 25 '24 17:06 checkerbomb

Could you test this with the latest version and wait_for_propagation enabled?

{
  "module": "acme",
  "challenges": {
    "dns": {
      "provider": {
        "name": "route53",
        "wait_for_propagation": true
      }
    }
  }
}

Yes, this works! Just tested with a new domain. Feels good removing all the hacks :)

This may be unrelated but just to note, I did get a new error from Route 53: Invalid Configuration: Missing Region

I just added us-east-1 as the region value and the error went away and everything works! Just thought I'd mention that this parameter may be required now.

kdevan avatar Jun 27 '24 19:06 kdevan

~~Ah sorry, I spoke too soon.~~ The normal domain worked but the wildcard domain did not.

{
  "level": "error",
  "ts": 1719515037.2461495,
  "logger": "tls.obtain",
  "msg": "will retry",
  "error": "[*.stage.foo.bar.com] Obtain: [*.stage.foo.bar.com] solving challenges: presenting for challenge: adding temporary record for zone \"foo.bar.com.\": exceeded max wait time for ResourceRecordSetsChanged waiter (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/152473533/17457386443) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)",
  "attempt": 4,
  "retrying_in": 300,
  "elapsed": 546.902648806,
  "max_duration": 2592000
}

Edit:

I manually deleted the TXT record from Route 53, restarted Caddy, and the wildcard domain works! Not sure what happened here the first time but might just have been something on my end.

I saw that these two are the first errors which led me to do the extra troubleshooting:

{
  "level": "error",
  "ts": 1719514555.4299963,
  "logger": "tls.issuance.acme.acme_client",
  "msg": "cleaning up solver",
  "identifier": "stage.foo.bar.com",
  "challenge_type": "dns-01",
  "error": "deleting temporary record for name \"foo.bar.com.\" in zone {\"\" \"TXT\" \"_acme-challenge.stage\" \"wEz6Z5Ta1vy5Z9ebcVcfyZTmptaYdfc-QtYRA_wV6Bs\" \"0s\" '\\x00' '\\x00'}: exceeded max wait time for ResourceRecordSetsChanged waiter"
}
{
  "level": "error",
  "ts": 1719514643.3972101,
  "logger": "tls.issuance.acme.acme_client",
  "msg": "cleaning up solver",
  "identifier": "*.stage.foo.bar.com",
  "challenge_type": "dns-01",
  "error": "deleting temporary record for name \"foo.bar.com.\" in zone {\"\" \"TXT\" \"_acme-challenge.stage\" \"JvKk2qrEWpbsgvZ06rU1GKc28NKvKAxP_gwc-j1IVGA\" \"0s\" '\\x00' '\\x00'}: operation error Route 53: ChangeResourceRecordSets, https response error StatusCode: 400, RequestID: d4277a4b-bef0-423b-bfef-8e68495ea501, InvalidInput: Invalid XML ; javax.xml.stream.XMLStreamException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 248; cvc-complex-type.2.4.b: The content of element 'ResourceRecords' is not complete. One of '{\"https://route53.amazonaws.com/doc/2013-04-01/\":ResourceRecord}' is expected."
}

kdevan avatar Jun 27 '24 19:06 kdevan

I just added us-east-1 as the region value and the error went away and everything works! Just thought I'd mention that this parameter may be required now.

fwiw, the plugin can take the value from the AWS_REGION environment variable.

aymanbagabas avatar Jun 27 '24 19:06 aymanbagabas

@kdevan The exceeded max wait time for ResourceRecordSetsChanged waiter error just means the default wait time, 1 minute, wasn't enough for the records to propagate. You could try and increase the time using max_wait_dur.

aymanbagabas avatar Jul 31 '24 17:07 aymanbagabas
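
For reference, the max_wait_dur suggestion above could look roughly like this in a Caddyfile. This is a minimal sketch, assuming the plugin accepts max_wait_dur inside the dns route53 block alongside wait_for_propagation. The domain and the value 120 are placeholders, and, as noted further down the thread, the value is only interpreted as seconds from v1.5.1 on.

```caddyfile
*.example.com {
  tls {
    dns route53 {
      # Wait for the Route 53 change to propagate before continuing
      wait_for_propagation true
      # Placeholder value; assumed to be seconds (plugin v1.5.1 or later)
      max_wait_dur 120
    }
  }
}
```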

@aymanbagabas Hi! Just to clarify, we should be setting wait_for_propagation to true when working with wildcard certificates right? Thanks :)

RigoOnRails avatar Aug 21 '24 05:08 RigoOnRails

We still get the "exceeded max wait time for ResourceRecordSetsChanged waiter" error, even though we have set wait_for_propagation to "true" and max_wait_dur to 120. Is anyone else still having this issue?

batesenergy avatar Sep 09 '24 15:09 batesenergy

The only way to get it working for me with a wildcard certificate was this:

*.mydomain.tld {
  tls {
    dns route53 {
      region "ca-central-1"
      wait_for_propagation true
    }
  }
}

Importantly, setting max_wait_dur to anything other than the default value was not working. And I did need to specify the region... for some reason.

nebez avatar Sep 09 '24 16:09 nebez

For anyone else having trouble: I finally made this work by removing "wait_for_propagation true" from the Caddyfile, and it worked right away.

tls {
  dns route53 {
    access_key_id "id"
    secret_access_key "password"
    region "us-east-1"
  }
}

batesenergy avatar Sep 09 '24 18:09 batesenergy

Importantly, setting max_wait_dur to anything other than the default value was not working.

There was a bug with max_wait_dur always using nanoseconds. With v1.5.1, the value for max_wait_dur is always in seconds.

And I did need to specify the region... for some reason.

If region is not specified, it will try to load the region from $AWS_REGION as described in https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/#specifying-the-aws-region

EDIT: I've updated the readme to indicate that defining AWS_REGION and AWS credentials is required

aymanbagabas avatar Sep 09 '24 20:09 aymanbagabas
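
A sketch of setting the region explicitly while still honoring the environment, as described above. It assumes Caddyfile environment substitution with a default ({$VAR:default}) and uses us-east-1 purely as a placeholder. Credentials are assumed to be supplied through the environment rather than in the file.

```caddyfile
*.example.com {
  tls {
    dns route53 {
      # Use AWS_REGION if it is set; otherwise fall back to the placeholder default
      region {$AWS_REGION:us-east-1}
    }
  }
}
```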

Amazing, this was unexpected! This new requirement of an AWS region brought down my whole set of reverse proxies, including my cloud, when the certs needed to be renewed. As soon as I saw that region error I came here.

One question is: what region? Does it even matter? Do I use the one I see in the AWS console? https://us-east-1.console.aws.amazon.com/ AFAIK Route 53 is not tied to a region, so why is a region needed at all? So I set mine to us-east-1. Sure am glad this was just my personal network, so being down overnight was not a big issue. Not sure how one could get info on a "breaking" change like this beforehand, but it sure would be nice.

Below is what's working for me now for wildcards. My IAM credentials are environment variables. As others mentioned, sometimes old _acme records don't get cleaned out, so I remove them via the AWS console. If I feel like I need a clean slate (recreate all the certs), I delete all the Caddy settings/certs and restart. At least on Arch they can be found at /var/lib/caddy

  tls <redcat>@gmail.com {
    dns route53 {
      max_retries 10
      region "us-east-1"
      wait_for_propagation true
    }
    resolvers 8.8.8.8 1.1.1.1
  }

dkebler avatar Sep 28 '24 18:09 dkebler

We have released a beta version fully compatible with Caddy 2.10 and the new libdns. It includes improved defaults. Give it a try and feel free to file a new issue. We've also added a note about AWS regions to the README; the region is optional now.

P.S. In some complex cases, multiple retries may be needed to obtain a certificate. Allow approximately 5-7 minutes.

AndrianBdn avatar Sep 30 '25 10:09 AndrianBdn