os-acme-client | Cloudflare - domain validation failed (dns01)

Open keithpl opened this issue 3 months ago • 12 comments

Important notices Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guide lines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
  • [x] I have searched the existing issues, open and closed, and I'm convinced that mine is new.
  • [x] The title contains the plugin to which this issue belongs

Describe the bug After upgrading to OPNsense 24.1.5_3, the ACME client is no longer able to create TXT records using the Cloudflare DNS-01 challenge type.

To Reproduce Steps to reproduce the behavior:

  1. Go to Services
  2. Click on ACME Client > Certificates
  3. Switch to Certificates
  4. Last ACME Status > validation failed

Expected behavior validation ok

Relevant log files ACME Log: interestingly, the acme log is empty and all outputs were recorded to the system log.

System Log:

2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="6"] AcmeClient: certificate must be issued/renewed: <redacted_domain>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="7"] AcmeClient: issue certificate: <redacted_domain>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="8"] AcmeClient: using CA: letsencrypt
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="9"] AcmeClient: account is registered: <redacted_letsencrypt_account>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="10"] AcmeClient: using challenge type: <redacted_dns_challenge_name>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="11"] AcmeClient: running acme.sh command: /usr/local/sbin/acme.sh --issue --syslog 7 --debug --server 'letsencrypt' --dns 'dns_cf' --home '/var/etc/acme-client/home' --cert-home '/var/etc/acme-client/cert-home/<redacted_cert_path>' --certpath '/var/etc/acme-client/certs/<redacted_cert_path>/cert.pem' --keypath '/var/etc/acme-client/keys/<redacted_cert_path>/private.key' --capath '/var/etc/acme-client/certs/<redacted_cert_path>/chain.pem' --fullchainpath '/var/etc/acme-client/certs/<redacted_cert_path>/fullchain.pem' --domain '<redacted_domain>' --domain '<redacted_domain>' --days '1'   --keylength '4096' --accountconf '/var/etc/acme-client/accounts/<redacted_account>_prod/account.conf'
2024-04-07T15:28:52-04:00 acme.sh 35127 - [meta sequenceId="12"] [Sun Apr  7 15:28:52 EDT 2024] Add txt record error.
2024-04-07T15:28:52-04:00 acme.sh 37611 - [meta sequenceId="13"] [Sun Apr  7 15:28:52 EDT 2024] Error add txt for domain:_acme-challenge.<redacted_domain>
2024-04-07T15:28:52-04:00 acme.sh 41770 - [meta sequenceId="14"] [Sun Apr  7 15:28:52 EDT 2024] Please add '--debug' or '--log' to check more details.
2024-04-07T15:28:52-04:00 acme.sh 44795 - [meta sequenceId="15"] [Sun Apr  7 15:28:52 EDT 2024] See: https://github.com/acmesh-official/acme.sh/wiki/How-to-debug-acme.sh
2024-04-07T15:28:54-04:00 opnsense 81230 - [meta sequenceId="16"] /usr/local/opnsense/scripts/OPNsense/AcmeClient/lecert.php: AcmeClient: The shell command returned exit code '1': '/usr/local/sbin/acme.sh --issue --syslog 7 --debug --server 'letsencrypt' --dns 'dns_cf' --home '/var/etc/acme-client/home' --cert-home '/var/etc/acme-client/cert-home/<redacted_cert_path>' --certpath '/var/etc/acme-client/certs/<redacted_cert_path>/cert.pem' --keypath '/var/etc/acme-client/keys/<redacted_cert_path>/private.key' --capath '/var/etc/acme-client/certs/<redacted_cert_path>/chain.pem' --fullchainpath '/var/etc/acme-client/certs/<redacted_cert_path>/fullchain.pem' --domain '<redacted_domain>' --domain '<redacted_domain>' --days '1'   --keylength '4096' --accountconf '/var/etc/acme-client/accounts/<redacted_account>_prod/account.conf''
2024-04-07T15:28:54-04:00 opnsense 81230 - [meta sequenceId="17"] AcmeClient: domain validation failed (dns01)
2024-04-07T15:28:54-04:00 opnsense 81230 - [meta sequenceId="18"] AcmeClient: validation for certificate failed: <redacted_domain>

Additional context #3871 details the same problem. Manually creating the TXT record is not a solution, as it defeats the point of ACME. The issue seems to be that the ACME client fails to create the TXT record for validation.

If I run the command directly, I get additional output stating the dns_cf hook cannot be found:

[Sun Apr  7 15:40:51 EDT 2024] Can not find dns api hook for dns_cf

Environment OPNsense 24.1.5_3-amd64 FreeBSD 13.2-RELEASE-p11

keithpl avatar Apr 07 '24 19:04 keithpl

Same issue here ... It just stopped working.

jwaes avatar Apr 11 '24 07:04 jwaes

same issue with OVH

Makss39 avatar Apr 11 '24 17:04 Makss39

OK, I figured out how to fix it, so I want to share it with you.

This worked in January, but now that the renewal is due, something has changed.

The key error in the logs was:

2024-04-13T07:31:11 acme.sh [Sat Apr 13 07:31:11 UTC 2024] Invalid status, router.MYDOMAIN.XXX:Verify error detail:DNS problem: SERVFAIL looking up CAA for router.MYDOMAIN.XXX - the domain's nameservers may be malfunctioning

So I read up on CAA:

https://developers.cloudflare.com/ssl/edge-certificates/caa-records/

and adding this

CAA router 0 issue letsencrypt.org

in Cloudflare solved the issue on the next forced re-issue of my certificate.

As @Makss39 also had the issue with OVH, I guess the change is on the Let's Encrypt side, where they must now be enforcing CAA.

Anyway. Hope it helps for others.

jwaes avatar Apr 13 '24 07:04 jwaes

@jwaes that is not necessary when I use the acme.sh script on a separate machine:

❯ export CF_Zone_ID="<zone_id>"
❯ export CF_Token="<token>"
❯ acme.sh --issue -d <my_domain> --dns dns_cf --server letsencrypt
[Sat Apr 13 10:12:10 AM EDT 2024] Using CA: https://acme-v02.api.letsencrypt.org/directory
[Sat Apr 13 10:12:10 AM EDT 2024] Single domain='<my_domain>'
[Sat Apr 13 10:12:10 AM EDT 2024] Getting domain auth token for each domain
[Sat Apr 13 10:12:11 AM EDT 2024] Getting webroot for domain='<my_domain>'
[Sat Apr 13 10:12:12 AM EDT 2024] Adding txt value: <record> for domain:  _acme-challenge.<my_domain>
[Sat Apr 13 10:12:13 AM EDT 2024] Adding record
[Sat Apr 13 10:12:13 AM EDT 2024] Added, OK
[Sat Apr 13 10:12:13 AM EDT 2024] The txt record is added: Success.
[Sat Apr 13 10:12:13 AM EDT 2024] Let's check each DNS record now. Sleep 20 seconds first.
[Sat Apr 13 10:12:34 AM EDT 2024] You can use '--dnssleep' to disable public dns checks.
[Sat Apr 13 10:12:34 AM EDT 2024] See: https://github.com/acmesh-official/acme.sh/wiki/dnscheck
[Sat Apr 13 10:12:34 AM EDT 2024] Checking <my_domain> for _acme-challenge.<my_domain>
[Sat Apr 13 10:12:35 AM EDT 2024] Domain <my_domain> '_acme-challenge.<my_domain>' success.
[Sat Apr 13 10:12:35 AM EDT 2024] All success, let's return
[Sat Apr 13 10:12:35 AM EDT 2024] Verifying: <my_domain>
[Sat Apr 13 10:12:35 AM EDT 2024] Pending, The CA is processing your order, please just wait. (1/30)
[Sat Apr 13 10:12:39 AM EDT 2024] Success
[Sat Apr 13 10:12:39 AM EDT 2024] Removing DNS records.
[Sat Apr 13 10:12:39 AM EDT 2024] Removing txt: <record> for domain: _acme-challenge.<my_domain>
[Sat Apr 13 10:12:40 AM EDT 2024] Removed: Success
[Sat Apr 13 10:12:40 AM EDT 2024] Verify finished, start to sign.
[Sat Apr 13 10:12:40 AM EDT 2024] Lets finalize the order.
[Sat Apr 13 10:12:40 AM EDT 2024] Le_OrderFinalize='https://acme-v02.api.letsencrypt.org/acme/finalize/<path>'
[Sat Apr 13 10:12:41 AM EDT 2024] Downloading cert.
[Sat Apr 13 10:12:41 AM EDT 2024] Le_LinkCert='https://acme-v02.api.letsencrypt.org/acme/cert/<path>'
[Sat Apr 13 10:12:42 AM EDT 2024] Cert success.
-----BEGIN CERTIFICATE-----
<blah>
-----END CERTIFICATE-----
[Sat Apr 13 10:12:42 AM EDT 2024] Your cert is in: $HOME/.acme.sh/<my_domain>_ecc/<my_domain>.cer
[Sat Apr 13 10:12:42 AM EDT 2024] Your cert key is in: $HOME/.acme.sh/<my_domain>_ecc/<my_domain>.key
[Sat Apr 13 10:12:42 AM EDT 2024] The intermediate CA cert is in: $HOME/.acme.sh/<my_domain>_ecc/ca.cer
[Sat Apr 13 10:12:42 AM EDT 2024] And the full chain certs is there: $HOME/.acme.sh/<my_domain>_ecc/fullchain.cer

keithpl avatar Apr 13 '24 14:04 keithpl

If it's helpful, this is the version of acme.sh that I tested with:

❯ acme.sh --version
https://github.com/acmesh-official/acme.sh
v3.0.7

keithpl avatar Apr 13 '24 14:04 keithpl

Same experience with opnsense 24.1.6 and os-acme-client 4.2.

keithpl avatar Apr 20 '24 13:04 keithpl

Same issue trying to use Cloudflare DNS-01. I get the same error: Can not find dns api hook for dns_cf

OPNsense 24.1.6-amd64 ACME 4.2

EDIT: I tried some debugging. These are the variables acme.sh uses when running the _findHook function to search for the dns_cf.sh file, along with the values they were set to when I ran /usr/local/sbin/acme.sh:

$_hookdomain = opnsense.********.com
$_hookcat = dnsapi
$_hookname = dns_cf
$_SCRIPT_HOME = /usr/local/sbin
$LE_WORKING_DIR = /var/etc/acme-client/home

If it can't find the file, you get the error message Can not find dns api hook for dns_cf. Searches are made using various combinations of subfolder and filename built from $_hookdomain, $_hookcat and $_hookname, but all of them assume either $_SCRIPT_HOME or $LE_WORKING_DIR as the base folder.

When I search for dns_cf.sh, the copies live here:

root@OPNsense:/usr/local/sbin # find / -name "dns_cf*"
/usr/local/share/examples/acme.sh/dnsapi/dns_cf.sh
/root/.acme.sh/dnsapi/dns_cf.sh

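Neither location is under $_SCRIPT_HOME (/usr/local/sbin) or $LE_WORKING_DIR (/var/etc/acme-client/home), so none of the searches can match. This may have something to do with $_SCRIPT_HOME and $LE_WORKING_DIR not being set properly.

Maybe someone more knowledgeable can help out.

Here's the full _findHook function from https://github.com/acmesh-official/acme.sh/blob/master/acme.sh: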

_findHook() {
  _hookdomain="$1"
  _hookcat="$2"
  _hookname="$3"

  if [ -f "$_SCRIPT_HOME/$_hookcat/$_hookname" ]; then
    d_api="$_SCRIPT_HOME/$_hookcat/$_hookname"
  elif [ -f "$_SCRIPT_HOME/$_hookcat/$_hookname.sh" ]; then
    d_api="$_SCRIPT_HOME/$_hookcat/$_hookname.sh"
  elif [ "$_hookdomain" ] && [ -f "$LE_WORKING_DIR/$_hookdomain/$_hookname" ]; then
    d_api="$LE_WORKING_DIR/$_hookdomain/$_hookname"
  elif [ "$_hookdomain" ] && [ -f "$LE_WORKING_DIR/$_hookdomain/$_hookname.sh" ]; then
    d_api="$LE_WORKING_DIR/$_hookdomain/$_hookname.sh"
  elif [ -f "$LE_WORKING_DIR/$_hookname" ]; then
    d_api="$LE_WORKING_DIR/$_hookname"
  elif [ -f "$LE_WORKING_DIR/$_hookname.sh" ]; then
    d_api="$LE_WORKING_DIR/$_hookname.sh"
  elif [ -f "$LE_WORKING_DIR/$_hookcat/$_hookname" ]; then
    d_api="$LE_WORKING_DIR/$_hookcat/$_hookname"
  elif [ -f "$LE_WORKING_DIR/$_hookcat/$_hookname.sh" ]; then
    d_api="$LE_WORKING_DIR/$_hookcat/$_hookname.sh"
  fi

  printf "%s" "$d_api"
}
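To see which of those paths exist on a given box, the lookup order can be replayed outside acme.sh. This is only a rough sketch based on the function quoted above: list_hook_candidates is a made-up helper name, not part of acme.sh or the plugin, and the domain in the example call is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not part of acme.sh): print every path that the
# quoted _findHook tests, in the same order, marked found or missing.
list_hook_candidates() {
  script_home="$1"   # value of $_SCRIPT_HOME
  working_dir="$2"   # value of $LE_WORKING_DIR
  hookdomain="$3"    # may be empty, as in _findHook
  hookcat="$4"       # e.g. dnsapi
  hookname="$5"      # e.g. dns_cf

  # Collect the base paths; the per-domain path is only tried when a
  # hook domain is given.
  set -- "$script_home/$hookcat/$hookname"
  if [ -n "$hookdomain" ]; then
    set -- "$@" "$working_dir/$hookdomain/$hookname"
  fi
  set -- "$@" "$working_dir/$hookname" "$working_dir/$hookcat/$hookname"

  # Each base is tried bare first, then with a .sh suffix.
  for base in "$@"; do
    for candidate in "$base" "$base.sh"; do
      if [ -f "$candidate" ]; then
        printf 'found    %s\n' "$candidate"
      else
        printf 'missing  %s\n' "$candidate"
      fi
    done
  done
}

# Values reported in this thread; the domain is a placeholder.
list_hook_candidates /usr/local/sbin /var/etc/acme-client/home \
  opnsense.example.com dnsapi dns_cf
```

If every line says missing, acme.sh would report Can not find dns api hook for dns_cf for the same inputs.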

mkerost avatar Apr 22 '24 19:04 mkerost

HACKY FIX. Based on my previous post, I applied the following workaround and symlinked the dnsapi folder into the LE working directory:

ln -s /root/.acme.sh/dnsapi /var/etc/acme-client/home

I then ran a cert update and this fixed the problem. Cert successfully issued!
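For anyone scripting this, a slightly more defensive version of the same symlink might look like the sketch below. link_dnsapi is a hypothetical helper name, and the paths in the commented call are just the ones from this thread; check where dns_cf.sh lives on your install before using them.

```shell
#!/bin/sh
# Hypothetical helper: create <home>/dnsapi as a symlink to the hook
# directory, but only if the source exists and nothing is in the way.
link_dnsapi() {
  src="$1"    # directory that contains dns_cf.sh and the other hooks
  home="$2"   # acme.sh working directory ($LE_WORKING_DIR)

  if [ ! -d "$src" ]; then
    echo "hook directory not found: $src" >&2
    return 1
  fi
  if [ ! -d "$home" ]; then
    echo "working directory not found: $home" >&2
    return 1
  fi
  # -L also catches a dangling symlink, which -e would miss.
  if [ -e "$home/dnsapi" ] || [ -L "$home/dnsapi" ]; then
    echo "$home/dnsapi already exists, leaving it alone"
    return 0
  fi
  ln -s "$src" "$home/dnsapi"
}

# Example call with the paths from this thread (run as root):
# link_dnsapi /root/.acme.sh/dnsapi /var/etc/acme-client/home
```

Running it twice is harmless, since the second call sees the existing link and does nothing.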

BUG: Through this whole process, I noticed that setting the ACME log level to debug doesn't work properly. Important information doesn't make it into syslog, specifically the exact error message from Cloudflare if verification fails. This seems to be down to OPNsense not passing the right --syslog number when I set logging to "debug 3". The log shows that OPNsense passed --syslog 7, but 7 is only debug level 1. It should be --syslog 9 for debug 3, --syslog 8 for debug 2, and --syslog 7 for debug 1. I will post this as a separate issue.
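To spell out the mapping I mean, here is a tiny sketch. syslog_level_for_debug is a made-up name for illustration and is not how the plugin builds its command line; it only encodes the three values listed above.

```shell
#!/bin/sh
# Hypothetical helper: translate a debug level (1-3) into the value
# that would be passed to acme.sh via --syslog.
syslog_level_for_debug() {
  case "$1" in
    1) echo 7 ;;
    2) echo 8 ;;
    3) echo 9 ;;
    *) echo "unsupported debug level: $1" >&2; return 1 ;;
  esac
}

# "debug 3" should therefore end up as --syslog 9.
syslog_level_for_debug 3
```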

mkerost avatar Apr 22 '24 23:04 mkerost