lua-resty-auto-ssl
lua-resty-auto-ssl copied to clipboard
JWS Verification Error in dehydrated
Hey!
2 days ago I had to bump the version of this module to latest due to the ACME v1 deprecation. After bumping this module, some certificates could successfully renew in isolation, but certificates that are receiving sustained amounts of traffic appeared to be making the server crash during generation, which was causing us to hit Lets Encrypt's rate limits as the process was incomplete.
Looking in the logs, I was getting very similar log output that I mentioned here:
https://github.com/GUI/lua-resty-auto-ssl/issues/136
notice signal 15 (SIGTERM) received, exiting
notice signal 15 (SIGTERM) received, exiting
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 14 (SIGALRM) received
notice signal 17 (SIGCHLD) received
notice signal 17 (SIGCHLD) received
alert worker process 19879 exited on signal 9
alert worker process 19879 exited on signal 9
alert worker process 19879 exited on signal 9
alert shared memory zone "auto_ssl" was locked by 19879
alert shared memory zone "auto_ssl" was locked by 19879
alert shared memory zone "auto_ssl" was locked by 19879
notice exit
notice exit
INFO openresty (exit status 0)
INFO 'openresty' with pid 10560
notice using the "epoll" event method
notice using the "epoll" event method
notice openresty/1.11.2.2
notice openresty/1.11.2.2
notice built by gcc 4.9.2 (Debian 4.9.2-10)
notice built by gcc 4.9.2 (Debian 4.9.2-10)
notice OS: Linux 4.4.0-1092-aws
notice OS: Linux 4.4.0-1092-aws
notice getrlimit(RLIMIT_NOFILE): 1048576:1048576
notice getrlimit(RLIMIT_NOFILE): 1048576:1048576
notice start worker processes
notice start worker processes
notice start worker process 10588
notice start worker process 10588
notice signal 17 (SIGCHLD) received
notice signal 17 (SIGCHLD) received
info waitpid() failed (10: No child processes)
notice signal 17 (SIGCHLD) received
notice signal 17 (SIGCHLD) received
info waitpid() failed (10: No child processes)
notice [lua] start_sockproc.lua:11: auto-ssl: starting sockproc, context: init_worker_by_lua*
notice [lua] start_sockproc.lua:11: auto-ssl: starting sockproc, context: init_worker_by_lua*
CRIT reaped unknown pid 19899)
notice signal 17 (SIGCHLD) received
notice signal 17 (SIGCHLD) received
notice signal 17 (SIGCHLD) received
notice signal 17 (SIGCHLD) received
info waitpid() failed (10: No child processes)
I came across this investigation here (https://github.com/GUI/lua-resty-auto-ssl/issues/43#issuecomment-539993756) and decided to bump openresty to openresty/openresty:1.15.8.2-5-xenial
to try and fix the lock issues in lua-resty-core
.
The instances that we have running are more stable now, but we keep seeing this error when generating new certs:
error [lua] lets_encrypt.lua:40: issue_cert(): auto-ssl: dehydrated failed: env HOOK_SECRET=xxx HOOK_SERVER_PORT=8999 /usr/local/openresty/luajit/bin/resty-auto-ssl/dehydrated --cron --accept-terms --no-lock --domain xxxx.example.com --challenge http-01 --config /etc/resty-auto-ssl/letsencrypt/config --hook /usr/local/openresty/luajit/bin/resty-auto-ssl/letsencrypt_hooks status: 256 out: # INFO: Using main config file /etc/resty-auto-ssl/letsencrypt/config
ERROR # INFO: Using additional config file /etc/resty-auto-ssl/letsencrypt/conf.d/dehydrated-conf.sh
ERROR startup_hook
ERROR Processing xxxx.example.com
ERROR + Signing domains...
ERROR + Generating private key...
ERROR + Generating signing request...
ERROR + Requesting new certificate order from CA...
ERROR err: + ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/new-order (Status 400)
ERROR
ERROR Details:
ERROR HTTP/1.1 100 Continue
ERROR
ERROR HTTP/1.1 400 Bad Request
ERROR Server: nginx
ERROR Date: Wed, 27 Nov 2019 23:00:58 GMT
ERROR Content-Type: application/problem+json
ERROR Content-Length: 107
ERROR Connection: keep-alive
ERROR Boulder-Requester: 72675174
ERROR Cache-Control: public, max-age=0, no-cache
ERROR Link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index"
ERROR Replay-Nonce: xxxx
ERROR
ERROR {
ERROR "type": "urn:ietf:params:acme:error:malformed",
ERROR "detail": "JWS verification error",
ERROR "status": 400
ERROR }
Have you seen this before? Is there a misconfiguration somewhere in our nginx config, or in dehydrated? I'm not sure if this is related to the locking issue or not, my guess would be no.
Thanks!
I don't think I've seen this before. Based on where that's failing, that seems mostly to be happening on dehydrated's side with the request it's making to issue the cert. Based on some similar issues (https://github.com/go-acme/lego/issues/1006#issuecomment-557026527), I'm wondering if somehow the local account registration needs to be regenerated? You could trigger a new account to be registered by deleting/moving the /etc/resty-auto-ssl/letsencrypt/accounts
directory. Then the next issue attempt should create a new account first.
If that doesn't do the trick, then we can debug further by trying to figure out what the JWS payload dehydrated is sending to Let's Encrypt.
I think I'm seeing the same error when renewing certs (haven't had to issue new ones recently). I had thought this error was related to upgrading (#192), but I already updated to 0.13.1 using dehy 0.6.5.:
2019/12/13 08:35:09 [error] 24103#24103: *315594 [lua] lets_encrypt.lua:40: issue_cert(): auto-ssl: dehydrated failed: env HOOK_SECRET=78a061f15ea4f4cd0ec14550328699a25e37076e5f5022c4510db0f578a4c943 HOOK_SERVER_PORT=8999 /usr/local/openresty/luajit/bin/resty-auto-ssl/dehydrated --cron --accept-terms --no-lock --domain www.example.com --challenge http-01 --config /etc/resty-auto-ssl/letsencrypt/config --hook /usr/local/openresty/luajit/bin/resty-auto-ssl/letsencrypt_hooks status: 256 out: # INFO: Using main config file /etc/resty-auto-ssl/letsencrypt/config
# INFO: Using additional config file /etc/resty-auto-ssl/letsencrypt/conf.d/custom.sh
startup_hook
Processing www.example.com
+ Checking domain name(s) of existing cert... unchanged.
+ Checking expire date of existing cert...
+ Valid till Jan 11 07:18:37 2020 GMT (Less than 30 days). Renewing!
+ Signing domains...
+ Generating private key...
+ Generating signing request...
+ Requesting new certificate order from CA...
+ Received 1 authorizations URLs from the CA
+ Handling authorization for www.example.com
+ 1 pending challenge(s)
+ Deploying challenge tokens...
deploy_challenge
+ Responding to challenge for www.example.com authorization...
+ Challenge is valid!
+ Cleaning challenge tokens...
clean_challenge
+ Requesting certificate...
err: + ERROR: An error occurred while sending head-request to https://acme-v02.api.letsencrypt.org/acme/new-nonce (Status 000)
Details:
+ ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/cert/046f082998c60c4ec958b81231352b2e920c (Status 400)
Details:
HTTP/1.1 400 Bad Request
Server: nginx
Date: Fri, 13 Dec 2019 08:35:09 GMT
Content-Type: application/problem+json
Content-Length: 112
Connection: keep-alive
Cache-Control: public, max-age=0, no-cache
Link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index"
Replay-Nonce: 0001fgA4Y_cX4LlSjF3J4aj0CUOLQt4skPEGNSqr9Ptb4_g
{
"type": "urn:ietf:params:acme:error:badNonce",
"detail": "JWS has no anti-replay nonce",
"status": 400
}
, context: ngx.timer
2019/12/13 08:35:09 [error] 24103#24103: *315594 [lua] renewal.lua:176: renew_check_cert(): auto-ssl: issuing renewal certificate failed: dehydrated failure, context: ngx.timer
I tried moving the account registration as suggested by @GUI above, then ran a renewal. A new registration key was created, but instead of trying to renew the certificate, it was immediately terminated:
2019/12/13 22:49:25 [notice] 24154#24154: *7910 [lua] renewal.lua:73: renew_check_cert(): auto-ssl: checking certificate renewals for www.example.com, context: ngx.timer
2019/12/13 22:49:25 [notice] 24154#24154: signal 17 (SIGCHLD) received from 26319
On another note, does anyone know how can I obtain the HOOK_SECRET key so I can try running the check on a specific domain by command line? I used to pull it from my error logs, but I can only get it if it actually runs...and it changes. TIA.
Could this issue be related to this one in Dehydrated?
https://github.com/lukas2511/dehydrated/issues/684
Hmm, well this is strange...
I ran the renewal again. This time all of the domains that were failing before were skipped(!), due to being more than 30 days from expiration?! I verified that the certs had in fact been renewed successfully.
Somehow they renewed, but without the normally verbose log output.
I added a new domain and the certificate was issued and the website loaded with the new certificate. But again the logs only show:
2019/12/14 00:14:13 [notice] 1318#1318: *691 [lua] ssl_certificate.lua:94: issue_cert(): auto-ssl: issuing new certificate for www.example1.com, context: ssl_certificate_by_lua*, client: 67.188.248.162, server: 0.0.0.0:443
2019/12/14 00:14:20 [notice] 1318#1318: signal 17 (SIGCHLD) received from 2351
Update: Changed my logging level to Info:
2019/12/14 00:43:14 [debug] 6852#6852: *163 [lua] lets_encrypt.lua:44: issue_cert(): auto-ssl: dehydrated output: # INFO: Using main config file /etc/resty-auto-ssl/letsencrypt/config
# INFO: Using additional config file /etc/resty-auto-ssl/letsencrypt/conf.d/custom.sh
startup_hook
Processing www.example2.com
+ Creating new directory /etc/resty-auto-ssl/letsencrypt/certs/www.example2.com ...
+ Signing domains...
+ Generating private key...
+ Generating signing request...
+ Requesting new certificate order from CA...
+ Received 1 authorizations URLs from the CA
+ Handling authorization for www.example2.com
+ 1 pending challenge(s)
+ Deploying challenge tokens...
deploy_challenge
+ Responding to challenge for www.example2.com authorization...
+ Challenge is valid!
+ Cleaning challenge tokens...
clean_challenge
+ Requesting certificate...
+ Checking certificate...
+ Done!
+ Creating fullchain.pem...
deploy_cert
+ Done!
exit_hook
I guess it's working now? ¯_(ツ)_/¯ Perhaps regenerating the account keys was the trick. Thanks @GUI!
Just in case if anyone run into this issue. This could also be caused by Cloudflare. "Always redirect to SSL" option. It cut off HTTP traffic to acme challage.
I'm seeing this problem on a new server where I'm trying to set up HTTPS. Running version v0.7.1 with dehydrated -f /etc/ssl/letsencrypt/dehydrated.conf --register --accept-terms
:
# INFO: Using main config file /etc/ssl/letsencrypt/dehydrated.conf
+ Generating account key...
+ Registering account key with ACME server...
+ ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/new-acct (Status 400)
Details:
HTTP/2 400
server: nginx
date: Fri, 13 Jan 2023 11:08:42 GMT
content-type: application/problem+json
content-length: 112
cache-control: public, max-age=0, no-cache
link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index"
replay-nonce: F977yPEwYkTmLVp3EfhyyxGkz6AS8gxj0ZGO8La6ije9b3U
{
"type": "urn:ietf:params:acme:error:badNonce",
"detail": "JWS has no anti-replay nonce",
"status": 400
}
Error registering account key. See message above for more information.
Any idea how to debug this?
Seeing the same as @rixx above