
JWS Verification Error in dehydrated

Open domharrington opened this issue 5 years ago • 8 comments

Hey!

2 days ago I had to bump this module to the latest version due to the ACME v1 deprecation. After bumping, some certificates could successfully renew in isolation, but certificates receiving sustained traffic appeared to be crashing the server during generation, which was causing us to hit Let's Encrypt's rate limits because the process never completed.

Looking in the logs, I was getting log output very similar to what I mentioned here:

https://github.com/GUI/lua-resty-auto-ssl/issues/136

notice signal 15 (SIGTERM) received, exiting 
notice signal 15 (SIGTERM) received, exiting 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 14 (SIGALRM) received 
notice signal 17 (SIGCHLD) received 
notice signal 17 (SIGCHLD) received 
alert worker process 19879 exited on signal 9 
alert worker process 19879 exited on signal 9 
alert worker process 19879 exited on signal 9 
alert shared memory zone "auto_ssl" was locked by 19879 
alert shared memory zone "auto_ssl" was locked by 19879 
alert shared memory zone "auto_ssl" was locked by 19879 
notice exit 
notice exit 
INFO openresty (exit status 0) 
INFO 'openresty' with pid 10560 
notice using the "epoll" event method 
notice using the "epoll" event method 
notice openresty/1.11.2.2 
notice openresty/1.11.2.2 
notice built by gcc 4.9.2 (Debian 4.9.2-10)  
notice built by gcc 4.9.2 (Debian 4.9.2-10)  
notice OS: Linux 4.4.0-1092-aws 
notice OS: Linux 4.4.0-1092-aws 
notice getrlimit(RLIMIT_NOFILE): 1048576:1048576 
notice getrlimit(RLIMIT_NOFILE): 1048576:1048576 
notice start worker processes 
notice start worker processes 
notice start worker process 10588 
notice start worker process 10588 
notice signal 17 (SIGCHLD) received 
notice signal 17 (SIGCHLD) received 
info waitpid() failed (10: No child processes) 
notice signal 17 (SIGCHLD) received 
notice signal 17 (SIGCHLD) received 
info waitpid() failed (10: No child processes) 
notice [lua] start_sockproc.lua:11: auto-ssl: starting sockproc, context: init_worker_by_lua* 
notice [lua] start_sockproc.lua:11: auto-ssl: starting sockproc, context: init_worker_by_lua* 
CRIT reaped unknown pid 19899) 
notice signal 17 (SIGCHLD) received 
notice signal 17 (SIGCHLD) received 
notice signal 17 (SIGCHLD) received 
notice signal 17 (SIGCHLD) received 
info waitpid() failed (10: No child processes)  

I came across this investigation here (https://github.com/GUI/lua-resty-auto-ssl/issues/43#issuecomment-539993756) and decided to bump openresty to openresty/openresty:1.15.8.2-5-xenial to try and fix the lock issues in lua-resty-core.

The instances that we have running are more stable now, but we keep seeing this error when generating new certs:

error [lua] lets_encrypt.lua:40: issue_cert(): auto-ssl: dehydrated failed: env HOOK_SECRET=xxx HOOK_SERVER_PORT=8999 /usr/local/openresty/luajit/bin/resty-auto-ssl/dehydrated --cron --accept-terms --no-lock --domain xxxx.example.com --challenge http-01 --config /etc/resty-auto-ssl/letsencrypt/config --hook /usr/local/openresty/luajit/bin/resty-auto-ssl/letsencrypt_hooks status: 256 out: # INFO: Using main config file /etc/resty-auto-ssl/letsencrypt/config 
ERROR # INFO: Using additional config file /etc/resty-auto-ssl/letsencrypt/conf.d/dehydrated-conf.sh 
ERROR startup_hook 
ERROR Processing xxxx.example.com 
ERROR  + Signing domains... 
ERROR  + Generating private key... 
ERROR  + Generating signing request... 
ERROR  + Requesting new certificate order from CA... 
ERROR  err:   + ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/new-order (Status 400) 
ERROR  
ERROR Details: 
ERROR HTTP/1.1 100 Continue 
ERROR  
ERROR HTTP/1.1 400 Bad Request 
ERROR Server: nginx 
ERROR Date: Wed, 27 Nov 2019 23:00:58 GMT 
ERROR Content-Type: application/problem+json 
ERROR Content-Length: 107 
ERROR Connection: keep-alive 
ERROR Boulder-Requester: 72675174 
ERROR Cache-Control: public, max-age=0, no-cache 
ERROR Link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index" 
ERROR Replay-Nonce: xxxx 
ERROR  
ERROR { 
ERROR   "type": "urn:ietf:params:acme:error:malformed", 
ERROR   "detail": "JWS verification error", 
ERROR   "status": 400 
ERROR } 

Have you seen this before? Is there a misconfiguration somewhere in our nginx config, or in dehydrated? I'm not sure whether this is related to the locking issue; my guess would be no.

Thanks!

domharrington avatar Nov 27 '19 23:11 domharrington

I don't think I've seen this before. Based on where that's failing, that seems mostly to be happening on dehydrated's side with the request it's making to issue the cert. Based on some similar issues (https://github.com/go-acme/lego/issues/1006#issuecomment-557026527), I'm wondering if somehow the local account registration needs to be regenerated? You could trigger a new account to be registered by deleting/moving the /etc/resty-auto-ssl/letsencrypt/accounts directory. Then the next issue attempt should create a new account first.
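
For example, a rough sketch of that (using the same path mentioned above; adjust it if your config stores things elsewhere):

# Move the account registration aside (rather than deleting it) so it can be restored if needed.
mv /etc/resty-auto-ssl/letsencrypt/accounts /etc/resty-auto-ssl/letsencrypt/accounts.bak

The next issuance or renewal attempt should then register a fresh account before requesting the certificate.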

If that doesn't do the trick, then we can debug further by trying to figure out what the JWS payload dehydrated is sending to Let's Encrypt.

GUI avatar Nov 28 '19 19:11 GUI

I think I'm seeing the same error when renewing certs (I haven't had to issue new ones recently). I had thought this error was related to upgrading (#192), but I already updated to 0.13.1 with dehydrated 0.6.5:

2019/12/13 08:35:09 [error] 24103#24103: *315594 [lua] lets_encrypt.lua:40: issue_cert(): auto-ssl: dehydrated failed: env HOOK_SECRET=78a061f15ea4f4cd0ec14550328699a25e37076e5f5022c4510db0f578a4c943 HOOK_SERVER_PORT=8999 /usr/local/openresty/luajit/bin/resty-auto-ssl/dehydrated --cron --accept-terms --no-lock --domain www.example.com --challenge http-01 --config /etc/resty-auto-ssl/letsencrypt/config --hook /usr/local/openresty/luajit/bin/resty-auto-ssl/letsencrypt_hooks status: 256 out: # INFO: Using main config file /etc/resty-auto-ssl/letsencrypt/config
# INFO: Using additional config file /etc/resty-auto-ssl/letsencrypt/conf.d/custom.sh
startup_hook
Processing www.example.com
 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till Jan 11 07:18:37 2020 GMT (Less than 30 days). Renewing!
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 1 authorizations URLs from the CA
 + Handling authorization for www.example.com
 + 1 pending challenge(s)
 + Deploying challenge tokens...
deploy_challenge
 + Responding to challenge for www.example.com authorization...
 + Challenge is valid!
 + Cleaning challenge tokens...
clean_challenge
 + Requesting certificate...
 err:   + ERROR: An error occurred while sending head-request to https://acme-v02.api.letsencrypt.org/acme/new-nonce (Status 000)

Details:

  + ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/cert/046f082998c60c4ec958b81231352b2e920c (Status 400)

Details:
HTTP/1.1 400 Bad Request
Server: nginx
Date: Fri, 13 Dec 2019 08:35:09 GMT
Content-Type: application/problem+json
Content-Length: 112
Connection: keep-alive
Cache-Control: public, max-age=0, no-cache
Link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index"
Replay-Nonce: 0001fgA4Y_cX4LlSjF3J4aj0CUOLQt4skPEGNSqr9Ptb4_g

{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has no anti-replay nonce",
  "status": 400
}

, context: ngx.timer
2019/12/13 08:35:09 [error] 24103#24103: *315594 [lua] renewal.lua:176: renew_check_cert(): auto-ssl: issuing renewal certificate failed: dehydrated failure, context: ngx.timer
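
Not something I've verified against this setup, but given the (Status 000) on the new-nonce head-request above, one quick sanity check is whether the server can reach the ACME new-nonce endpoint at all and gets a Replay-Nonce header back:

# Hypothetical debugging step: HEAD the new-nonce endpoint and look for a Replay-Nonce header.
curl -sI https://acme-v02.api.letsencrypt.org/acme/new-nonce

If that request fails or the header is missing, it would line up with the "JWS has no anti-replay nonce" response above.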

bryanus avatar Dec 13 '19 19:12 bryanus

I tried moving the account registration as suggested by @GUI above, then ran a renewal. A new registration key was created, but instead of renewing the certificate, the process was terminated immediately:

2019/12/13 22:49:25 [notice] 24154#24154: *7910 [lua] renewal.lua:73: renew_check_cert(): auto-ssl: checking certificate renewals for www.example.com, context: ngx.timer
2019/12/13 22:49:25 [notice] 24154#24154: signal 17 (SIGCHLD) received from 26319

On another note, does anyone know how I can obtain the HOOK_SECRET key so I can try running the check on a specific domain from the command line? I used to pull it from my error logs, but I can only get it if it actually runs... and it changes. TIA.

bryanus avatar Dec 13 '19 23:12 bryanus

Could this issue be related to this one in Dehydrated?

https://github.com/lukas2511/dehydrated/issues/684

bryanus avatar Dec 13 '19 23:12 bryanus

Hmm, well this is strange...

I ran the renewal again. This time all of the domains that were failing before were skipped(!), due to being more than 30 days from expiration?! I verified that the certs had in fact been renewed successfully.

Somehow they renewed, but without the normally verbose log output.

I added a new domain and the certificate was issued and the website loaded with the new certificate. But again the logs only show:

2019/12/14 00:14:13 [notice] 1318#1318: *691 [lua] ssl_certificate.lua:94: issue_cert(): auto-ssl: issuing new certificate for www.example1.com, context: ssl_certificate_by_lua*, client: 67.188.248.162, server: 0.0.0.0:443
2019/12/14 00:14:20 [notice] 1318#1318: signal 17 (SIGCHLD) received from 2351

Update: Changed my logging level to Info:

2019/12/14 00:43:14 [debug] 6852#6852: *163 [lua] lets_encrypt.lua:44: issue_cert(): auto-ssl: dehydrated output: # INFO: Using main config file /etc/resty-auto-ssl/letsencrypt/config
# INFO: Using additional config file /etc/resty-auto-ssl/letsencrypt/conf.d/custom.sh
startup_hook
Processing www.example2.com
 + Creating new directory /etc/resty-auto-ssl/letsencrypt/certs/www.example2.com ...
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 1 authorizations URLs from the CA
 + Handling authorization for www.example2.com
 + 1 pending challenge(s)
 + Deploying challenge tokens...
deploy_challenge
 + Responding to challenge for www.example2.com authorization...
 + Challenge is valid!
 + Cleaning challenge tokens...
clean_challenge
 + Requesting certificate...
 + Checking certificate...
 + Done!
 + Creating fullchain.pem...
deploy_cert
 + Done!
exit_hook

I guess it's working now? ¯\_(ツ)_/¯ Perhaps regenerating the account keys was the trick. Thanks @GUI!

bryanus avatar Dec 14 '19 00:12 bryanus

Just in case anyone runs into this issue: it can also be caused by Cloudflare's "Always redirect to SSL" option, which cuts off plain-HTTP traffic to the ACME challenge.
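
One way to check for that (a sketch with a placeholder hostname and token; substitute your own domain) is to request a path under the ACME challenge prefix over plain HTTP and see whether it gets redirected before it can reach auto-ssl's challenge server:

# Hypothetical check: the http-01 challenge path should be reachable over plain HTTP.
curl -sI http://www.example.com/.well-known/acme-challenge/test-token
# A 301/302 to https:// here suggests the redirect is intercepting challenge requests;
# a direct response (e.g. a 404 from the challenge server) is what you want to see.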

Adel-Magebinary avatar Oct 04 '21 07:10 Adel-Magebinary

I'm seeing this problem on a new server where I'm trying to set up HTTPS. Running version v0.7.1 with dehydrated -f /etc/ssl/letsencrypt/dehydrated.conf --register --accept-terms:

# INFO: Using main config file /etc/ssl/letsencrypt/dehydrated.conf
+ Generating account key...
+ Registering account key with ACME server...
  + ERROR: An error occurred while sending post-request to https://acme-v02.api.letsencrypt.org/acme/new-acct (Status 400)

Details:
HTTP/2 400 
server: nginx
date: Fri, 13 Jan 2023 11:08:42 GMT
content-type: application/problem+json
content-length: 112
cache-control: public, max-age=0, no-cache
link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index"
replay-nonce: F977yPEwYkTmLVp3EfhyyxGkz6AS8gxj0ZGO8La6ije9b3U

{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has no anti-replay nonce",
  "status": 400
}



Error registering account key. See message above for more information.

Any idea how to debug this?

rixx avatar Jan 13 '23 11:01 rixx

Seeing the same as @rixx above

byrnedo avatar Apr 04 '24 07:04 byrnedo