
404 in reachability test with GCE ingress

Open renaudguerin opened this issue 7 years ago • 8 comments

Hi,

I've been using kube-lego for a few months with one GCE ingress and two nginx ones, and everything had been working fine.

But the cert on our GCE ingress now has fewer than 30 days left, and renewal has been failing:

2016-12-15T13:15:51.663099634Z time="2016-12-15T13:15:51Z" level=info msg="cert expires soon so renew" context="ingress_tls" expire_time=2017-01-03 14:39:00 +0000 UTC name=api-gce-ingress namespace=production 
2016-12-15T13:15:51.663188834Z time="2016-12-15T13:15:51Z" level=info msg="requesting certificate for api.XXXXX.com" context="ingress_tls" name=api-gce-ingress namespace=production 
2016-12-15T13:16:55.446797209Z time="2016-12-15T13:16:55Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=api.XXXXX.com 
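The warning says authorization gave up "after 1m0s", i.e. the reachability test is retried until a deadline passes. A minimal Python sketch of that retry-until-deadline pattern, assuming a fixed polling interval; `retry_until` and its injectable clock are hypothetical names for illustration, not kube-lego's actual code (kube-lego itself is written in Go).

```python
import time
from typing import Callable


def retry_until(check: Callable[[], bool], timeout: float, interval: float,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` until it returns True or `timeout` seconds have elapsed.

    Hypothetical helper: returns True on the first successful check,
    False once the deadline has passed without a success.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

If every attempt gets a 404, a loop like this simply exhausts its deadline and reports failure, which matches the shape of the warning above.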

The ingress configuration looks correct; kube-lego has claimed the challenge path:

  Host           Path                           Backends
  ----           ----                           --------
  api.XXXXX.com
                 /.well-known/acme-challenge/*  kube-lego-gce:8080 (<none>)
                 /*                             XXXXX-api:3000 (<none>)
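To see why challenge requests land on kube-lego rather than on the API, here is a simplified Python model of the path rules above. It assumes GCE-style semantics where a trailing `/*` matches any path with that prefix and the longest matching rule wins; `match_backend` is a hypothetical name and this is a sketch, not the load balancer's real matcher.

```python
from typing import Dict, Optional


def match_backend(path: str, rules: Dict[str, str]) -> Optional[str]:
    """Pick a backend for `path` from GCE-style path rules (simplified model).

    Assumption: a rule ending in "/*" matches any path starting with the
    part before "*"; other rules must match exactly. Longest rule wins.
    """
    best_rule = None
    for rule in rules:
        if rule.endswith("/*"):
            matches = path.startswith(rule[:-1])  # keep the trailing "/"
        else:
            matches = path == rule
        if matches and (best_rule is None or len(rule) > len(best_rule)):
            best_rule = rule
    return rules[best_rule] if best_rule is not None else None


rules = {
    "/.well-known/acme-challenge/*": "kube-lego-gce:8080",
    "/*": "XXXXX-api:3000",
}
```

Note that a rule table like this says nothing about which frontends (HTTP or HTTPS) the load balancer actually exposes for the host.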

And indeed, it's kube-lego responding there:

$ curl https://api.XXXXX.com/.well-known/acme-challenge/
kube-lego (version 0.1.3-d425b293) - 404 not found

At least one path returns a 200:

$ curl https://api.XXXXXX.com/.well-known/acme-challenge/_selftest
XXXXXXXXXXXXXXX
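Both curl results fit a responder that serves known tokens with a 200 and answers anything else under the challenge prefix with a 404. A pure-function Python sketch of that behaviour, with the HTTP server wiring left out; `respond` and the `tokens` mapping are hypothetical, and only the version string and 404 body are taken from the output above.

```python
from typing import Dict, Tuple

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"
VERSION = "0.1.3-d425b293"  # version string copied from the curl output above


def respond(path: str, tokens: Dict[str, str]) -> Tuple[int, str]:
    """Return (status, body) for a request path.

    Hypothetical model: `tokens` maps challenge tokens (including
    "_selftest") to the content served for them; everything else is a 404.
    """
    if path.startswith(CHALLENGE_PREFIX):
        token = path[len(CHALLENGE_PREFIX):]
        if token in tokens:
            return 200, tokens[token]
    return 404, f"kube-lego (version {VERSION}) - 404 not found"
```

In this model a 404 on the bare directory is expected, so it does not by itself show that kube-lego is misbehaving.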

I've tried restarting kube-lego, with no luck. Any ideas?

renaudguerin, Dec 15 '16 13:12

I'm hitting the same problem with the nginx ingress controller, both behind an ELB and without one, whenever I try to obtain a new certificate.

Maybe the ACME library kube-lego uses is outdated, or something like that?

sysradium, Dec 16 '16 12:12

Any ideas?

renaudguerin, Dec 23 '16 21:12

No idea 😞 The problem disappeared by itself and hasn't come back yet, so I have no logs to dig into...

sysradium, Dec 23 '16 23:12

Still having the issue. Since the cert expires in less than a week, I'd love to try to squash the bug myself if @simonswine isn't available; any pointers on where to look first would be appreciated.

renaudguerin, Dec 28 '16 00:12

Hey @renaudguerin,

I have some time this week, so could you provide me with:

  • A screenshot from the GCE Console (Networking > Load Balancers)
  • Debug-level logs captured while the cert renewal is failing

Cheers, Christian

simonswine, Dec 28 '16 08:12

Thanks, @simonswine.

Here's the full debug log (the last 2 lines are repeated many times as kube-lego retries the reachability test), with the real domain name redacted for confidentiality.

2016-12-28T20:56:50.847151410Z time="2016-12-28T20:56:50Z" level=info msg="kube-lego 0.1.3-d425b293 starting" context=kubelego 
2016-12-28T20:56:50.876586125Z time="2016-12-28T20:56:50Z" level=info msg="connected to kubernetes api v1.4.7" context=kubelego 
2016-12-28T20:56:50.876746854Z time="2016-12-28T20:56:50Z" level=debug msg="start watching ingress objects" context=kubelego 
2016-12-28T20:56:50.877342626Z time="2016-12-28T20:56:50Z" level=info msg="server listening on http://:8080/" context=acme 
2016-12-28T20:56:50.879721847Z time="2016-12-28T20:56:50Z" level=debug msg="CREATE ingress/production/api-gce-ingress" context=kubelego 
2016-12-28T20:56:50.879823727Z time="2016-12-28T20:56:50Z" level=debug msg="CREATE ingress/production/static-ingress" context=kubelego 
2016-12-28T20:56:50.879895337Z time="2016-12-28T20:56:50Z" level=debug msg="CREATE ingress/production/ws-ingress" context=kubelego 
2016-12-28T20:56:50.879981935Z time="2016-12-28T20:56:50Z" level=debug msg="worker: begin processing true" context=kubelego 
2016-12-28T20:56:50.890369299Z time="2016-12-28T20:56:50Z" level=info msg="ignoring as has no annotiation 'kubernetes.io/tls-acme'" context=ingress name=kube-lego-nginx namespace=kube-lego 
2016-12-28T20:56:50.890475469Z time="2016-12-28T20:56:50Z" level=debug msg=reset context=provider provider=gce 
2016-12-28T20:56:50.893689251Z time="2016-12-28T20:56:50Z" level=debug msg=finialize context=provider provider=gce 
2016-12-28T20:56:50.895528039Z time="2016-12-28T20:56:50Z" level=debug msg="setting up svc endpoint" context=provider namespace=production pod_ip=10.16.0.176 provider=gce 
2016-12-28T20:56:50.936958122Z time="2016-12-28T20:56:50Z" level=debug msg=reset context=provider provider=nginx 
2016-12-28T20:56:50.937079376Z time="2016-12-28T20:56:50Z" level=debug msg=finialize context=provider provider=nginx 
2016-12-28T20:56:50.949346702Z time="2016-12-28T20:56:50Z" level=info msg="process certificates requests for ingresses" context=kubelego 
2016-12-28T20:56:50.952419219Z time="2016-12-28T20:56:50Z" level=info msg="cert expires in 71.8 days, no renewal needed" context="ingress_tls" expire_time=2017-03-10 15:48:00 +0000 UTC name=ws-ingress namespace=production 
2016-12-28T20:56:50.952523622Z time="2016-12-28T20:56:50Z" level=info msg="no cert request needed" context="ingress_tls" name=ws-ingress namespace=production 
2016-12-28T20:56:50.955513389Z time="2016-12-28T20:56:50Z" level=info msg="cert expires soon so renew" context="ingress_tls" expire_time=2017-01-03 14:39:00 +0000 UTC name=api-gce-ingress namespace=production 
2016-12-28T20:56:50.955619778Z time="2016-12-28T20:56:50Z" level=info msg="requesting certificate for api.XXXXX.com" context="ingress_tls" name=api-gce-ingress namespace=production 
2016-12-28T20:56:51.712482350Z time="2016-12-28T20:56:51Z" level=debug msg="testing reachablity of http://api.XXXXX.com/.well-known/acme-challenge/_selftest" context=acme domain=api.XXXXX.com
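The two `ingress_tls` decisions in this log ("cert expires in 71.8 days, no renewal needed" vs "cert expires soon so renew") can be reproduced with simple date arithmetic. A Python sketch, assuming a 30-day renewal window (consistent with failures starting once fewer than 30 days remained); the function names and the constant are hypothetical, not kube-lego's code.

```python
from datetime import datetime, timedelta, timezone

# Assumed renewal window, inferred from the "<30 days left" observation.
MINIMUM_VALIDITY = timedelta(days=30)


def days_left(expire_time: datetime, now: datetime) -> float:
    """Remaining validity in days, as in "cert expires in 71.8 days"."""
    return (expire_time - now).total_seconds() / 86400


def needs_renewal(expire_time: datetime, now: datetime,
                  minimum_validity: timedelta = MINIMUM_VALIDITY) -> bool:
    """Renew once the certificate has less than `minimum_validity` left."""
    return expire_time - now < minimum_validity


# Timestamps copied from the log lines above.
now = datetime(2016, 12, 28, 20, 56, 50, tzinfo=timezone.utc)
ws_expiry = datetime(2017, 3, 10, 15, 48, tzinfo=timezone.utc)
api_expiry = datetime(2017, 1, 3, 14, 39, tzinfo=timezone.utc)
```

With these values `ws-ingress` has about 71.8 days left and is skipped, while `api-gce-ingress` has about 5.7 days left and is queued for renewal, which then runs into the reachability test.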

And I think the GCE LB config is fine: as you can see in my first message, it's definitely kube-lego serving the acme-challenge path.

[screenshot of the GCE load balancer configuration]

renaudguerin, Dec 28 '16 21:12

Wait a second... I think I've found the issue. I just noticed in the debug output that the reachability test is done over HTTP (which I guess makes sense, unless it could somehow be done over HTTPS with a temporary dummy certificate?). And I had only enabled HTTPS on this load balancer, so GCE does return a 404 for the HTTP ACME URL.

Case closed, then, although it would be nice not to have to keep HTTP open just for kube-lego (unless the ACME protocol makes that unavoidable).
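Checking the self-test URL over HTTPS only (as in the curl commands earlier) hides this failure mode. A Python sketch that probes the same path over both schemes; `selftest_url`, `probe_schemes` and `urllib_status` are hypothetical helpers, and the fetcher is injected so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

SELFTEST_PATH = "/.well-known/acme-challenge/_selftest"


def selftest_url(domain: str, scheme: str = "http") -> str:
    """Build the self-test URL; the debug log shows the plain-HTTP form."""
    return f"{scheme}://{domain}{SELFTEST_PATH}"


def probe_schemes(domain: str, fetch_status: Callable[[str], int]) -> Dict[str, int]:
    """Status code of the self-test URL over HTTP and over HTTPS."""
    return {scheme: fetch_status(selftest_url(domain, scheme))
            for scheme in ("http", "https")}


def urllib_status(url: str, timeout: float = 10.0) -> int:
    """Real fetcher: returns the HTTP status, including error statuses.

    Connection-level failures (URLError) are not caught and will propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

A result like `{'http': 404, 'https': 200}` is the pattern described here. Since `urllib` follows redirects, an HTTP-to-HTTPS redirect would be reported as the final status rather than the redirect itself.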

renaudguerin, Dec 28 '16 21:12

That's weird. I'm having a similar issue, except that when I manually check the challenge path I get a good response. Any more ideas?

UPDATE: I deleted the kube-lego pod, and when the new pod came up it immediately issued a cert for my domain. Maybe some DNS cache needed to clear?

ericuldall, Aug 24 '17 22:08