kube-lego
404 in reachability test with GCE ingress
Hi,
I've been using kube-lego for a few months with one GCE ingress and 2 nginx ones, and everything has been fine.
But the cert on our GCE ingress now has less than 30 days left, and renewal has been failing:
2016-12-15T13:15:51.663099634Z time="2016-12-15T13:15:51Z" level=info msg="cert expires soon so renew" context="ingress_tls" expire_time=2017-01-03 14:39:00 +0000 UTC name=api-gce-ingress namespace=production
2016-12-15T13:15:51.663188834Z time="2016-12-15T13:15:51Z" level=info msg="requesting certificate for api.XXXXX.com" context="ingress_tls" name=api-gce-ingress namespace=production
2016-12-15T13:16:55.446797209Z time="2016-12-15T13:16:55Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=api.XXXXX.com
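The warning says the authorization gave up "after 1m0s", and later in this thread the reachability test is described as being retried many times. A minimal Python sketch of that kind of retry-until-deadline loop follows (kube-lego itself is written in Go). Only the 60-second budget comes from the log line; the helper name `retry_until_deadline`, the 5-second interval, and the error text are hypothetical.

```python
import time
from typing import Callable, Optional


def retry_until_deadline(check: Callable[[], Optional[str]],
                         timeout_s: float = 60.0,   # from "failed after 1m0s"
                         interval_s: float = 5.0,   # hypothetical retry interval
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call `check` until it succeeds or the deadline passes.

    `check` returns None on success, or an error string on failure.
    Returns None on success, otherwise the last error wrapped in a summary.
    """
    deadline = clock() + timeout_s
    while True:
        last_err = check()
        if last_err is None:
            return None
        if clock() >= deadline:
            return f"authorization failed after {timeout_s:.0f}s: {last_err}"
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without actually waiting a minute.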
The ingress configuration looks correct, and kube-lego has claimed the challenge path:
Host            Path                             Backends
----            ----                             --------
api.XXXXX.com
                /.well-known/acme-challenge/*    kube-lego-gce:8080 (<none>)
                /*                               XXXXX-api:3000 (<none>)
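As a rough model of why a request to the challenge path lands on kube-lego rather than the API, here is a Python sketch of longest-match path routing over the two rules above. It is based on my understanding of GCE URL maps (the most specific matching path wins, and `*` is only allowed as a trailing wildcard after `/`), not on anything stated in this thread; `match_backend` is a hypothetical helper, and it ignores host matching entirely.

```python
from typing import Dict, Optional


def match_backend(path: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the backend of the longest path rule that matches `path`.

    A rule ending in "/*" matches its prefix (including the trailing "/")
    and anything below it; any other rule must match the path exactly.
    """
    best_rule, best_backend = None, None
    for rule, backend in rules.items():
        if rule.endswith("/*"):
            matched = path.startswith(rule[:-1])  # drop "*", keep the "/"
        else:
            matched = path == rule
        if matched and (best_rule is None or len(rule) > len(best_rule)):
            best_rule, best_backend = rule, backend
    return best_backend


rules = {
    "/.well-known/acme-challenge/*": "kube-lego-gce:8080",
    "/*": "XXXXX-api:3000",
}
```

This only models path matching: a request is routed this way only if the load balancer has a forwarding rule for the protocol and port it arrives on.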
And it is indeed kube-lego responding there:
$ curl https://api.XXXXX.com/.well-known/acme-challenge/
kube-lego (version 0.1.3-d425b293) - 404 not found
At least one path does return a 200:
$ curl https://api.XXXXXX.com/.well-known/acme-challenge/_selftest
XXXXXXXXXXXXXXX
I've tried restarting kube-lego, with no luck. Any ideas?
I'm hitting the same problem with the nginx ingress controller, both behind an ELB and without one, whenever I try to obtain a new certificate.
Maybe the ACME library kube-lego uses is outdated?
Any ideas?
No idea 😞 The problem disappeared by itself and hasn't come back yet, so I have no logs to dig into...
I'm still having the issue. Since I'm less than a week away from the cert expiring, I'd love to try to squash the bug myself if @simonswine isn't available; any pointers on where to look first would be appreciated.
Hey @renaudguerin,
I have some time this week, so could you provide me with:
- A screenshot from GCE Console > Networking > Load Balancers
- Logs with the log level raised to debug, captured while the cert is failing
Cheers, Christian
Thanks @simonswine
Here's the full debug log (the last two lines repeat many times as kube-lego retries the reachability test), with the real domain name edited for confidentiality.
2016-12-28T20:56:50.847151410Z time="2016-12-28T20:56:50Z" level=info msg="kube-lego 0.1.3-d425b293 starting" context=kubelego
2016-12-28T20:56:50.876586125Z time="2016-12-28T20:56:50Z" level=info msg="connected to kubernetes api v1.4.7" context=kubelego
2016-12-28T20:56:50.876746854Z time="2016-12-28T20:56:50Z" level=debug msg="start watching ingress objects" context=kubelego
2016-12-28T20:56:50.877342626Z time="2016-12-28T20:56:50Z" level=info msg="server listening on http://:8080/" context=acme
2016-12-28T20:56:50.879721847Z time="2016-12-28T20:56:50Z" level=debug msg="CREATE ingress/production/api-gce-ingress" context=kubelego
2016-12-28T20:56:50.879823727Z time="2016-12-28T20:56:50Z" level=debug msg="CREATE ingress/production/static-ingress" context=kubelego
2016-12-28T20:56:50.879895337Z time="2016-12-28T20:56:50Z" level=debug msg="CREATE ingress/production/ws-ingress" context=kubelego
2016-12-28T20:56:50.879981935Z time="2016-12-28T20:56:50Z" level=debug msg="worker: begin processing true" context=kubelego
2016-12-28T20:56:50.890369299Z time="2016-12-28T20:56:50Z" level=info msg="ignoring as has no annotiation 'kubernetes.io/tls-acme'" context=ingress name=kube-lego-nginx namespace=kube-lego
2016-12-28T20:56:50.890475469Z time="2016-12-28T20:56:50Z" level=debug msg=reset context=provider provider=gce
2016-12-28T20:56:50.893689251Z time="2016-12-28T20:56:50Z" level=debug msg=finialize context=provider provider=gce
2016-12-28T20:56:50.895528039Z time="2016-12-28T20:56:50Z" level=debug msg="setting up svc endpoint" context=provider namespace=production pod_ip=10.16.0.176 provider=gce
2016-12-28T20:56:50.936958122Z time="2016-12-28T20:56:50Z" level=debug msg=reset context=provider provider=nginx
2016-12-28T20:56:50.937079376Z time="2016-12-28T20:56:50Z" level=debug msg=finialize context=provider provider=nginx
2016-12-28T20:56:50.949346702Z time="2016-12-28T20:56:50Z" level=info msg="process certificates requests for ingresses" context=kubelego
2016-12-28T20:56:50.952419219Z time="2016-12-28T20:56:50Z" level=info msg="cert expires in 71.8 days, no renewal needed" context="ingress_tls" expire_time=2017-03-10 15:48:00 +0000 UTC name=ws-ingress namespace=production
2016-12-28T20:56:50.952523622Z time="2016-12-28T20:56:50Z" level=info msg="no cert request needed" context="ingress_tls" name=ws-ingress namespace=production
2016-12-28T20:56:50.955513389Z time="2016-12-28T20:56:50Z" level=info msg="cert expires soon so renew" context="ingress_tls" expire_time=2017-01-03 14:39:00 +0000 UTC name=api-gce-ingress namespace=production
2016-12-28T20:56:50.955619778Z time="2016-12-28T20:56:50Z" level=info msg="requesting certificate for api.XXXXX.com" context="ingress_tls" name=api-gce-ingress namespace=production
2016-12-28T20:56:51.712482350Z time="2016-12-28T20:56:51Z" level=debug msg="testing reachablity of http://api.XXXXX.com/.well-known/acme-challenge/_selftest" context=acme domain=api.XXXXX.com
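The log shows two different renewal decisions: ws-ingress is left alone at 71.8 days, while api-gce-ingress (expiring 2017-01-03) is renewed. A minimal Python sketch of that decision follows. The 30-day threshold is an assumption, consistent with the first report that renewal started failing once fewer than 30 days were left; `renewal_decision` is a hypothetical helper, not kube-lego's code.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: renew once less than 30 days of validity remain.
MIN_VALIDITY = timedelta(days=30)


def renewal_decision(now: datetime, expire_time: datetime,
                     min_validity: timedelta = MIN_VALIDITY) -> str:
    """Return a message shaped like the kube-lego log lines above."""
    remaining = expire_time - now
    if remaining < min_validity:
        return "cert expires soon so renew"
    days = remaining.total_seconds() / 86400  # seconds per day
    return f"cert expires in {days:.1f} days, no renewal needed"


now = datetime(2016, 12, 28, 20, 56, 50, tzinfo=timezone.utc)
print(renewal_decision(now, datetime(2017, 3, 10, 15, 48, tzinfo=timezone.utc)))
print(renewal_decision(now, datetime(2017, 1, 3, 14, 39, tzinfo=timezone.utc)))
```

With the timestamps from the log, the first call reproduces the "71.8 days" figure and the second asks for renewal, since only about 5.7 days remain.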
I think the GCE LB config is fine: as you can see in my first message, it's definitely kube-lego serving the acme-challenge path.
Wait a second... I think I've found the issue. I just noticed in the debug output that the reachability test is done over http (which I guess makes sense, unless it could somehow be done over https with a temporary dummy certificate?). I had only enabled https on this load balancer, and GCE does return a 404 for the http acme URL.
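To check this by hand, the key point is the scheme: the debug line shows `http://.../.well-known/acme-challenge/_selftest`, so an https-only load balancer can serve 200 on https while the http self-test gets a 404. A minimal Python sketch of such a check follows. `reachability_error`, its parameters, and its error strings are hypothetical, loosely modeled on the log messages in this thread; it is not kube-lego's implementation.

```python
import urllib.error
import urllib.request
from typing import Optional


def reachability_error(domain: str, expected_body: str,
                       scheme: str = "http", timeout: float = 5.0) -> Optional[str]:
    """Fetch the self-test URL; return None if it looks right, else an error string."""
    url = f"{scheme}://{domain}/.well-known/acme-challenge/_selftest"
    # Bypass any configured proxy so the direct path to the load balancer is tested.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        status, body = err.code, ""
    if status != 200:
        return f"reachability test failed: wrong status code '{status}'"
    if body.strip() != expected_body:
        return "reachability test failed: unexpected body"
    return None
```

Running it once with `scheme="http"` and once with `scheme="https"` against the same domain would show the mismatch described above.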
Case closed, then, although it would be nice not to have to keep http open just for kube-lego (unless the ACME protocol makes that unavoidable).
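For the http-01 challenge, ACME does make plain HTTP unavoidable for the challenge path: the CA validates by fetching `http://<domain>/.well-known/acme-challenge/<token>` over plain HTTP on port 80 and expects the body to be the "key authorization", i.e. the token, a dot, and the base64url RFC 7638 thumbprint of the account key. A minimal Python sketch of those two values follows; the helper names and the dummy JWK are hypothetical, and kube-lego computes this internally via its Go ACME library.

```python
import base64
import hashlib
import json
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as used by ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint of an RSA public JWK (only e, kty, n are hashed)."""
    required = {k: jwk[k] for k in ("e", "kty", "n")}
    # Lexicographically sorted members, no whitespace.
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def http01_challenge(domain: str, token: str, account_jwk: dict) -> Tuple[str, str]:
    """Return (validation URL, expected response body) for an http-01 challenge."""
    url = f"http://{domain}/.well-known/acme-challenge/{token}"  # plain HTTP, port 80
    key_authorization = f"{token}.{jwk_thumbprint(account_jwk)}"
    return url, key_authorization
```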
That's weird. I'm having a similar issue, except that when I manually check the challenge path I get a good response. Any more ideas?
UPDATE: I deleted the kube-lego pod, and when the new pod came up it immediately issued a cert for my domain. Maybe some DNS cache needed to clear?
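If a stale DNS answer is the suspect, one quick check is to compare what the domain resolves to from inside the cluster with the load balancer's IP. A minimal Python sketch follows; `resolved_ipv4` is a hypothetical helper, it only reports what the resolver on the machine running it returns right now, and the ACME server resolves the domain independently.

```python
import socket
from typing import Set


def resolved_ipv4(host: str) -> Set[str]:
    """Return the IPv4 addresses `host` currently resolves to on this machine."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}
```

For example, `resolved_ipv4("api.XXXXX.com")` could be compared against the IP shown for the load balancer in the GCE console.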