kube-lego
Support allow-http: "false"
kube-lego requires port 80 to be open to verify server reachability. However, as far as I understand #164, Let's Encrypt itself does not require it. For instance, you could use a self-signed certificate to kick off the cluster.
You can see the http scheme hard-coded in the reachability test ( https://github.com/jetstack/kube-lego/blob/master/pkg/acme/cert_request.go#L48 ).
Expected behaviour:
- HTTPS (port 443) is used and the self-signed certificate is ignored.
Why it's an issue:
- You cannot use kubernetes.io/ingress.allow-http: "false" (not at first, not later). It's important if you want to avoid accidental use of HTTP.
How I found it:
1 - Created a TLS secret with a self-signed certificate.
2 - Created a new Ingress with kubernetes.io/ingress.allow-http: "false".
3 - Created kube-lego deployment.
4 - Noticed errors in the kube-lego logs: authorization failed after 1m0s: reachabily test failed: wrong status code '404'.
5 - Removed kubernetes.io/ingress.allow-http: "false" from the Ingress definition.
6 - Soon errors changed to authorization failed after 1m0s: reachabily test failed: wrong status code '502'.
7 - Eventually it passed and a new certificate was issued.
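The expected behaviour described above (probing over HTTPS while ignoring an untrusted certificate) could look roughly like the following Python sketch. kube-lego itself is written in Go, so this only illustrates the idea and is not its code; `insecure_tls_context` and `probe_status` are hypothetical names.

```python
import ssl
import urllib.error
import urllib.request


def insecure_tls_context() -> ssl.SSLContext:
    """Build a TLS context that accepts any certificate, including self-signed ones."""
    ctx = ssl.create_default_context()
    # check_hostname has to be disabled before verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url without validating the server certificate."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(
            request, timeout=timeout, context=insecure_tls_context()
        ) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the 404 and 502 seen in the logs) arrive here.
        return err.code
```

Skipping verification is only reasonable for a self-check like this one, where the goal is to confirm that the endpoint is routed correctly, not to trust its content.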
Related: #164. Which ingress are you using (nginx or gce)? I am using gce and have been able to deploy an ingress that only has port 443 open and use the HTTP-01 challenge (which doesn't require a valid certificate), and kube-lego was able to set up the challenge endpoint at domain:443/.well-known/....
The gotcha is that gce ingress has a bug that you have to start with allow-http:false, adding it later on doesn’t do anything.
I used GCE Load balancer. Will later check again with a new cluster.
Ok, so I finally tested it again. Took me a while. Still have no idea why ingress didn't create HTTPS front-end automatically. Despite that, this is gonna be a long message:
Essentially, I took the GCE example from this repo and added a couple of lines to the ingress. The shell script I ran was (also note the comments to better understand what I did manually during the pauses):
set -ex
read -p "Press any key to start..." -sn1
gcloud config set project echo-test-166405
gcloud container clusters create echo-cluster --machine-type="g1-small" --num-nodes=1
echo "Cluster created"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/C=LT"
kubectl create secret tls echoserver-tls --key /tmp/tls.key --cert /tmp/tls.crt
# Note: At this point I just waited a minute to make sure it's all OK.
echo "Self-generated certificate created"
read -p "Press any key to continue... " -sn1
kubectl apply -f echoserver/00-namespace.yaml
kubectl apply -f echoserver/deployment.yaml
kubectl apply -f echoserver/service.yaml
kubectl apply -f echoserver/ingress-tls.yaml
# I waited for a couple of minutes but the frontend didn't get created, so I created it manually and tested (will post an info snippet below)
echo "Echoserver is deployed (load balancer is expected to be created)"
read -p "Press any key to continue... " -sn1
kubectl apply -f lego/00-namespace.yaml
kubectl apply -f lego/deployment.yaml
echo "kube-lego created"
echo "Will now proxy Kubernetes UI"
kubectl proxy
# I waited a couple of minutes, didn't work then got the kube-lego logs (I will put them below).
The echoserver namespace was almost unchanged, except for the ingress:
apiVersion: v1
kind: Namespace
metadata:
  name: echoserver
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: echoserver
  namespace: echoserver
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
      - image: gcr.io/google_containers/echoserver:1.0
        imagePullPolicy: Always
        name: echoserver
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: echoserver
  namespace: echoserver
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: NodePort
  selector:
    app: echoserver
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echoserver
  namespace: echoserver
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - hosts:
    - echo.pijusn.eu
    secretName: echoserver-tls
  rules:
  - host: echo.pijusn.eu
    http:
      paths:
      - backend:
          serviceName: echoserver
          servicePort: 80
Next is the kube-lego namespace; I only inlined the configuration values.
apiVersion: v1
kind: Namespace
metadata:
  name: kube-lego
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-lego
  namespace: kube-lego
spec:
  replicas: 1
  template:
    metadata:
      labels:
        # Required for the auto-create kube-lego-nginx service to work.
        app: kube-lego
    spec:
      containers:
      - name: kube-lego
        image: jetstack/kube-lego:0.1.3
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        env:
        - name: LEGO_EMAIL
          value: "[email protected]"
        - name: LEGO_URL
          value: "https://acme-v01.api.letsencrypt.org/directory"
        - name: LEGO_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: LEGO_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 1
Here is the way I tested whether it's actually reachable from the outside:
➜ pijusn curl https://echo.pijusn.eu
curl: (60) SSL certificate problem: Invalid certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
➜ pijusn curl https://echo.pijusn.eu -k
CLIENT VALUES:
client_address=('10.0.0.1', 65239) (10.0.0.1)
command=GET
path=/
real path=/
query=
request_version=HTTP/1.1
SERVER VALUES:
server_version=BaseHTTP/0.6
sys_version=Python/3.5.0
protocol_version=HTTP/1.0
HEADERS RECEIVED:
Accept=*/*
Connection=Keep-Alive
Host=echo.pijusn.eu
User-Agent=curl/7.51.0
Via=1.1 google
X-Cloud-Trace-Context=5918318b958ba30183510bd58efe02f5/14783633724735387003
X-Forwarded-For=85.206.179.15, 35.190.0.79
X-Forwarded-Proto=https
Finally, the logs (nothing spectacular, the same message is visible) of kube-lego after a couple of minutes (please note, I checked the logs before posting this and it still was failing):
time="2017-05-02T17:40:48Z" level=info msg="kube-lego 0.1.3-d425b293 starting" context=kubelego
time="2017-05-02T17:40:48Z" level=info msg="connected to kubernetes api v1.5.6" context=kubelego
time="2017-05-02T17:40:48Z" level=info msg="server listening on http://:8080/" context=acme
time="2017-05-02T17:40:48Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-05-02T17:40:48Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-05-02T17:40:48Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver
time="2017-05-02T17:40:48Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:40:48Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:40:48Z" level=info msg="creating new secret" context=secret name=kube-lego-account namespace=kube-lego
time="2017-05-02T17:40:49Z" level=info msg="if you don't accept the TOS (https://letsencrypt.org/documents/LE-SA-v1.1.1-August-1-2016.pdf) please exit the program now" context=acme
time="2017-05-02T17:40:49Z" level=info msg="created an ACME account (registration url: https://acme-v01.api.letsencrypt.org/acme/reg/13769191)" context=acme
time="2017-05-02T17:40:49Z" level=info msg="creating new secret" context=secret name=kube-lego-account namespace=kube-lego
time="2017-05-02T17:41:51Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu
time="2017-05-02T17:41:51Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-05-02T17:41:51Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-05-02T17:41:51Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-05-02T17:41:51Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver
time="2017-05-02T17:41:51Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:41:51Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:43:09Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu
time="2017-05-02T17:43:09Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-05-02T17:43:09Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-05-02T17:43:09Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-05-02T17:43:09Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver
time="2017-05-02T17:43:09Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:43:09Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:44:38Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu
time="2017-05-02T17:44:38Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-05-02T17:44:38Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-05-02T17:44:38Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-05-02T17:44:38Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver
time="2017-05-02T17:44:38Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:44:38Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:45:43Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu
time="2017-05-02T17:45:43Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-05-02T17:45:43Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-05-02T17:45:43Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-05-02T17:45:43Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver
time="2017-05-02T17:45:43Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:45:43Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:46:54Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu
time="2017-05-02T17:46:54Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-05-02T17:46:54Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-05-02T17:46:54Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-05-02T17:46:54Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver
time="2017-05-02T17:46:54Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver
time="2017-05-02T17:46:54Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver
Finally, here is a screenshot of the load balancer right after I tested it with curl (kube-lego still not deployed):
https://drive.google.com/file/d/0B18agqOTmBF5ckRCUmFCcC1kQ0E/view?usp=sharing
@pijusn I'm trying to understand why you needed to create a self-signed certificate yourself. kube-lego will obtain a certificate and save it on the secret on the ingress (even though it doesn't exist yet).
In your screenshot I'm seeing that the .well-known/* URL map is not established. It looks like something is going wrong.
A self-signed certificate is needed to establish TLS in the first place, isn't it? According to #164 it will be ignored by the certificate issuer (meaning it doesn't matter whether you use a valid or an invalid one), but it's still needed. And what I am implying is that kube-lego should also support it, because keeping port 80 open (even if you instantly drop the connection) is very prone to security-related bugs.
About the .well-known/* rule: the screenshot was taken before kube-lego was deployed. I should have taken another screenshot afterwards, but such a rule does exist: echo.pijusn.eu /.well-known/acme-challenge/* k8s-be-30842--586fb62a8db47e63. It also works if port 80 is opened (which does not affect routing rules).
All this fuss is about:
- Getting a valid certificate for the first time without exposing port 80.
- Getting the certificate renewed without exposing port 80 (I haven't tested this, but based on what I saw in the source code, it shouldn't work either).
The reason it doesn't work (as far as I understand) is that kube-lego only makes an http request (port 80), and that port is closed. In GCE terms "closed" means HTTP error 404:
➜ ~ curl -i http://echo.pijusn.eu
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer
Content-Length: 1561
Date: Wed, 03 May 2017 04:05:22 GMT
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 404 (Not Found)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>404.</b> <ins>That’s an error.</ins>
<p>The requested URL <code>/</code> was not found on this server. <ins>That’s all we know.</ins>
What's more, I think there would be yet another issue waiting: the reachability test not ignoring invalid certificates. But that's part of the test-implement-repeat development cycle 😉
Looking at the source code (the one I linked), I don't see how it could fall back to https, because the protocol is simply hard-coded and there is no other "reachability test" around. And if kube-lego doesn't pass its own reachability test, it doesn't go any further and doesn't connect to Let's Encrypt.
Do you understand what issue I am implying? You said that it should work. Could you link part of source code (test or main) which is responsible for falling back / switching to https for the reachability test? Maybe then I can figure out why it doesn't behave as expected in my case.
Self-signed certificate is needed to establish TLS in the first place, isn't it?
I was able to get it to work without this. Just delete the ingress and deploy with the allow-http annotation. It doesn't require additional configuration and things work out just fine.
Are you talking about creating the ingress without the annotation and then (after the certificate is created) deleting the ingress and deploying it with the annotation? If so, will it be able to renew the certificate, since it will still be getting 404 during the reachability test if port 80 is not open?
@pijusn
Are you talking about creating ingress without the annotation and then (after certificate is created)
nope just delete the ingress and when you create it make sure you deploy with the annotation.
If so, will it be able to re-new the certificate since it will still be getting 404 during the reachability test if port 80 is not open?
if allow-http:false, it knows that it should use port 443 instead of 80. just give it a try.
What is wrong with this ingress? Especially considering that it was deployed on a fresh cluster, in a fresh project? You saw the script. It's a full list of actions which is just following example in the repo.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echoserver
  namespace: echoserver
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - hosts:
    - echo.pijusn.eu
    secretName: echoserver-tls
  rules:
  - host: echo.pijusn.eu
    http:
      paths:
      - backend:
          serviceName: echoserver
          servicePort: 80
Or with this one (the actual ingress I originally used for another cluster; it created a GCE load balancer with port 443 only, and it is functioning)?
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: default
  name: public-traffic
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: "public-entry"
spec:
  tls:
  - hosts:
    - bb.pijusn.eu
    secretName: tls-bb-certificate
  rules:
  - host: bb.pijusn.eu
    http:
      paths:
      - backend:
          serviceName: public-gateway
          servicePort: 8000
      - path: /echo
        backend:
          serviceName: echo
          servicePort: 8010
Both of these ingresses worked on the GCE load balancer, but kube-lego was still testing reachability through HTTP and not HTTPS!
With the latter one (used on the actual cluster), I actually tested lots of variations, including deleting and re-creating it many times.
Did you look at the source code I linked in the first comment? Could you comment on why it contains hard-coded http and how it would be making HTTPS requests when needed?
No, I haven't looked at the source code; I am not familiar with it. I am sharing my experience of how I got it to work. When I asked in #164 and got my answer, I deleted my ingress and redeployed with the allow-http:false annotation, and everything worked smoothly for me. (This was for a domain name that had no certs issued before.)
Your manifests look fine to me. Maybe kube-lego is caching a resolved DNS record about your domain name (#162). Try deleting the kube-lego pod so that it gets rescheduled onto another node perhaps.
I see. I thought you were familiar with the source code. Sorry for the frustration. This is on me. I will look into the source code more later to figure out how to test and fix it.
I think what happened in your case:
- You deployed with HTTP enabled, got your certificate
- You deleted the ingress (certificate stays in the secrets) (kube-lego is just sleeping)
- Deployed with a new annotation, previous certificate was re-used (kube-lego is just sleeping).
Based on what I found in the source code, kube-lego does not pass its internal test if HTTP is disabled, meaning it will not refresh the certificate when it's about to expire. You should get a warning message from Let's Encrypt if that's the case, though. Keep everyone posted if that happens, will you? :wink:
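Rather than waiting for the warning e-mail, the remaining lifetime of the certificate can be tracked directly. The snippet below is a minimal Python sketch, assuming the expiry date is available as a string in the format printed by `openssl x509 -noout -enddate` (without the `notAfter=` prefix); `days_until_expiry` is a hypothetical helper name.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: float) -> int:
    """Return whole days between now (epoch seconds) and the certificate's notAfter date.

    not_after uses the certificate time format, e.g. "Jan  5 09:34:43 2018 GMT".
    A negative result means the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return int((expires_at - now) // SECONDS_PER_DAY)
```

Calling it with `time.time()` as `now` from a scheduled job would show whether renewal is actually happening while port 80 is closed.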
@pijusn Perhaps you're right after all. I just tried this again and it didn’t work. I can’t even get Ingress to get a public IP with that annotation. The moment I drop allow-http annotation, I get an IP. Weird.
So in about 70 days we'll know what happens? :smile:
Seems like you need HTTP enabled at all times just in case. Perhaps the downstream app needs to check X-Forwarded-Proto in the headers and handle things (as kube-lego will take the acme challenge requests).
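The header check suggested above could be sketched as follows in Python. This assumes the load balancer sets `X-Forwarded-Proto` (as the echoserver output earlier in the thread shows) and that the application sees header names in exactly that spelling; `redirect_target` is a hypothetical helper, and the challenge-path guard is only a precaution, since those requests are expected to be routed to kube-lego.

```python
from typing import Mapping, Optional

# Standard HTTP-01 challenge prefix, also visible in the URL map rule quoted earlier.
ACME_PREFIX = "/.well-known/acme-challenge/"


def redirect_target(host: str, path: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the HTTPS URL to redirect to, or None if the request can be served as-is."""
    if path.startswith(ACME_PREFIX):
        # Never redirect challenge requests; they must stay reachable over plain HTTP.
        return None
    if headers.get("X-Forwarded-Proto", "").lower() == "http":
        return f"https://{host}{path}"
    return None
```

With this in place port 80 stays open for the challenge, while every other plain-HTTP request is pushed to HTTPS by the application.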
Following on from #218, I think the big issue here is with kube-lego's reachability test, and not with the underlying ACME implementation.
I think the action item from this is to make kube-lego aware of this annotation (and whatever other similar annotations there are for different ingress controllers), and switch the http scheme (https://github.com/jetstack/kube-lego/blob/master/pkg/acme/cert_request.go#L48) to https in this case, as well as allowing invalid certificates on the request.
The way we go about implementing this is non-trivial however, as it means littering the kube-lego codebase with yet another controller-specific hack. This leads us to really needing a policy on which hacks/annotations we will and won't support in kube-lego. There are already a large number of differing controller implementations, each with their own slightly different flavour of ingress specification. Until this situation is improved, I fear kube-lego will continue to bloat with difficult to maintain feature support.
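To make the action item concrete, here is a minimal Python sketch of choosing the scheme from the ingress annotations. It is not kube-lego code (which is Go); `reachability_scheme`, `reachability_url` and the `SELFTEST_PATH` value are hypothetical and only stand in for whatever the real self-test uses.

```python
from typing import Mapping

ALLOW_HTTP_ANNOTATION = "kubernetes.io/ingress.allow-http"
# Hypothetical self-test path, used here purely for illustration.
SELFTEST_PATH = "/.well-known/acme-challenge/_selftest"


def reachability_scheme(annotations: Mapping[str, str]) -> str:
    """Pick https when the ingress explicitly disables HTTP, otherwise keep http."""
    value = annotations.get(ALLOW_HTTP_ANNOTATION, "true")
    return "https" if value.strip().lower() == "false" else "http"


def reachability_url(domain: str, annotations: Mapping[str, str]) -> str:
    """Build the URL the reachability test would request for domain."""
    return f"{reachability_scheme(annotations)}://{domain}{SELFTEST_PATH}"
```

A controller-specific annotation is read in exactly one place here, which is the kind of isolated hack the comment above worries about multiplying.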
@munnerz is there anything that can be done, even as a workaround, to bypass this situation?
kube-lego could allow for a config option to choose the desired reachability test to try.
lego.test: http
lego.test: https
lego.test: dns
...
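A proposed option like this would need validation on startup. The following Python sketch shows one way to do that; `lego.test` is the option name proposed above and does not exist in kube-lego, and `ReachabilityTest` and `parse_test_option` are hypothetical names.

```python
from enum import Enum


class ReachabilityTest(Enum):
    HTTP = "http"
    HTTPS = "https"
    DNS = "dns"


def parse_test_option(
    raw: str, default: ReachabilityTest = ReachabilityTest.HTTP
) -> ReachabilityTest:
    """Parse the proposed lego.test value, falling back to default when it is empty."""
    value = raw.strip().lower()
    if not value:
        return default
    try:
        return ReachabilityTest(value)
    except ValueError:
        allowed = ", ".join(t.value for t in ReachabilityTest)
        raise ValueError(
            f"unsupported lego.test value {raw!r}; expected one of: {allowed}"
        ) from None
```

Defaulting to http keeps the current behaviour for existing deployments.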
I think it is important to support setups that need 'use-proxy-protocol' at IC level.
FYI: I hacked it by port forwarding port 80 to 8080 on the kube lego pod. It could be scripted into a job, but it sure as hell is hacky and not nice to maintain with regards to rbac and policies.
Why can’t it try 443 without testing for cert validity first? And then fallback to 80? And maybe even try dns first? No need to configure anything imo, as the preferred order would be dns,https,http anyway. No?
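The ordered fallback suggested above can be sketched in Python as a loop over named probes. The probes themselves are left abstract: `first_passing_test` and the `Probe` type are hypothetical, and real dns, https and http checks would have to be supplied by the caller.

```python
from typing import Callable, Iterable, Optional, Tuple

# A probe takes a domain and reports whether the challenge endpoint is reachable.
Probe = Callable[[str], bool]


def first_passing_test(
    domain: str, probes: Iterable[Tuple[str, Probe]]
) -> Optional[str]:
    """Try each (name, probe) pair in order and return the name of the first that passes."""
    for name, probe in probes:
        try:
            if probe(domain):
                return name
        except Exception:
            # A probe that errors out (closed port, timeout) counts as a failure.
            continue
    return None
```

Passing the probes in the order dns, https, http gives the preference described above without any configuration.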
We also set kubernetes.io/ingress.allow-http: "false" and it caused a cert expiration error. We want to allow only HTTPS for our domain, so it would be better if we could control the ACME test protocol.
Faced with the same issue after adding kubernetes.io/ingress.allow-http: "false"