kube-lego

Incorrect response handling

Open sysradium opened this issue 8 years ago • 10 comments

We use kube2sky for DNS inside our K8S cluster. It was down for some reason, but I hadn't noticed. When I created an ingress that required a certificate, kube-lego did not handle the DNS problem well:

time="2016-12-09T17:16:54Z" level=info msg="requesting certificate for domain.foo.com" context="ingress_tls" name=https namespace="ece8168e-f5c7-4f41-9469-702f1eb2e4ec"
time="2016-12-09T17:16:54Z" level=info msg="creating new secret" context=secret name=kube-lego-account namespace="a12ddb33-13d4-43e4-9cfe-bf7e5b90935d"
time="2016-12-09T17:17:39Z" level=info msg="creating new secret" context=secret name=https namespace="ece8168e-f5c7-4f41-9469-702f1eb2e4ec"
time="2016-12-09T17:17:39Z" level=error msg="Error while process certificate requests: Secret \"https\" is invalid: [data[tls.crt]: Required value, data[tls.key]: Required value]" context=kubelego

It looks like, if DNS is down, kube-lego fails silently and then tries to create an empty secret.

sysradium, Dec 09 '16 17:12

I am experiencing similar behaviour when inspecting the logs of the kube-lego container, although I am not using kube2sky, as far as I know.

I notice that the kube-lego-account secret is also not automatically created by kube-lego.

My current setup is:

[nrocco@metal ~]$ cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)

[nrocco@metal ~]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:52:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

I am running kubernetes on bare metal infrastructure.

In my previous setup (using kubernetes 1.4.6) kube-lego was working perfectly.

From the logs I am unable to determine if it is related to kubernetes 1.5.1 or acme, or both.

nrocco, Dec 18 '16 08:12

After further troubleshooting I found that the kube-lego container was not able to connect to the outside world due to missing iptables rules.

Docker was started using the flags

--iptables=false --ip-masq=false

as suggested by kubernetes.

After destroying my cluster and recreating it I can confirm that kube-lego works like a charm.

Sorry for the noise.

nrocco, Dec 18 '16 12:12

@nrocco thanks for letting me know. I was planning to upgrade my cluster to 1.5 this evening. If there were a major problem with 1.5, it would be a P0.

@sysradium: if you could raise the log level to debug and upload the ingress yamls somewhere, it would help to dive a bit deeper into this.

simonswine, Dec 18 '16 13:12

@simonswine I'm seeing similar issues (note, though, that I'm new to the whole kubernetes thing so might have screwed up all sorts of things - that said, all the basic bits and pieces, except TLS, seem to work).

The logs (with debug) of the kube-lego pod say:

time="2016-12-22T15:34:13Z" level=info msg="kube-lego 0.1.3-d425b293 starting" context=kubelego
time="2016-12-22T15:34:13Z" level=info msg="connected to kubernetes api v1.5.1" context=kubelego
time="2016-12-22T15:34:13Z" level=debug msg="start watching ingress objects" context=kubelego
time="2016-12-22T15:34:13Z" level=info msg="server listening on http://:8080/" context=acme
time="2016-12-22T15:34:13Z" level=debug msg="CREATE ingress/default/knitpick-blog" context=kubelego
time="2016-12-22T15:34:13Z" level=debug msg="worker: begin processing true" context=kubelego
time="2016-12-22T15:34:13Z" level=info msg="ignoring as has no annotiation 'kubernetes.io/tls-acme'" context=ingress name=echomap namespace=default
time="2016-12-22T15:34:13Z" level=info msg="ignoring as has no annotiation 'kubernetes.io/tls-acme'" context=ingress name=kube-lego-nginx namespace=default
time="2016-12-22T15:34:13Z" level=debug msg=reset context=provider provider=gce
time="2016-12-22T15:34:13Z" level=debug msg=finialize context=provider provider=gce
time="2016-12-22T15:34:13Z" level=debug msg=reset context=provider provider=nginx
time="2016-12-22T15:34:13Z" level=debug msg=finialize context=provider provider=nginx
time="2016-12-22T15:34:13Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2016-12-22T15:34:13Z" level=info msg="creating new secret" context=secret name=knitpick-tls namespace=default
time="2016-12-22T15:34:13Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=knitpick-blog namespace=default
time="2016-12-22T15:34:13Z" level=info msg="requesting certificate for knitpick.me,www.knitpick.me" context="ingress_tls" name=knitpick-blog namespace=default
time="2016-12-22T15:34:13Z" level=info msg="creating new secret" context=secret name=kube-lego-account namespace=default
time="2016-12-22T15:34:33Z" level=info msg="creating new secret" context=secret name=knitpick-tls namespace=default
time="2016-12-22T15:34:33Z" level=error msg="Error while process certificate requests: Secret \"knitpick-tls\" is invalid: [data[tls.crt]: Required value, data[tls.key]: Required value]" context=kubelego

The ingress yaml is:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: knitpick-blog
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: nginx

spec:
  tls:
    - secretName: knitpick-tls
      hosts:
        - knitpick.me
        - www.knitpick.me

  rules:
    - host: knitpick.me
      http:
        paths:
          - path: /
            backend:
              serviceName: knitpick-blog
              servicePort: 80

    - host: www.knitpick.me
      http:
        paths:
          - path: /
            backend:
              serviceName: knitpick-blog
              servicePort: 80

The only thing I've changed from the example lego deployment is that I've changed it to use the default namespace (instead of creating its own), changed the email, and changed the URL to use the staging API (that said, I also tried the production API but it didn't change a thing).

Also note that verification here would fail, as the actual domain names still point elsewhere - but: it's not even getting there - I don't see any challenge requests on the actual webserver for those domains.

Note, also, that the secret doesn't exist initially, as I'm trying to set it all up from scratch. From the description of kube-lego in the README, it sounds like it should be able to cope with this and create the secret from scratch.

Similarly, I don't see any secret being created, not even the 'kube-lego-account' secret it claims to be creating - kubectl get secrets doesn't show either of them.

Any ideas?

bwalex, Dec 22 '16 15:12

Ok, that was an embarrassing pilot error - networking was somewhat messed up.

However, it really doesn't help that, when kube-lego can't connect to the outside world, it instead gives an unrelated message about not being able to create an (empty?) secret.

bwalex, Dec 22 '16 21:12

Yeah, I guess it shouldn't even try to :)

@simonswine sorry, I don't have the logs at hand, but I think @bwalex explained the problem well enough for you to find the bad place :) I guess it isn't that far away.

I wanted to fix that myself if the problem came back, but it never did 😁

sysradium, Dec 23 '16 00:12

I have the same error on GCE, deployed using the Helm chart https://github.com/kubernetes/charts/blob/master/stable/nginx-lego

time="2016-12-09T17:17:39Z" level=error msg="Error while process certificate requests: Secret \"https\" is invalid: [data[tls.crt]: Required value, data[tls.key]: Required value]" context=kubelego

And I actually cannot see how to test the container's connectivity, because kubectl exec doesn't work with the lego pod.

Any advice?

wclr, Jan 25 '17 21:01
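
One way to test connectivity without `kubectl exec` is to run a small probe from another pod in the same cluster. The Go sketch below is an assumption-laden illustration, not something taken from kube-lego: `checkEndpoint` is a hypothetical helper, and the host name in `main` is the Let's Encrypt production ACME host, which may not be the endpoint your deployment is configured to use.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// checkEndpoint reports which step fails when reaching host:port:
// input validation, DNS resolution, or the TCP connection.
func checkEndpoint(host, port string, timeout time.Duration) error {
	if host == "" {
		return errors.New("input: empty host")
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// Step 1: resolve the name with the resolver configured for this pod.
	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	if err != nil {
		return fmt.Errorf("dns: resolving %s: %w", host, err)
	}
	if len(addrs) == 0 {
		return fmt.Errorf("dns: %s resolved to no addresses", host)
	}

	// Step 2: open a TCP connection to the first resolved address.
	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", net.JoinHostPort(addrs[0], port))
	if err != nil {
		return fmt.Errorf("connect: %s (%s) port %s: %w", host, addrs[0], port, err)
	}
	return conn.Close()
}

func main() {
	// Assumed target; replace it with the host of the ACME URL you configured.
	if err := checkEndpoint("acme-v02.api.letsencrypt.org", "443", 5*time.Second); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("DNS and TCP connectivity look fine")
}
```

The `dns:` and `connect:` prefixes separate the two failure modes reported earlier in this thread: a broken cluster DNS and missing outbound network rules.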

I'm getting exactly the same error as @whitecolor

DavidSporer, Jan 26 '17 06:01

For people who find this in the future: this is often because your KUBE_URL or KUBE_EMAIL variables in the helm chart are wrong.

brendandburns, May 05 '17 19:05

@brendandburns Do you have any suggestions for how to diagnose that? How would you find that it was related to KUBE_URL and KUBE_EMAIL?

hyperbolic2346, May 06 '17 20:05
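
A partial answer to the question above, as a sketch only: the two values can at least be checked for obvious malformation before they are deployed. `validateConfig` is a hypothetical helper, and the variable names in `main` are copied from the comment above; they may differ from what your chart actually sets.

```go
package main

import (
	"fmt"
	"net/mail"
	"net/url"
	"os"
)

// validateConfig returns a human-readable list of problems found in the
// ACME directory URL and the account email. An empty result only means
// that both values are well formed.
func validateConfig(rawURL, email string) []string {
	var problems []string

	u, err := url.Parse(rawURL)
	switch {
	case err != nil:
		problems = append(problems, fmt.Sprintf("URL %q does not parse: %v", rawURL, err))
	case u.Scheme != "https" || u.Host == "":
		problems = append(problems, fmt.Sprintf("URL %q is not an absolute https URL", rawURL))
	}

	if _, err := mail.ParseAddress(email); err != nil {
		problems = append(problems, fmt.Sprintf("email %q is not a valid address: %v", email, err))
	}
	return problems
}

func main() {
	// Variable names follow the comment above; adjust them to whatever
	// your deployment actually sets.
	for _, p := range validateConfig(os.Getenv("KUBE_URL"), os.Getenv("KUBE_EMAIL")) {
		fmt.Println("config problem:", p)
	}
}
```

This catches empty or malformed values only. It cannot tell whether the URL points to a real ACME directory or whether the email is accepted by the ACME server.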