
Failed to obtain certificate when using GCE

Open ranhsd opened this issue 8 years ago • 16 comments

Hi, I am running into an issue when trying to obtain a certificate. It occurs only when I use the GCE load balancer; when I try the same setup with nginx, everything works well.

I am doing the following:

  1. Deploy kube-lego to its own namespace
  2. Deploy all other resources (services, deployments and tls)

My Ingress with TLS looks like this:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-server
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
  - hosts:
    - devel.******.mobi
    secretName: my-server-tls  
  rules:
  - host: devel.******.mobi
    http:
      paths:
      - path: /
        backend:
          serviceName: my-server
          servicePort: 80 

(The ****** hides the real domain.)

  3. Once the load balancer has an IP address, I go to my DNS manager and point devel.******.mobi to it.

I see that the kube-lego service tries to obtain the certificate, and the error I get is:

2017-01-12T15:14:24.145268161Z time="2017-01-12T15:14:24Z" level=info msg="requesting certificate for devel.******.mobi" context="ingress_tls" name=my-server namespace=dev 
2017-01-12T15:15:43.558005415Z time="2017-01-12T15:15:43Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: Get http://devel.******.mobi/.well-known/acme-challenge/_selftest: dial tcp: lookup devel.******.mobi on 10.39.240.10:53: no such host" context=acme domain=devel.******.mobi 
2017-01-12T15:15:43.558123998Z time="2017-01-12T15:15:43Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego 
2017-01-12T15:15:43.562858604Z time="2017-01-12T15:15:43Z" level=info msg="ignoring as has no annotiation 'kubernetes.io/tls-acme'" context=ingress name=kube-lego-nginx namespace=kube-lego 

My load balancer also looks fine:

gce-console

So I really don't know what the issue is. Can you please help me figure it out?

Thanks in advance.

ranhsd avatar Jan 12 '17 15:01 ranhsd

I think the kube-lego ingress is attempting to use nginx as the controller. Unless you specify otherwise, LEGO_DEFAULT_INGRESS_CLASS is nginx, and this breaks everything unless you also have an nginx controller available.
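
For illustration, overriding that default might look like the fragment below. It is a minimal sketch, not the project's official manifest: the container name is an assumption, and only the LEGO_DEFAULT_INGRESS_CLASS variable comes from this thread.

```yaml
# Fragment of a kube-lego Deployment pod template (sketch only).
spec:
  template:
    spec:
      containers:
      - name: kube-lego            # assumed container name
        env:
        # Ingress class kube-lego falls back to when an Ingress
        # does not set kubernetes.io/ingress.class itself.
        - name: LEGO_DEFAULT_INGRESS_CLASS
          value: "gce"
```

Setting this alone may not be enough; an Ingress that already carries the kubernetes.io/ingress.class annotation is not affected by the default.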

Draiken avatar Jan 19 '17 15:01 Draiken

Thanks @Draiken for your recommendation, I tried to set LEGO_DEFAULT_INGRESS_CLASS to gce, but it didn't solve the issue. Do you have a more detailed way to fix it? Thanks!

pierreozoux avatar Jan 30 '17 11:01 pierreozoux

@pierreozoux sorry but in my case, since it worked, that was as far as I got.

Something cannot be reached from kube-lego, so I assume that is what you should debug: check whether the route goes to the correct backend, whether the backend reaches kube-lego, and so on.

As a last resort, if I couldn't find the problem, I'd nuke the whole thing and start again from scratch.

Draiken avatar Jan 30 '17 12:01 Draiken

I faced this too. What I see is that the kube-lego-gce service does not have a selector:

➜  ~ kubectl describe service kube-lego-gce
Name:			kube-lego-gce
Namespace:		stage
Labels:			<none>
Selector:		<none>

Though it works via the node port (curl nodeip:nodeport/.well-known/acme-challenge/_selftest), it does not work through the GCE ingress/load balancer (curl mydomain.com/.well-known/acme-challenge/_selftest). If I create another service with a selector for the kube-lego pod, it works through the ingress/load balancer.
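
A workaround of this kind could be sketched as the Service below. The service name and the pod label are hypothetical (check the real labels with kubectl get pods --show-labels), and port 8080 is taken from the Ingress examples elsewhere in this thread.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-lego-gce-manual     # hypothetical name, not created by kube-lego
spec:
  type: NodePort                 # the GCE ingress routes to NodePort services
  selector:
    app: kube-lego               # assumed label on the kube-lego pod
  ports:
  - port: 8080                   # port referenced by the Ingress backend
    targetPort: 8080             # assumed port the kube-lego pod listens on
```

The Ingress path for /.well-known/acme-challenge would then need to reference this service instead of kube-lego-gce.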

vmakhaev avatar Feb 25 '17 16:02 vmakhaev

Is there an open issue for this?

ranhsd avatar Feb 26 '17 07:02 ranhsd

For me it started working after I used this configuration for the Ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: mysite
  namespace: default
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
  - hosts:
    - www.mysite.com
    secretName: mysite-tls
  backend:
    serviceName: ghost
    servicePort: 80
  rules:
  - host: www.mysite.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: ghost
          servicePort: 80
  - host: www.mysite.com
    http:
      paths:
      - path: /.well-known/acme-challenge
        backend:
          serviceName: kube-lego-gce
          servicePort: 8080

camilb avatar Apr 02 '17 22:04 camilb

#132 fixes this; @camilb ⬆️ posted the same solution.

gianrubio avatar Apr 04 '17 14:04 gianrubio

Where is the kube-lego-gce service coming from? It's not defined in the example right?

jamesthompson avatar Apr 19 '17 20:04 jamesthompson

@jamesthompson AFAIK it's dynamically created by the kube-lego pod, depending on what you specified as the ingress type.

Draiken avatar Apr 20 '17 11:04 Draiken

Still not working for me. Now I get a 502 error.

ranhsd avatar Apr 22 '17 19:04 ranhsd

@ranhsd The 502 is coming from the GCE ingress. Could you update your GCE ingress to the latest release and kube-lego to the canary version, then try again?

P.S. Just to let you know, this is related to the GCE ingress and not to kube-lego.

gianrubio avatar May 02 '17 07:05 gianrubio

@gianrubio How do I update my GCE ingress to the latest release?

jamesthompson avatar May 02 '17 14:05 jamesthompson

Hi, I've finally managed to solve the issue. I had multiple problems to solve:

  1. In my ingress file I added both the /* and /.well-known/acme-challenge paths, so in the end my ingress file looks like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ******-server
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce" 
spec:
  tls:
  - hosts:
    - beta.******.****
    secretName: ******-server-key
  rules:
  - host: beta.******.****
    http:
      paths:
      - path: /*
        backend:
          serviceName: ******-server
          servicePort: 80 
      - path: /.well-known/acme-challenge
        backend:
          serviceName: kube-lego-gce
          servicePort: 8080   
  2. When you create a GCE load balancer, a health check is created automatically, and it looks like the health check must return HTTP 200. In my case, accessing the root URL of my API (beta.**.) without an API key fails with 403 (Forbidden). What I did was point the health check at another endpoint created especially for this purpose. This change is made in the deployment.yml file, which in my case looks like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ******-server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: ******-server 
        app: ******-server
        tier: backend
        role: application-server
    spec:
      volumes: 
      - name: ****-keys
        secret:
          secretName: ****-keys
      - name: ******-server-config
        configMap:
          name: ******-server-config        
      containers:
      - image: gcr.io/******-****/**-server:v0.9
        imagePullPolicy: Always 
        name: ******-server 
        command: ["npm", "start", "--"]
        args: ["/etc/config/******-server-config.json"]             
        volumeMounts: 
          - name: ******-server-config
            mountPath: "/etc/config"
            readOnly: true     
          - name: push-keys
            mountPath: "/etc/keys/*****/***"    
        ports:
        - containerPort: 1337       
        readinessProbe:
          httpGet:
            path: /health
            port: 1337
          initialDelaySeconds: 5
          timeoutSeconds: 1  


Note the readinessProbe section: there I make sure the health check is made against the /health endpoint (the default endpoint is, of course, the root endpoint).

Now everything works very well for me. I think we should add this to the docs so other users know how it should be done. What do you think?

Thanks!

ranhsd avatar May 04 '17 03:05 ranhsd

I cannot reproduce this issue, except that I got some messages from the apiserver: GCE has a limit of 5 backend services (the BACKEND_SERVICES quota). I'm not sure if it's related, but I guess it's good to share here for reference.

$ kubectl get events -w  --all-namespaces

echoserver   2017-05-10 23:41:48 +0200 CEST   2017-05-10 23:35:18 +0200 CEST
   19        echoserver   Ingress             Warning   GCE :Quota   
{loadbalancer-controller }   googleapi: Error 403: Quota 'BACKEND_SERVICES' exceeded. 
Limit: 5.0, quotaExceeded
echoserver   2017-05-10 23:42:45 +0200 CEST   2017-05-10 23:35:18 +0200 CEST
   20        echoserver   Ingress             Warning   GCE :Quota   
{loadbalancer-controller }   googleapi: Error 403: Quota 'BACKEND_SERVICES' exceeded. 
Limit: 5.0, quotaExceeded
echoserver   2017-05-10 23:44:51 +0200 CEST   2017-05-10 23:35:18 +0200 CEST
   21        echoserver   Ingress             Warning   GCE :Quota   
{loadbalancer-controller }   googleapi: Error 403: Quota 'BACKEND_SERVICES' exceeded. 
Limit: 5.0, quotaExceeded

gianrubio avatar May 10 '17 21:05 gianrubio

I'm having the same problem. In my deployment.yaml I set up a readinessProbe for my app:

readinessProbe:
  httpGet:
    path: /login
    port: 8080

The problem here is that the auto health check kube-lego creates uses /login when it really needs just "/".

ghost avatar Aug 04 '17 18:08 ghost

I've found that the same issue causes the GLBC health check to fail (and therefore no traffic is forwarded to the kube-lego-gce service).

This shows up in the kube-lego DEBUG log as:

error while authorizing: reachability test failed: wrong status code '502'

futuretec avatar Dec 21 '17 10:12 futuretec