`Error while watching kubernetes events: json: cannot unmarshal string into Go value of type v1beta1.IngressStatus`

Open nelhage opened this issue 7 years ago • 10 comments

I've got a kube-cert-manager instance running in GKE, using the configuration available at https://github.com/livegrep/livegrep.com, which seems to be stuck. I have two certs approaching expiry, kcm isn't updating them, and the logs are full of spew like this:

2017/05/08 21:02:33 Error while watching kubernetes events: json: cannot unmarshal string into Go value of type v1beta1.IngressStatus

I tried bumping to 0.4.0 (built from source) with no success. Any suggestions or debugging advice?

nelhage avatar May 09 '17 15:05 nelhage

Which version of Kubernetes are you running?

luna-duclos avatar May 09 '17 15:05 luna-duclos

Do you have an ingress configured manually through some other tool somewhere? I can't spot an ingress in your config, but that seems to be what kcm is choking on.

luna-duclos avatar May 09 '17 15:05 luna-duclos

Also, make sure your version of Kubernetes and the kubectl proxy container for kcm match.

luna-duclos avatar May 09 '17 15:05 luna-duclos

The ingresses are configured here:

https://github.com/livegrep/livegrep.com/blob/master/kubernetes/frontend.yaml#L82-L111

I'm on 1.6.2. Do the kubectl-proxy tags correspond to k8s versions? Is there a prebuilt place I can grab a 1.6.2 build?

nelhage avatar May 09 '17 16:05 nelhage

See the Dockerfile for the proxy here; it's really tiny and you should be able to build one for 1.6.2 easily: https://github.com/PalmStoneGames/kube-cert-manager/blob/v0.3.1/kubectl-proxy/Dockerfile

luna-duclos avatar May 09 '17 16:05 luna-duclos

Cool, I'll give that a try probably this evening when I'm at the right computer.

It'd be nice if there were a way for this setup to flag version mismatches and fail gracefully.

nelhage avatar May 09 '17 16:05 nelhage

My first recommendation is to check whether you have any extra ingresses and, if so, delete them. Otherwise, recreate the ones you have.

luna-duclos avatar May 09 '17 18:05 luna-duclos

I added some logging to the kube-cert-manager pod, and also upgraded to a newer kubectl-proxy. The upgrade didn't resolve it, but the logging shed some light on what's happening:

with this diff:

diff --git a/k8s.go b/k8s.go
index cc1e5ad..f223c41 100644
--- a/k8s.go
+++ b/k8s.go
@@ -463,6 +463,7 @@ func monitorIngressEvents(endpoint string) (<-chan IngressEvent, <-chan error) {
                                event.Type = ev.Type
                                err := json.Unmarshal([]byte(ev.Object), &event.Object)
                                if err != nil {
+                                       log.Printf("unmarshal failed ev=%q", ev.Object)
                                        errc <- err
                                        continue
                                }

we get

2017/05/10 04:04:31 unmarshal failed ev="{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"too old resource version: 6669294 (9453598)\",\"reason\":\"Gone\",\"code\":410}"
2017/05/10 04:04:31 Error while watching kubernetes events: json: cannot unmarshal string into Go struct field Ingress.status of type v1beta1.IngressStatus

Upon further investigation, though, the certs are still >1w from expiry, which is k-c-m's cutoff, so it may in fact be recovering fine from those errors, and will renew in a few days.
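The logged payload is a `Status` object (kind `Status`, code 410, reason `Gone`) rather than an Ingress, and its `status` field is a plain string, which would explain why unmarshalling it into an Ingress fails. A minimal sketch of how a watch loop might check for that before decoding follows. The helper `classifyWatchObject` and the struct `watchObjectMeta` are hypothetical names made up for illustration; they are not part of kube-cert-manager or the Kubernetes client libraries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// watchObjectMeta holds only the fields needed to classify a watch payload.
// Hypothetical type, for illustration only.
type watchObjectMeta struct {
	Kind    string `json:"kind"`
	Reason  string `json:"reason"`
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// classifyWatchObject (hypothetical helper) reports whether raw is a Status
// object and, if so, whether it signals an expired resource version (410 Gone).
func classifyWatchObject(raw []byte) (isStatus, isGone bool, msg string, err error) {
	var meta watchObjectMeta
	if err = json.Unmarshal(raw, &meta); err != nil {
		return false, false, "", err
	}
	if meta.Kind != "Status" {
		// Not a Status: the caller can go on to decode it as an Ingress.
		return false, false, "", nil
	}
	return true, meta.Code == 410 || meta.Reason == "Gone", meta.Message, nil
}

func main() {
	// Payload copied from the log line above.
	raw := []byte(`{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"too old resource version: 6669294 (9453598)","reason":"Gone","code":410}`)
	isStatus, isGone, msg, err := classifyWatchObject(raw)
	fmt.Println(isStatus, isGone, msg, err)
}
```

With a check like this, the loop could restart the watch quietly on a `Gone` status instead of reporting an unmarshal error.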

Assuming that's the case, I think there may still be two arguable bugs, and I'm happy to open separate tickets if desired:

  1. That periodic error spew in the logs is confusing
  2. It appears that letsencrypt emails about soon-to-be-expired certs at something like 9 and 19 days-to-expiry; k-c-m should probably renew earlier than 19 days before expiry by default, so that the operator never receives the reminder email.

nelhage avatar May 10 '17 04:05 nelhage

The certs in question just passed 1w to expiry, and were renewed correctly. So I think everything is working normally, although the issues outlined in https://github.com/PalmStoneGames/kube-cert-manager/issues/73#issuecomment-300369154 are still valid in my judgment.

nelhage avatar May 11 '17 02:05 nelhage

I have noticed these error messages too, in a 1.5 cluster.

whereisaaron avatar Dec 23 '17 16:12 whereisaaron