Tolerance to API unavailability

zerkms opened this issue on Jun 5, 2020 · 29 comments

At the moment, if the API server becomes unavailable (for any reason, e.g. an outage), it takes about a minute or so before ingress-nginx unregisters all backends with the following message:

W0605 01:56:10.166870       9 controller.go:909] Service "ns/svc" does not have any active Endpoint.

This happens even though the endpoints are still up, running, and perfectly available.

Would it not be better if ingress-nginx kept running with the same set of endpoints while the API server is unavailable?

Is there currently another issue associated with this? Could not find one.

Does it require a particular Kubernetes version? No.

/kind feature

zerkms avatar Jun 05 '20 02:06 zerkms

We are observing a similar pattern of behaviour.

Disruption to the API server causes instability in the nginx pods, regardless of the overall state of the cluster.

Some background: we are rotating some of the secrets used by the Kube API server because the client certificates will expire. To achieve this we need to restart the API servers, but when they restart, the bearer tokens originally issued to the service accounts become invalid.

Our current mitigation strategy is to stand up an API server and use hostAliases to override the kubernetes, kubernetes.default, kubernetes.default.svc.cluster.local, etc. hostnames on the ingress pods. But this is a real chore, since restarting the nginx ingress can cause disruption, and we need to restart the pods twice (once to apply the override hostnames and once to remove them when we are done).

Is the duration of time before unregistration configurable?

AndrewNeudegg avatar Jun 09 '20 10:06 AndrewNeudegg

@zerkms thanks for bringing this up. I think the controller should not consider the list of endpoints to be empty when the API server is down. Can you write an e2e test showing this? Then we can think about how to solve it.

cc @aledbf I think this is critical for availability.

ElvinEfendi avatar Jun 11 '20 02:06 ElvinEfendi

Let's separate the issues. In case of unavailability of the API server, you will see something like this:

I0611 02:56:30.099619       6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.099975       6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100256       6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100260       6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100287       6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100490       6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
E0611 02:56:30.101137       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=8m52s&timeoutSeconds=532&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
W0611 02:56:30.100961       6 reflector.go:404] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: watch of *v1.Pod ended with: very short watch: k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Unexpected watch close - watch lasted less than a second and no items received
W0611 02:56:30.101041       6 reflector.go:404] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: watch of *v1beta1.Ingress ended with: very short watch: k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Unexpected watch close - watch lasted less than a second and no items received
W0611 02:56:30.101399       6 reflector.go:404] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: watch of *v1.Secret ended with: very short watch: k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Unexpected watch close - watch lasted less than a second and no items received
E0611 02:56:30.101450       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=9m24s&timeoutSeconds=564&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102176       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Failed to list *v1.Pod: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/pods?resourceVersion=9069652": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102191       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Failed to list *v1beta1.Ingress: Get "http://0.0.0.0:8001/apis/networking.k8s.io/v1beta1/ingresses?resourceVersion=2862637": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102592       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Failed to list *v1.Secret: Get "http://0.0.0.0:8001/api/v1/secrets?resourceVersion=9070770": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102835       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=8m2s&timeoutSeconds=482&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.102165       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=6m58s&timeoutSeconds=418&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.103086       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m11s&timeoutSeconds=491&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.107797       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=8m24s&timeoutSeconds=504&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.743855       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Failed to list *v1.Pod: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/pods?resourceVersion=9069652": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.947330       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Failed to list *v1.Secret: Get "http://0.0.0.0:8001/api/v1/secrets?resourceVersion=9070770": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.010309       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Failed to list *v1beta1.Ingress: Get "http://0.0.0.0:8001/apis/networking.k8s.io/v1beta1/ingresses?resourceVersion=2862637": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.103217       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=6m54s&timeoutSeconds=414&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.104147       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m58s&timeoutSeconds=538&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.108573       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=6m49s&timeoutSeconds=409&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:33.104284       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=9m41s&timeoutSeconds=581&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:33.105079       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=9m0s&timeoutSeconds=540&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:33.109347       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m55s&timeoutSeconds=355&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:34.104820       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=7m18s&timeoutSeconds=438&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:34.105892       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=5m41s&timeoutSeconds=341&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:34.110270       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=7m29s&timeoutSeconds=449&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.105586       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=8m23s&timeoutSeconds=503&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.106736       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=7m53s&timeoutSeconds=473&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.111164       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m5s&timeoutSeconds=305&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.646766       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Failed to list *v1.Pod: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/pods?resourceVersion=9069652": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.069955       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Failed to list *v1beta1.Ingress: Get "http://0.0.0.0:8001/apis/networking.k8s.io/v1beta1/ingresses?resourceVersion=2862637": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.106781       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=9m10s&timeoutSeconds=550&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.107470       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m12s&timeoutSeconds=492&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.112088       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=9m20s&timeoutSeconds=560&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.431501       6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Failed to list *v1.Secret: Get "http://0.0.0.0:8001/api/v1/secrets?resourceVersion=9070770": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:37.107723       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=7m2s&timeoutSeconds=422&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:37.108695       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=7m59s&timeoutSeconds=479&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:37.112833       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m14s&timeoutSeconds=314&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.108760       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=6m19s&timeoutSeconds=379&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.109533       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m10s&timeoutSeconds=490&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.113538       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=9m13s&timeoutSeconds=553&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.294058       6 leaderelection.go:320] error retrieving resource lock invalid-namespace/ingress-controller-leader-nginx: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/configmaps/ingress-controller-leader-nginx": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:39.109905       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=8m47s&timeoutSeconds=527&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:39.110554       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m25s&timeoutSeconds=505&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:39.114372       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=8m7s&timeoutSeconds=487&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:40.110786       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=5m22s&timeoutSeconds=322&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:40.111850       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m44s&timeoutSeconds=524&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:40.115094       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=6m9s&timeoutSeconds=369&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:41.111831       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=5m14s&timeoutSeconds=314&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:41.112632       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=6m26s&timeoutSeconds=386&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:41.115883       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=9m51s&timeoutSeconds=591&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:42.112808       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=9m9s&timeoutSeconds=549&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:42.113817       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m59s&timeoutSeconds=539&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:42.116610       6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m11s&timeoutSeconds=311&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused

When the API server is available again, you will no longer see such output.

aledbf avatar Jun 11 '20 03:06 aledbf

Would it not be better if ingress-nginx kept running with the same set of endpoints while the API server is unavailable?

When the API server is not available, there are no changes in the nginx model. This means that if some of the pods running your application die, the ingress controller will not be aware of the event and will keep sending traffic to those pods.

aledbf avatar Jun 11 '20 03:06 aledbf

Some background: we are rotating some of the secrets used by the Kube API server because the client certificates will expire. To achieve this we need to restart the API servers, but when they restart, the bearer tokens originally issued to the service accounts become invalid.

There is no support in client-go to "reload" service account tokens. You need to restart the pod. I'm not sure there is any support for this scenario.

aledbf avatar Jun 11 '20 03:06 aledbf
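
For context, here is a minimal sketch (assuming the standard in-cluster defaults, not anything specific to ingress-nginx) of how a client-go client is typically constructed inside a pod. The service account token is read from the mounted token file when the config is built, which is why, at least with the client-go versions in use at the time, invalidating that token generally means restarting the pod:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// InClusterConfig reads the service account token mounted at
	// /var/run/secrets/kubernetes.io/serviceaccount/token. Older client-go
	// versions read it once at client construction, so if the API server
	// stops accepting that token, requests fail until the pod is restarted.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Any authenticated call is where the failures surface.
	eps, err := client.CoreV1().Endpoints("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d Endpoints objects\n", len(eps.Items))
}
```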

When the API server is not available, there are no changes in the nginx model. This means that if some of the pods running your application die, the ingress controller will not be aware of the event and will keep sending traffic to those pods.

I'm not sure I understand.

When the API server is not available, ingress-nginx stops sending traffic anywhere because it believes the endpoints have died (while they are actually still available).

zerkms avatar Jun 11 '20 03:06 zerkms

When the API server is not available, ingress-nginx stops sending traffic anywhere because it believes the endpoints have died (while they are actually still available).

Please post the logs of the ingress controller pod when this happens.

aledbf avatar Jun 12 '20 02:06 aledbf

It outputs these lines (one per service, only one shown) when it happens:

W0612 02:25:19.114102       6 controller.go:909] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.

Yet if I request this service from the CLI:

# curl -v 10.53.185.114:7000
* Rebuilt URL to: 10.53.185.114:7000/
*   Trying 10.53.185.114...
* TCP_NODELAY set
* Connected to 10.53.185.114 (10.53.185.114) port 7000 (#0)
> GET / HTTP/1.1
> Host: 10.53.185.114:7000
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 1162
< Content-Language: en-US
< Accept-Ranges: bytes
< Vary: Accept-Language, Accept-Encoding
< Server: CherryPy/3.2.2
< Last-Modified: Mon, 02 Mar 2020 17:55:01 GMT
< Date: Fri, 12 Jun 2020 02:28:47 GMT
< Content-Type: text/html;charset=utf-8
< 
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Ceph</title>

It responds successfully.

zerkms avatar Jun 12 '20 02:06 zerkms
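
To make the symptom easier to reason about, here is a deliberately simplified sketch of the pattern behind that warning (illustrative Go, not the actual ingress-nginx controller code): upstreams are resolved from a local cache, and a lookup that yields nothing produces an empty upstream list, which NGINX then answers with 503 regardless of whether the pods themselves are still reachable:

```go
package main

import "fmt"

// Endpoint is a trimmed-down stand-in for the controller's upstream entry.
type Endpoint struct {
	Address string
	Port    int
}

// endpointLister abstracts "read Endpoints from the local cache"; in the real
// controller this is backed by a client-go informer/lister.
type endpointLister interface {
	Endpoints(service string) ([]Endpoint, error)
}

// upstreamsFor mirrors the shape of the logic that logs the warning above:
// if the cache yields nothing, the service ends up with zero upstreams and
// requests for it are answered with 503 by the generated NGINX config.
func upstreamsFor(l endpointLister, service string) []Endpoint {
	eps, err := l.Endpoints(service)
	if err != nil || len(eps) == 0 {
		fmt.Printf("W: Service %q does not have any active Endpoint.\n", service)
		return nil // empty upstream -> 503, even if the pods are still up
	}
	return eps
}

type staticLister map[string][]Endpoint

func (s staticLister) Endpoints(service string) ([]Endpoint, error) {
	return s[service], nil
}

func main() {
	// Simulate a cache that (for whatever reason) no longer holds the
	// service's endpoints, even though the backend pod is still serving.
	cache := staticLister{}
	upstreams := upstreamsFor(cache, "rook-ceph/rook-ceph-mgr-dashboard")
	fmt.Println("upstreams:", upstreams)
}
```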

We had to roll our masters, and the IPs for our Kubernetes API server went stale due to a DNS issue. ingress-nginx couldn't talk to the API server and then cleared out all of the backends.

djagannath-asapp avatar Jul 22 '20 00:07 djagannath-asapp

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Oct 20 '20 00:10 fejta-bot

/remove-lifecycle stale

zerkms avatar Oct 20 '20 00:10 zerkms

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jan 18 '21 00:01 fejta-bot

/remove-lifecycle stale

zerkms avatar Jan 18 '21 20:01 zerkms

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Apr 18 '21 20:04 fejta-bot

/remove-lifecycle stale

zerkms avatar Apr 18 '21 22:04 zerkms

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 17 '21 23:07 fejta-bot

/remove-lifecycle stale

zerkms avatar Jul 18 '21 08:07 zerkms

/assign @rikatz

strongjz avatar Aug 17 '21 16:08 strongjz

/lifecycle active

strongjz avatar Aug 17 '21 16:08 strongjz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 15 '21 17:11 k8s-triage-robot

/remove-lifecycle stale

manzsolutions-lpr avatar Nov 15 '21 18:11 manzsolutions-lpr

/lifecycle frozen

rikatz avatar Dec 06 '21 19:12 rikatz

/triage accepted /priority important-longterm

iamNoah1 avatar Dec 15 '21 10:12 iamNoah1

It seems that, after splitting the control plane from the data plane, this will be doable.

The control plane will keep a state of the current configuration. Every time the data plane fetches it, the same config will be there.

The con of this is that we still cannot deal with interruptions of the Kubernetes API server, but now we can think about another approach, like:

Data Plane -gRPC-> Kubernetes Service -> control plane (1-N) -> Kubernetes API Server (1-N)

rikatz avatar Dec 30 '21 00:12 rikatz
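
Purely as a thought experiment on that split, the contract between the data plane and such a control plane could look roughly like the hypothetical Go interface below; the names and shapes are illustrative only, not a design from the project:

```go
package controlplane

import "context"

// ConfigSnapshot is a hypothetical, versioned view of the full NGINX model
// (servers, upstreams, certificates) that a control plane would assemble
// from the Kubernetes API.
type ConfigSnapshot struct {
	Version string
	Payload []byte // e.g. a serialized model or a rendered nginx.conf
}

// ControlPlane is a hypothetical interface a data-plane instance would use.
// In the architecture sketched above it would be served over gRPC behind a
// Kubernetes Service, with 1..N control-plane replicas answering.
type ControlPlane interface {
	// LatestSnapshot returns the most recent configuration the control
	// plane has seen. If its own API server connection is down, it keeps
	// returning the last known-good snapshot instead of an empty one.
	LatestSnapshot(ctx context.Context) (ConfigSnapshot, error)

	// WatchSnapshots streams new snapshots as the control plane observes
	// changes, letting the data plane stay passive.
	WatchSnapshots(ctx context.Context) (<-chan ConfigSnapshot, error)
}
```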

/help-wanted

strongjz avatar Feb 15 '22 17:02 strongjz

/help

iamNoah1 avatar Feb 15 '22 17:02 iamNoah1

@iamNoah1: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 15 '22 17:02 k8s-ci-robot

We are also seeing this across some AKS clusters that sometimes briefly (for a few minutes) lose connectivity to the control plane, resulting in the same Service "zzz" does not have any active Endpoint. messages.

This manifests as 503 errors being served when hitting an Ingress URL until the connection to the control plane is restored and the appropriate endpoints are populated again.

essh avatar May 04 '22 03:05 essh

This will also happen if you have a service pointing to a deleted deployment. The endpoints are effectively not there and the log message Service "x" does not have any active Endpoint. starts to appear. Nginx becomes very unstable, throwing 503s left and right. I know we should not have a service pointing to a deleted deployment, but hey, this happened to us :(

nvanheuverzwijn avatar Jul 28 '22 15:07 nvanheuverzwijn

We also hit this issue on GKE, although I'm confused about what's going on, because the NGINX Ingress Controller's nginx.conf generation logic seems to discover all Kubernetes resources through informer caches, so I would expect an API server disconnect to just prevent updates (rather than wiping all existing endpoints).

Could the informers be clearing their local cache after a disconnect?

dippynark avatar Nov 22 '22 13:11 dippynark
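
For anyone digging into that question, this is the standard client-go informer pattern the controller's store builds on: reads are served from the informer's local cache, so a watch disconnect on its own would be expected to keep returning the last synced objects rather than an empty list. A minimal, generic sketch (not ingress-nginx's actual store):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A shared informer keeps a local, in-memory cache of Endpoints and
	// re-establishes its watch with backoff when the API server drops.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	endpointLister := factory.Core().V1().Endpoints().Lister()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Lister reads never hit the API server; they are served from the
	// local cache, which is only replaced on a successful re-list.
	eps, err := endpointLister.Endpoints("default").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached Endpoints objects in default: %d\n", len(eps))
}
```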