ingress-nginx
Tolerance to API unavailability
At the moment, if the API server becomes unavailable (for any reason, e.g. an outage), it takes about a minute before ingress-nginx unregisters all backends with the following message:
W0605 01:56:10.166870 9 controller.go:909] Service "ns/svc" does not have any active Endpoint.
Even though the endpoints are still up, running, and perfectly available.
Would it not be better if ingress-nginx kept running with the same set of endpoints while the API server is unavailable?
Is there currently another issue associated with this? Could not find one.
Does it require a particular Kubernetes version? No
/kind feature
We are observing a similar pattern of behaviour.
Disruption to the API server causes instability in the nginx pods, regardless of the overall state of the cluster.
Some background: we are in the process of rotating some of the secrets used by the Kubernetes API server because the client certificates will expire. To achieve this we need to restart the API servers, but when they restart, the bearer tokens that were originally given by the service account are invalid.
Our current mitigation strategy for this is to stand up an API server and use hostAliases to override kubernetes, kubernetes.default, kubernetes.default.svc.cluster.local, etc., on the ingress pods. But this is a real chore, since restarting the nginx ingress can trigger disruption, and we need to restart the pods twice (once to apply the override hostname and once to remove it when we are done).
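For anyone in the same situation, here is a minimal sketch (assuming a standard controller Deployment named ingress-nginx-controller in the ingress-nginx namespace and a placeholder IP for the stand-in API server; adjust names to your setup) of applying such a hostAliases override with client-go instead of editing the manifest by hand:

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Hypothetical names: the controller Deployment and the IP of the
	// temporary API server stood up for the rotation.
	deployments := client.AppsV1().Deployments("ingress-nginx")
	deploy, err := deployments.Get(context.TODO(), "ingress-nginx-controller", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Point the in-cluster API hostnames at the stand-in API server; rolling
	// this change out (and rolling it back later) restarts the pods, which is
	// exactly the chore described above.
	deploy.Spec.Template.Spec.HostAliases = []corev1.HostAlias{{
		IP:        "10.0.0.10", // placeholder IP of the temporary API server
		Hostnames: []string{"kubernetes", "kubernetes.default", "kubernetes.default.svc.cluster.local"},
	}}

	if _, err := deployments.Update(context.TODO(), deploy, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
}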
Is the duration of time before unregistration configurable?
@zerkms thanks for bringing this up. I think the controller should not consider the list of endpoints to be empty when the API server is down. Can you write an e2e test showing this? Then we can think about how to solve it.
cc @aledbf I think this is critical for availability.
Let's separate the issues. In case of unavailability of the API server, you will see something like this:
I0611 02:56:30.099619 6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.099975 6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100256 6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100260 6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100287 6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0611 02:56:30.100490 6 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
E0611 02:56:30.101137 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=8m52s&timeoutSeconds=532&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
W0611 02:56:30.100961 6 reflector.go:404] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: watch of *v1.Pod ended with: very short watch: k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Unexpected watch close - watch lasted less than a second and no items received
W0611 02:56:30.101041 6 reflector.go:404] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: watch of *v1beta1.Ingress ended with: very short watch: k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Unexpected watch close - watch lasted less than a second and no items received
W0611 02:56:30.101399 6 reflector.go:404] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: watch of *v1.Secret ended with: very short watch: k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Unexpected watch close - watch lasted less than a second and no items received
E0611 02:56:30.101450 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=9m24s&timeoutSeconds=564&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102176 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Failed to list *v1.Pod: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/pods?resourceVersion=9069652": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102191 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Failed to list *v1beta1.Ingress: Get "http://0.0.0.0:8001/apis/networking.k8s.io/v1beta1/ingresses?resourceVersion=2862637": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102592 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Failed to list *v1.Secret: Get "http://0.0.0.0:8001/api/v1/secrets?resourceVersion=9070770": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:30.102835 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=8m2s&timeoutSeconds=482&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.102165 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=6m58s&timeoutSeconds=418&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.103086 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m11s&timeoutSeconds=491&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.107797 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=8m24s&timeoutSeconds=504&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.743855 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Failed to list *v1.Pod: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/pods?resourceVersion=9069652": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:31.947330 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Failed to list *v1.Secret: Get "http://0.0.0.0:8001/api/v1/secrets?resourceVersion=9070770": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.010309 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Failed to list *v1beta1.Ingress: Get "http://0.0.0.0:8001/apis/networking.k8s.io/v1beta1/ingresses?resourceVersion=2862637": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.103217 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=6m54s&timeoutSeconds=414&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.104147 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m58s&timeoutSeconds=538&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:32.108573 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=6m49s&timeoutSeconds=409&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:33.104284 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=9m41s&timeoutSeconds=581&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:33.105079 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=9m0s&timeoutSeconds=540&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:33.109347 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m55s&timeoutSeconds=355&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:34.104820 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=7m18s&timeoutSeconds=438&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:34.105892 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=5m41s&timeoutSeconds=341&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:34.110270 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=7m29s&timeoutSeconds=449&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.105586 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=8m23s&timeoutSeconds=503&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.106736 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=7m53s&timeoutSeconds=473&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.111164 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m5s&timeoutSeconds=305&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:35.646766 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:161: Failed to list *v1.Pod: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/pods?resourceVersion=9069652": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.069955 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:182: Failed to list *v1beta1.Ingress: Get "http://0.0.0.0:8001/apis/networking.k8s.io/v1beta1/ingresses?resourceVersion=2862637": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.106781 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=9m10s&timeoutSeconds=550&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.107470 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m12s&timeoutSeconds=492&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.112088 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=9m20s&timeoutSeconds=560&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:36.431501 6 reflector.go:178] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:157: Failed to list *v1.Secret: Get "http://0.0.0.0:8001/api/v1/secrets?resourceVersion=9070770": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:37.107723 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=7m2s&timeoutSeconds=422&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:37.108695 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=7m59s&timeoutSeconds=479&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:37.112833 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m14s&timeoutSeconds=314&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.108760 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=6m19s&timeoutSeconds=379&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.109533 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m10s&timeoutSeconds=490&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.113538 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=9m13s&timeoutSeconds=553&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:38.294058 6 leaderelection.go:320] error retrieving resource lock invalid-namespace/ingress-controller-leader-nginx: Get "http://0.0.0.0:8001/api/v1/namespaces/invalid-namespace/configmaps/ingress-controller-leader-nginx": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:39.109905 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=8m47s&timeoutSeconds=527&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:39.110554 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m25s&timeoutSeconds=505&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:39.114372 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=8m7s&timeoutSeconds=487&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:40.110786 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=5m22s&timeoutSeconds=322&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:40.111850 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m44s&timeoutSeconds=524&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:40.115094 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=6m9s&timeoutSeconds=369&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:41.111831 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=5m14s&timeoutSeconds=314&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:41.112632 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=6m26s&timeoutSeconds=386&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:41.115883 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=9m51s&timeoutSeconds=591&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:42.112808 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:160: Failed to watch *v1.ConfigMap: Get "http://0.0.0.0:8001/api/v1/configmaps?allowWatchBookmarks=true&resourceVersion=9070852&timeout=9m9s&timeoutSeconds=549&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:42.113817 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:159: Failed to watch *v1.Service: Get "http://0.0.0.0:8001/api/v1/services?allowWatchBookmarks=true&resourceVersion=2820793&timeout=8m59s&timeoutSeconds=539&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
E0611 02:56:42.116610 6 reflector.go:382] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:158: Failed to watch *v1.Endpoints: Get "http://0.0.0.0:8001/api/v1/endpoints?allowWatchBookmarks=true&resourceVersion=9070859&timeout=5m11s&timeoutSeconds=311&watch=true": dial tcp 0.0.0.0:8001: connect: connection refused
When the API server is available again, you will not see such an output.
Would it not be better if ingress-nginx kept running with the same set of endpoints while the API server is unavailable?
When the API server is not available, there are no changes in the nginx model. This means that if one of the pods running your application dies, the ingress controller will not be aware of the event and will keep trying to send traffic to that pod.
Some background: we are in the process of rotating some of the secrets used by the Kubernetes API server because the client certificates will expire. To achieve this we need to restart the API servers, but when they restart, the bearer tokens that were originally given by the service account are invalid.
There is no support in client-go to "reload" the service accounts. You need to restart the pod. Not sure if there is any support for this scenario.
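For context, a minimal sketch of how an in-cluster client is typically built with client-go: the credentials come from the service-account token mounted into the pod, which is why credential rotation on the API-server side tends to require restarting the client pods (whether newer client-go versions re-read the token file is version-dependent):

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// The API server address comes from the KUBERNETES_SERVICE_* environment
	// variables; credentials come from the token and CA certificate mounted at
	// /var/run/secrets/kubernetes.io/serviceaccount/.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	client := kubernetes.NewForConfigOrDie(config)
	fmt.Printf("talking to %s with client %T\n", config.Host, client)
}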
When the API server is not available, there are no changes in the nginx model. This means that if one of the pods running your application dies, the ingress controller will not be aware of the event and will keep trying to send traffic to that pod.
I'm not sure I understand it.
When the API server is not available, ingress-nginx stops sending traffic anywhere because it believes the endpoints have died (while they are still available).
When the API server is not available, ingress-nginx stops sending traffic anywhere because it believes the endpoints have died (while they are still available).
Please post the logs of the ingress controller pod when this happens.
It outputs these lines (one per service, only one shown) when it happens:
W0612 02:25:19.114102 6 controller.go:909] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
Yet, if I try to request this service from the cli:
# curl -v 10.53.185.114:7000
* Rebuilt URL to: 10.53.185.114:7000/
* Trying 10.53.185.114...
* TCP_NODELAY set
* Connected to 10.53.185.114 (10.53.185.114) port 7000 (#0)
> GET / HTTP/1.1
> Host: 10.53.185.114:7000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 1162
< Content-Language: en-US
< Accept-Ranges: bytes
< Vary: Accept-Language, Accept-Encoding
< Server: CherryPy/3.2.2
< Last-Modified: Mon, 02 Mar 2020 17:55:01 GMT
< Date: Fri, 12 Jun 2020 02:28:47 GMT
< Content-Type: text/html;charset=utf-8
<
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Ceph</title>
It responds successfully.
We had to roll our masters, and the IPs for our Kubernetes API went stale due to a DNS issue. ingress-nginx couldn't talk to the API and then cleared out all of the backends.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/assign @rikatz
/lifecycle active
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
/triage accepted /priority important-longterm
It seems that after splitting the control plane from the data plane, this will be doable.
The control plane will keep the state of the current configuration. Every time the data plane fetches it, the same config will be there.
The con of this is that we still cannot deal with interruptions of the Kubernetes API server, but now we can think about another approach like:
Data Plane -gRPC-> Kubernetes Service -> Control Plane (1-N) -> Kubernetes API Server (1-N)
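As a purely hypothetical illustration of that split (none of these names exist in ingress-nginx today), the data plane would pull the last rendered model from the control plane and simply keep serving the previous snapshot whenever the API server is unreachable:

package dataplane

import "context"

// Configuration is a placeholder for the rendered nginx model; the real
// controller keeps a much richer structure (upstreams, servers, certificates).
type Configuration struct {
	Upstreams map[string][]string // service -> endpoint addresses
}

// ConfigSource is a hypothetical interface for the proposed architecture:
// the data plane fetches configuration from the control plane over gRPC
// instead of watching the API server itself. If the control plane cannot
// reach the API server, it keeps returning the last known snapshot, so the
// data plane never empties its upstreams.
type ConfigSource interface {
	LatestConfig(ctx context.Context) (*Configuration, error)
}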
/help-wanted
/help
@iamNoah1: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Also seeing this across some AKS clusters that seem to sometimes briefly (~a few minutes) lose connectivity to the control plane, resulting in the same Service "zzz" does not have any active Endpoint. messages.
This manifests as 503 errors being served when hitting an Ingress URL until the connection with the control plane is resumed and the appropriate endpoints are populated again.
This will also happen if you have a Service pointing to a deleted Deployment. The endpoints are effectively not there and the Service "x" does not have any active Endpoint. logs start to appear. Nginx will become very unstable, throwing 503s left and right.
I know we should not have a Service pointing to a deleted Deployment, but hey, this happened to us :(
We also hit this issue on GKE, although I'm confused about what's going on, because the Nginx Ingress Controller's nginx.conf generation logic seems to discover all Kubernetes resources through informer caches, so I would expect an API server disconnect to just prevent updates (rather than wiping all existing endpoints).
Could the informers be clearing their local cache after a disconnect?
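For anyone digging into that question, here is a minimal sketch (standard client-go, not ingress-nginx code) of the informer pattern: the lister reads from an in-memory store that is only mutated by list/watch events, so a dropped API server connection would not, by itself, be expected to empty the cache of Endpoints objects:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Shared informer factory with a periodic resync, similar to what most
	// controllers (including ingress controllers) use.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	endpointsLister := factory.Core().V1().Endpoints().Lister()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Reads are served from the local in-memory cache; after the initial sync,
	// losing the connection to the API server does not clear this cache, it
	// only stops new updates from arriving.
	for {
		eps, err := endpointsLister.Endpoints("default").List(labels.Everything())
		if err != nil {
			fmt.Println("lister error:", err)
		} else {
			fmt.Printf("cached Endpoints objects in namespace default: %d\n", len(eps))
		}
		time.Sleep(30 * time.Second)
	}
}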