ingress-gce
Ingress changes resulting in 502s
I was recently made aware of the following behavior and am wondering if it is intentional:
When you create an Ingress resource and one of its rules is later changed so that the Service it pointed to is no longer referenced by the Ingress, requests result in 502s while the NEG is de-provisioned.
This can be easily reproduced on GKE Autopilot as follows:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: traefik/whoami
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
spec:
  ports:
    - name: http
      targetPort: 80
      port: 80
  selector:
    app: whoami
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
    - name: http
      targetPort: 80
      port: 80
  selector:
    app: nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test
spec:
  rules:
    - host: foo.mydomain.com
      http:
        paths:
          - backend:
              service:
                name: nginx
                port:
                  number: 80
            path: /
            pathType: Prefix
    - host: bar.mydomain.com
      http:
        paths:
          - backend:
              service:
                name: whoami
                port:
                  number: 80
            path: /
            pathType: Prefix
```
If we change the backend of `foo.mydomain.com` above to `whoami`, the `nginx` Service becomes unused and we get 502s for about a minute.
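For reference, this is the kind of change that triggers the behavior; a one-line patch against the Ingress above (the rule index `0` corresponds to the `foo.mydomain.com` rule in these manifests):

```shell
# Point the foo.mydomain.com rule at the whoami Service instead of nginx.
# Shortly afterwards, requests to foo.mydomain.com start returning 502s
# while the now-unreferenced nginx NEG is torn down.
kubectl patch ingress test --type=json \
  -p='[{"op":"replace","path":"/spec/rules/0/http/paths/0/backend/service/name","value":"whoami"}]'
```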
Unfortunately it is not possible to use pre-existing NEGs to work around this. The only workaround is to add a "dummy" rule that keeps the NEG around, and remove it later once the changes have propagated and every new request is served by the new backend.
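As a sketch, the "dummy" rule workaround applied to the Ingress above looks like this (`dummy.mydomain.com` is a made-up placeholder host; any rule that still references the `nginx` Service keeps its NEG alive):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test
spec:
  rules:
    # Real rules, already migrated to the new backend.
    - host: foo.mydomain.com
      http:
        paths:
          - backend:
              service:
                name: whoami
                port:
                  number: 80
            path: /
            pathType: Prefix
    - host: bar.mydomain.com
      http:
        paths:
          - backend:
              service:
                name: whoami
                port:
                  number: 80
            path: /
            pathType: Prefix
    # "Dummy" rule: keeps the nginx NEG referenced so the NEG controller
    # does not delete it while the change propagates. Remove this rule once
    # the new backend serves all traffic.
    - host: dummy.mydomain.com
      http:
        paths:
          - backend:
              service:
                name: nginx
                port:
                  number: 80
            path: /
            pathType: Prefix
```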
Is this expected behavior? It would be great if the NEG were not immediately removed but kept around for a few minutes to avoid 502s.
/kind bug
Hi @trevex,
Thank you for creating the issue! We need a little more information to understand what the possible issue could be. Are you seeing that the NEG is being prematurely removed from the BackendService, or that the NEG is being deleted before the BackendService is updated with the correct NEGs?
Thanks, Swetha
Hi @swetharepakula,
Basically the latter: the NEG is deleted before the BackendService is updated with the correct NEGs (or before the changes have fully propagated to the GFEs and become active).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@trevex, sorry for the delayed response. It sounds like you are asking for the ability to migrate from one service to another without any downtime. Unfortunately, with how the controllers work, this is not possible. When using Ingress, the NEG lifecycle is tied to the service being referenced by an Ingress. From the NEG controller's perspective, if the service is no longer specified as part of an Ingress, the NEG controller will delete the NEG. This is expected behavior, as this situation cannot be differentiated from simply removing a service from an Ingress. There is then a race between the Ingress controller updating the BackendService and the NEG controller deleting the NEG.
If you would like to keep both around during a transition period, the method you mentioned is the best approach: add a dummy path so that the NEG controller will not delete the NEG and your Ingress will get updated. However, you run the risk of having that third path exposed temporarily. Then, after confirming that the BackendService is updated as expected, you can remove the dummy rule so that the old NEG is deleted.
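One way to check whether the BackendService has converged before removing the dummy rule is to inspect which NEGs it references from the GCP side. A sketch, assuming access to the project via `gcloud`; `k8s1-example-backend` is a placeholder for the autogenerated BackendService name, which you can find with the `list` command:

```shell
# List the backend services the Ingress controller created.
gcloud compute backend-services list --global

# Inspect which NEGs a given backend service currently points at.
# (k8s1-example-backend is a placeholder for the real autogenerated name.)
gcloud compute backend-services describe k8s1-example-backend --global \
  --format="value(backends[].group)"

# Optionally, watch status codes from the client side: once only 200s come
# back for a while, it should be safe to remove the dummy rule.
while sleep 1; do
  curl -s -o /dev/null -w '%{http_code}\n' http://foo.mydomain.com/
done
```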
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.