
Unexpected Pod while deleting gateways with merge gateways enabled

Open shawnh2 opened this issue 1 year ago • 14 comments

Description:

Unexpected Pods appear while deleting Gateways that have the mergeGateways feature enabled.

Repro steps:

  1. Apply the manifest from the quickstart
  2. Apply the config below, which creates a GatewayClass (GC) named mg with the mergeGateways feature enabled, plus 3 Gateways (GTWs), each with one HTTPRoute attached
kubectl apply -f mg.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: mg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: custom-proxy-config
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: envoy-gateway-system
spec:
  mergeGateways: true
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: merged-eg-1
  namespace: default
spec:
  gatewayClassName: mg
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      name: http
      port: 8080
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: merged-eg-2
  namespace: default
spec:
  gatewayClassName: mg
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      name: http
      port: 8081
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: merged-eg-3
  namespace: default
spec:
  gatewayClassName: mg
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      name: http
      port: 8082
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hostname1-route
spec:
  parentRefs:
    - name: merged-eg-1
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /example
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hostname2-route
spec:
  parentRefs:
    - name: merged-eg-2
  hostnames:
    - "www.example2.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /example2
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hostname3-route
spec:
  parentRefs:
    - name: merged-eg-3
  hostnames:
    - "www.example3.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /example3

  3. Everything works as expected at this point:
k get gc
NAME            CONTROLLER                                      ACCEPTED   AGE
envoy-gateway   gateway.envoyproxy.io/gatewayclass-controller   True       65m
mg              gateway.envoyproxy.io/gatewayclass-controller   True       2s

k get gtw
NAME          CLASS   ADDRESS          PROGRAMMED   AGE
merged-eg-1   mg      172.18.255.200   True         4s
merged-eg-2   mg      172.18.255.200   True         4s
merged-eg-3   mg      172.18.255.200   True         4s

k get po -n envoy-gateway-system
NAME                                 READY   STATUS    RESTARTS   AGE
envoy-gateway-6b8cdbfcdc-fbcb8       1/1     Running   0          66m  # <- focus on this
envoy-mg-e9949d90-6d8596c878-ff5rb   1/1     Running   0          14s
  4. Run k delete -f mg.yaml

  5. Some unexpected Pods show up:

k get po -n envoy-gateway-system
NAME                                                  READY   STATUS        RESTARTS   AGE
envoy-default-merged-eg-2-c7655c02-dd7c5d5d7-wp62r    0/1     Terminating   0          2s  # <- unexpected
envoy-default-merged-eg-3-2a08f1cd-85dc9dbc6f-6wm7n   0/1     Terminating   0          2s  # <- unexpected
envoy-gateway-6b8cdbfcdc-fbcb8                        1/1     Running       0          66m

Environment:

latest

shawnh2 · Feb 17 '24 12:02

This problem does not occur if the Gateways and HTTPRoutes are deleted first and the remaining GatewayClass and EnvoyProxy (EP) are deleted afterwards.

If the GatewayClass and EP are deleted first, the merge gateways feature no longer applies, so all 3 Gateways are provisioned separately.

So when running k delete -f mg.yaml, the GatewayClass and EP are deleted first (they come first in mg.yaml), which causes all 3 Gateways to be provisioned separately and then deleted immediately.
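A hypothetical Go model of the effect described above, not Envoy Gateway's actual code: `Gateway`, `infraOwner` and `fleets` are illustrative names, and the hash suffixes visible in the real Pod names (such as `-e9949d90`) are omitted. It only shows why, once the EnvoyProxy with `mergeGateways: true` is gone, the three Gateways stop sharing one proxy fleet and each gets its own.

```go
package main

import "fmt"

// Gateway is a hypothetical, trimmed-down stand-in for a Gateway API Gateway.
type Gateway struct {
	Namespace string
	Name      string
	Class     string
}

// infraOwner returns a simplified name for the Envoy proxy fleet serving gw.
// Hypothetical: real names also carry a hash suffix, which is omitted here.
func infraOwner(gw Gateway, mergeGateways bool) string {
	if mergeGateways {
		// Merged: one fleet shared by every Gateway of the class.
		return "envoy-" + gw.Class
	}
	// Not merged: one fleet per Gateway, keyed by namespace and name.
	return fmt.Sprintf("envoy-%s-%s", gw.Namespace, gw.Name)
}

// fleets returns the distinct proxy fleets needed for gws, in first-seen order.
func fleets(gws []Gateway, mergeGateways bool) []string {
	seen := make(map[string]bool)
	var out []string
	for _, gw := range gws {
		name := infraOwner(gw, mergeGateways)
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	return out
}

func main() {
	gws := []Gateway{
		{"default", "merged-eg-1", "mg"},
		{"default", "merged-eg-2", "mg"},
		{"default", "merged-eg-3", "mg"},
	}
	// EnvoyProxy with mergeGateways: true still exists.
	fmt.Println(fleets(gws, true)) // [envoy-mg]
	// EnvoyProxy already deleted, so merging no longer applies.
	fmt.Println(fleets(gws, false)) // [envoy-default-merged-eg-1 envoy-default-merged-eg-2 envoy-default-merged-eg-3]
}
```

The model is deliberately reduced to the one decision that matters here: whether the merge setting is still resolvable when the Gateways are reconciled.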

shawnh2 · Feb 18 '24 08:02

I found the cause of this problem, and it does not seem like a bug to me, so I'm closing this one now.

shawnh2 · Feb 19 '24 03:02

Hey @shawnh2, if the GWC gets deleted first, why do the GWs get created again?

arkodg · Feb 19 '24 21:02

Maybe related to https://github.com/envoyproxy/gateway/pull/2659

arkodg · Feb 20 '24 02:02

> Hey @shawnh2, if the GWC gets deleted first, why do the GWs get created again?

To answer this question, I took a closer look at the EG reconcile method, and here is what I found:

  1. If the GWC is deleted first, it still appears in the acceptedGCs list, because it does not pass the finalizer check (its finalizer is still set): https://github.com/envoyproxy/gateway/blob/cf46fbe776918ad19444e26d637ffcc79676ca23/internal/provider/kubernetes/controller.go#L153-L154

  2. EG then uses the deleted GWC's name as an index to list the GTWs associated with it. All the GTWs are still found here, since they have not been deleted yet (https://github.com/envoyproxy/gateway/blob/cf46fbe776918ad19444e26d637ffcc79676ca23/internal/provider/kubernetes/controller.go#L625-L626), so all the GTWs are recreated as I described above.

  3. We never reach the logic that removes the finalizer from the GWC (https://github.com/envoyproxy/gateway/blob/cf46fbe776918ad19444e26d637ffcc79676ca23/internal/provider/kubernetes/controller.go#L347), so the GWC never passes the finalizer check in step 1 and remains accepted.
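The three steps can be modeled roughly as follows. This is a simplified Go sketch, not the code in controller.go: the types, the `Deleting` flag (standing in for a non-zero `DeletionTimestamp`), the placeholder finalizer name, and the assumption that the finalizer is removed only once no Gateway references the class are all illustrative.

```go
package main

import "fmt"

// gatewayClassFinalizer is a placeholder; the real finalizer name may differ.
const gatewayClassFinalizer = "example.com/gateways-exist"

// GatewayClass is a hypothetical, trimmed-down stand-in for the real type.
type GatewayClass struct {
	Name       string
	Deleting   bool // stands in for a non-zero DeletionTimestamp
	Finalizers []string
}

// Gateway is a hypothetical, trimmed-down stand-in for the real type.
type Gateway struct {
	Name  string
	Class string
}

func hasFinalizer(gc GatewayClass) bool {
	for _, f := range gc.Finalizers {
		if f == gatewayClassFinalizer {
			return true
		}
	}
	return false
}

// withoutFinalizer returns a copy of fs without gatewayClassFinalizer.
func withoutFinalizer(fs []string) []string {
	out := make([]string, 0, len(fs))
	for _, f := range fs {
		if f != gatewayClassFinalizer {
			out = append(out, f)
		}
	}
	return out
}

// acceptedClasses models step 1: a class is dropped only when it is being
// deleted AND its finalizer is already gone.
func acceptedClasses(gcs []GatewayClass) []GatewayClass {
	var out []GatewayClass
	for _, gc := range gcs {
		if gc.Deleting && !hasFinalizer(gc) {
			continue
		}
		out = append(out, gc)
	}
	return out
}

// reconcile models steps 2 and 3. It returns the Gateways whose infrastructure
// would be generated and the resulting GatewayClasses. Assumption: the
// finalizer is removed only when no Gateway references the class.
func reconcile(gcs []GatewayClass, gws []Gateway) ([]string, []GatewayClass) {
	var provisioned []string
	var updated []GatewayClass
	for _, gc := range acceptedClasses(gcs) {
		var matched []string
		for _, gw := range gws {
			if gw.Class == gc.Name { // step 2: Gateways looked up by class name
				matched = append(matched, gw.Name)
			}
		}
		if len(matched) == 0 { // step 3: only reachable once no Gateways remain
			gc.Finalizers = withoutFinalizer(gc.Finalizers)
		}
		provisioned = append(provisioned, matched...)
		updated = append(updated, gc)
	}
	return provisioned, updated
}

func main() {
	// The GatewayClass was deleted first, but its finalizer is still set.
	gcs := []GatewayClass{{Name: "mg", Deleting: true, Finalizers: []string{gatewayClassFinalizer}}}
	gws := []Gateway{{"merged-eg-1", "mg"}, {"merged-eg-2", "mg"}, {"merged-eg-3", "mg"}}

	prov, updated := reconcile(gcs, gws)
	fmt.Println(prov)                  // [merged-eg-1 merged-eg-2 merged-eg-3]
	fmt.Println(updated[0].Finalizers) // [example.com/gateways-exist]

	// Only after the Gateways are gone does the finalizer get removed.
	prov, updated = reconcile(gcs, nil)
	fmt.Println(len(prov), updated[0].Finalizers) // 0 []
}
```

In this model, a class that is being deleted keeps producing infrastructure for its Gateways for as long as they exist, which matches the symptom in the issue.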

shawnh2 · Feb 20 '24 09:02

I'm not sure what the expected behavior should be here.

IMO, at a minimum all the GTWs should be in a not-Accepted status, and related resources such as Services and Deployments should not be recreated.
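A minimal Go sketch of the expected behavior proposed above, assuming a GatewayClass that is being deleted is treated as unusable: its Gateways are reported as not accepted and no infrastructure is generated for them. `decide`, `infraFor` and the trimmed-down types are hypothetical and are not taken from any actual fix.

```go
package main

import "fmt"

// GatewayClass is a hypothetical, trimmed-down stand-in for the real type.
type GatewayClass struct {
	Name     string
	Deleting bool // stands in for a non-zero DeletionTimestamp
}

// Gateway is a hypothetical, trimmed-down stand-in for the real type.
type Gateway struct {
	Name  string
	Class string
}

// Decision is what this sketch decides for a single Gateway.
type Decision struct {
	Gateway  string
	Accepted bool // would be reported in the Gateway status
}

// decide marks Gateways whose class is missing or being deleted as not accepted.
func decide(gcs []GatewayClass, gws []Gateway) []Decision {
	usable := make(map[string]bool, len(gcs))
	for _, gc := range gcs {
		usable[gc.Name] = !gc.Deleting
	}
	out := make([]Decision, 0, len(gws))
	for _, gw := range gws {
		// A missing class yields the zero value false, i.e. not accepted.
		out = append(out, Decision{Gateway: gw.Name, Accepted: usable[gw.Class]})
	}
	return out
}

// infraFor returns the Gateways whose Service/Deployment would be (re)created:
// only accepted ones.
func infraFor(ds []Decision) []string {
	var names []string
	for _, d := range ds {
		if d.Accepted {
			names = append(names, d.Gateway)
		}
	}
	return names
}

func main() {
	gws := []Gateway{{"merged-eg-1", "mg"}, {"merged-eg-2", "mg"}, {"merged-eg-3", "mg"}}

	live := []GatewayClass{{Name: "mg"}}
	fmt.Println(infraFor(decide(live, gws))) // [merged-eg-1 merged-eg-2 merged-eg-3]

	deleting := []GatewayClass{{Name: "mg", Deleting: true}}
	fmt.Println(infraFor(decide(deleting, gws))) // []
}
```

In a real controller the finalizer would still need to be released once the Gateways are gone; this sketch only covers the status and provisioning decision.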

shawnh2 · Feb 20 '24 09:02

This also happens when applying resources; I have seen it in the past. Is this only happening with merged gateways, @shawnh2? This behaviour seems similar to what was fixed before (https://github.com/envoyproxy/gateway/pull/2395). Since we changed the watchable interface to a map, which is unordered, I wonder whether that is causing these unnecessary updates.

cnvergence · Feb 20 '24 12:02

> This also happens when applying resources; I have seen it in the past. Is this only happening with merged gateways, @shawnh2?
>
> This behaviour seems similar to what was fixed before (https://github.com/envoyproxy/gateway/pull/2395).
>
> Since we changed the watchable interface to a map, which is unordered, I wonder what is causing these unnecessary updates.

Yes, it only happens with merge gateways. It's mainly caused by step 2, which I described above.

shawnh2 · Feb 20 '24 12:02

I see, this matches the deletion issue.

cnvergence · Feb 20 '24 14:02

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] · Apr 27 '24 08:04

@shawnh2, can this one be closed?

arkodg · May 23 '24 00:05

I think this behavior is a bug; I will send a fix ASAP.

shawnh2 · May 23 '24 00:05

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] · Jun 22 '24 04:06