Unexpected Pods while deleting Gateways with merged gateways enabled
Description:
Some unexpected Pods show up while deleting Gateways with the mergeGateways feature turned on.
Repro steps:
- Apply the manifest from the quickstart.
- Apply the config below to create a GatewayClass named mg with the mergeGateways feature on, plus 3 Gateways, each with an HTTPRoute attached:
kubectl apply -f mg.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: mg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: custom-proxy-config
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: envoy-gateway-system
spec:
  mergeGateways: true
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: merged-eg-1
  namespace: default
spec:
  gatewayClassName: mg
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      name: http
      port: 8080
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: merged-eg-2
  namespace: default
spec:
  gatewayClassName: mg
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      name: http
      port: 8081
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: merged-eg-3
  namespace: default
spec:
  gatewayClassName: mg
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      name: http
      port: 8082
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hostname1-route
spec:
  parentRefs:
    - name: merged-eg-1
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /example
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hostname2-route
spec:
  parentRefs:
    - name: merged-eg-2
  hostnames:
    - "www.example2.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /example2
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hostname3-route
spec:
  parentRefs:
    - name: merged-eg-3
  hostnames:
    - "www.example3.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /example3
- Everything looks fine at this point:
k get gc
NAME            CONTROLLER                                      ACCEPTED   AGE
envoy-gateway   gateway.envoyproxy.io/gatewayclass-controller   True       65m
mg              gateway.envoyproxy.io/gatewayclass-controller   True       2s

k get gtw
NAME          CLASS   ADDRESS          PROGRAMMED   AGE
merged-eg-1   mg      172.18.255.200   True         4s
merged-eg-2   mg      172.18.255.200   True         4s
merged-eg-3   mg      172.18.255.200   True         4s

k get po -n envoy-gateway-system
NAME                                 READY   STATUS    RESTARTS   AGE
envoy-gateway-6b8cdbfcdc-fbcb8       1/1     Running   0          66m   # <- focus on this
envoy-mg-e9949d90-6d8596c878-ff5rb   1/1     Running   0          14s
- Delete the manifest: k delete -f mg.yaml
- Some unexpected Pods show up:
k get po -n envoy-gateway-system
NAME                                                  READY   STATUS        RESTARTS   AGE
envoy-default-merged-eg-2-c7655c02-dd7c5d5d7-wp62r    0/1     Terminating   0          2s    # <- unexpected
envoy-default-merged-eg-3-2a08f1cd-85dc9dbc6f-6wm7n   0/1     Terminating   0          2s    # <- unexpected
envoy-gateway-6b8cdbfcdc-fbcb8                        1/1     Running       0          66m
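These per-gateway Pods exist only for a couple of seconds, so they are easy to miss. A minimal way to catch them, assuming the same mg.yaml as above, is to watch the namespace in one terminal while deleting in another:

# terminal 1: watch Pods in the Envoy Gateway namespace
kubectl get pods -n envoy-gateway-system -w

# terminal 2: delete the merged-gateway manifest
kubectl delete -f mg.yaml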
Environment:
latest
This problem does not occur if the Gateways and HTTPRoutes are deleted first and the remaining GatewayClass and EnvoyProxy are deleted afterwards.
If the GatewayClass and EnvoyProxy are deleted first, the merged-gateways feature no longer applies, so all 3 Gateways are briefly deployed separately.
With k delete -f mg.yaml, the GatewayClass and EnvoyProxy are deleted first, so all 3 Gateways get their own per-gateway resources created and then deleted again immediately.
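For reference, a sketch of the deletion order that avoids the transient per-gateway Pods, using the resource names from the repro above:

# delete the HTTPRoutes and Gateways first, while the GatewayClass/EnvoyProxy still exist
kubectl delete httproute hostname1-route hostname2-route hostname3-route
kubectl delete gtw merged-eg-1 merged-eg-2 merged-eg-3

# only then remove the GatewayClass and its EnvoyProxy parameters
kubectl delete gatewayclass mg
kubectl delete envoyproxy custom-proxy-config -n envoy-gateway-system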
Found the reason causing this problem, and it does not seem like a bug to me, so closing this one now.
Hey @shawnh2, if the GWC gets deleted first, why will the GWs get created again?
maybe related to https://github.com/envoyproxy/gateway/pull/2659
To answer this question, I took a closer look at the EG reconcile method, and here is what I found:
1. If the GWC gets deleted first, it will still appear in the acceptedGCs list, because it won't pass the finalizer check: https://github.com/envoyproxy/gateway/blob/cf46fbe776918ad19444e26d637ffcc79676ca23/internal/provider/kubernetes/controller.go#L153-L154
2. EG then uses the deleted GWC name as an index to list the Gateways associated with it. We still get all the Gateways here, since they have not been deleted yet: https://github.com/envoyproxy/gateway/blob/cf46fbe776918ad19444e26d637ffcc79676ca23/internal/provider/kubernetes/controller.go#L625-L626 So all the Gateways get recreated separately, as described above.
3. Because those Gateways still exist, we never reach the logic that removes the finalizer from the GWC, so the GWC never passes the finalizer check in step 1 and remains accepted: https://github.com/envoyproxy/gateway/blob/cf46fbe776918ad19444e26d637ffcc79676ca23/internal/provider/kubernetes/controller.go#L347
I'm not sure what the expected behavior should be here.
IMO, at least all the Gateways should be in a not-Accepted status, and the related resources like Services, Deployments etc. should not be recreated.
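A quick way to confirm steps 1 and 3, assuming the repro setup above: while the Gateways still exist, the GatewayClass is only marked for deletion but keeps its finalizer, which is why it still counts as accepted.

# run in a second terminal while `k delete -f mg.yaml` is still in progress
kubectl get gatewayclass mg \
  -o jsonpath='{.metadata.deletionTimestamp}{"  "}{.metadata.finalizers}{"\n"}'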
It also happens when applying resources; I have seen this in the past. Is this only happening with merged gateways, @shawnh2? This behaviour seems similar to what was fixed before in https://github.com/envoyproxy/gateway/pull/2395 Since we changed the watchable interface to a map, which is unordered, I wonder whether that is causing these unnecessary updates.
Yes, it only happens with merged gateways. It's mainly caused by step 2, which I described above.
I see, this matches the deletion issue.
This issue has been automatically marked as stale because it has not had activity in the last 30 days.
@shawnh2 can this one be closed?
I think this behavior is a bug, will send a fix ASAP.
This issue has been automatically marked as stale because it has not had activity in the last 30 days.