Handle Backend Pod Upgrades

Open • arkodg opened this issue 1 year ago • 6 comments

If two backend pods are undergoing a rolling restart, add outlier detection and retry settings in Envoy Proxy to ensure that no traffic intended for the backend is dropped.

arkodg • Aug 02 '23 05:08
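To illustrate the outlier detection idea from the issue description, here is a minimal Python model of consecutive-5xx ejection. It borrows the names of Envoy's `consecutive_5xx`, `base_ejection_time` and `max_ejection_percent` outlier detection settings, but it is a simplified sketch, not Envoy's algorithm: the `OutlierDetector` class and its defaults are hypothetical, and the ejection time here stays fixed, whereas Envoy scales it with the number of times a host has been ejected.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OutlierDetector:
    """Hypothetical, simplified model of consecutive-5xx outlier detection."""

    hosts: list[str]
    consecutive_5xx: int = 5            # 5xx responses in a row before ejection
    base_ejection_time: float = 30.0    # seconds an ejected host is skipped
    max_ejection_percent: int = 50      # illustrative cap on ejected hosts
    _failures: dict[str, int] = field(default_factory=dict, init=False)
    _ejected_until: dict[str, float] = field(default_factory=dict, init=False)

    def healthy_hosts(self, now: float) -> list[str]:
        # A host rejoins the load-balancing pool once its ejection expires.
        return [h for h in self.hosts if self._ejected_until.get(h, 0.0) <= now]

    def record(self, host: str, status: int, now: float) -> None:
        """Record one response from `host` observed at time `now` (seconds)."""
        if status < 500:
            self._failures[host] = 0  # any non-5xx response resets the streak
            return
        self._failures[host] = self._failures.get(host, 0) + 1
        if self._failures[host] < self.consecutive_5xx:
            return
        self._failures[host] = 0
        ejected = len(self.hosts) - len(self.healthy_hosts(now))
        # Only eject if the cap on the share of ejected hosts still allows it.
        if (ejected + 1) * 100 <= self.max_ejection_percent * len(self.hosts):
            self._ejected_until[host] = now + self.base_ejection_time
```

With two pods and `max_ejection_percent=50`, at most one pod can be ejected at a time, so a restarting pod is skipped while the other one keeps serving; the cap prevents the detector from ejecting every backend at once.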

Since we use the ClusterIP as the backend for Envoy Gateway, isn't this the Kubernetes Service's responsibility? What am I missing here?

tanujd11 • Aug 02 '23 10:08

@tanujd11 you're right; at this point this is moot, since we only have the ClusterIP endpoint. But once EndpointSlice support (https://github.com/envoyproxy/gateway/pull/1494) lands, we'll have to handle the case where the control plane (EG) is not fast enough to propagate the current Ready endpoints to Envoy Proxy. To deal with this eventual consistency, we'll need some mechanism in Envoy, such as outlier detection and retrying on another endpoint in the xDS cluster.

arkodg • Aug 02 '23 16:08
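The "try another endpoint" part can be sketched as a retry loop that never picks the same endpoint twice, which is the behaviour Envoy's `envoy.retry_host_predicates.previous_hosts` retry host predicate provides. This Python version is a hypothetical model: `send_with_retries`, `ConnectFailure` and the `send` callable stand in for Envoy's data path, and only connection failures are retried (similar in spirit to `retry_on: connect-failure`).

```python
from __future__ import annotations

import random
from typing import Callable


class ConnectFailure(Exception):
    """Hypothetical error: the endpoint refused or reset the connection."""


def send_with_retries(
    endpoints: list[str],
    send: Callable[[str], int],
    num_retries: int = 2,
    rng: random.Random | None = None,
) -> tuple[int, list[str]]:
    """Send to a random endpoint, retrying connection failures on a new one.

    Returns the response status and the endpoints attempted, in order.
    """
    if rng is None:
        rng = random.Random()
    attempted: list[str] = []
    for _ in range(num_retries + 1):
        # Skip endpoints already tried, e.g. a terminated pod that is still
        # listed because the endpoint update has not reached the proxy yet.
        candidates = [e for e in endpoints if e not in attempted]
        if not candidates:
            break
        host = rng.choice(candidates)
        attempted.append(host)
        try:
            return send(host), attempted
        except ConnectFailure:
            continue
    raise ConnectFailure(f"all attempts failed: {attempted}")
```

With two endpoints and `num_retries=1`, a request that first lands on a pod that has already terminated is retried on the remaining pod, so the client still gets a response while the stale endpoint list catches up.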

Hi @arkodg, I can pick this up. Since EndpointSlice support is now enabled, what about adding an outlierDetection API to the BackendTrafficPolicy API?

tanujd11 • Nov 02 '23 15:11

Awesome, thanks @tanujd11. I would break this up into 3 parts:

  • an e2e test to make sure a pod rolling restart is hitless
  • an outlier detection API (let's create an issue if one doesn't exist)
  • whether or not to enable it by default

arkodg • Nov 02 '23 16:11
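The first item, checking that a rolling restart is hitless, boils down to sending traffic for the whole duration of the rollout and counting failures. The Python sketch below assumes two hypothetical hooks: `send()` issues one request through the gateway and returns its HTTP status (raising `OSError`, e.g. `ConnectionRefusedError`, on a transport error), and `rollout()` triggers the restart and blocks until it completes (for example by wrapping `kubectl rollout restart` and `kubectl rollout status`). It is an illustration, not Envoy Gateway's e2e suite.

```python
from __future__ import annotations

import threading
from typing import Callable


def check_hitless(
    send: Callable[[], int],
    rollout: Callable[[], None],
    max_requests: int = 10_000,
) -> tuple[bool, int, int]:
    """Send requests until rollout() returns.

    Returns (hitless, total_requests, failed_requests); a request fails if it
    raises OSError or returns a 5xx status.
    """
    done = threading.Event()

    def run_rollout() -> None:
        try:
            rollout()
        finally:
            done.set()  # stop sending even if the rollout raised

    worker = threading.Thread(target=run_rollout)
    worker.start()
    total = failures = 0
    while True:  # always send at least one request
        total += 1
        try:
            if send() >= 500:
                failures += 1
        except OSError:
            failures += 1
        if done.is_set() or total >= max_requests:
            break
    worker.join()
    return failures == 0, total, failures
```

Counting every response rather than sampling keeps the check strict: a single dropped request during the restart makes it fail.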

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] • Dec 02 '23 20:12

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] • Jan 06 '24 20:01