
feat(EG K8S Provider): implement leader election for EG controller

Open alexwo opened this issue 1 year ago • 1 comments

What type of PR is this?

feat(controller): implement leader election for EG controller

What this PR does / why we need it:

  1. Introduces leader election for the Envoy Gateway (EG) controller so that only one controller instance performs write operations at any given time.
  2. Keeps xDS (Envoy's discovery service) consistently available across the system.

The implementation follows best practices for distributed systems, enhancing the reliability and stability of the EG controller in multi-instance deployments.
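For readers unfamiliar with the pattern, here is a minimal sketch of lease-based leader election. It is a toy in-memory model, not the code in this PR: the `lease` type, `tryAcquire`, `role`, `simulate`, the replica names `eg-0` and `eg-1`, and the 15-second TTL are all assumptions made for illustration. A real deployment would keep the lease in a shared store rather than in process memory.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lease is a hypothetical in-memory stand-in for the shared lock record
// that controller replicas compete for. It is not an Envoy Gateway type.
type lease struct {
	mu      sync.Mutex
	holder  string
	expires time.Time
}

// tryAcquire grants or renews the lease when it is free, expired, or
// already held by id. It reports whether id holds the lease at time now.
func (l *lease) tryAcquire(id string, now time.Time, ttl time.Duration) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" || l.holder == id || !now.Before(l.expires) {
		l.holder = id
		l.expires = now.Add(ttl)
		return true
	}
	return false
}

// role reports whether a replica may perform writes ("leader") or must
// stay passive ("standby") at time now.
func role(l *lease, id string, now time.Time, ttl time.Duration) string {
	if l.tryAcquire(id, now, ttl) {
		return id + ":leader"
	}
	return id + ":standby"
}

// simulate replays a fixed timeline: eg-0 wins the lease, never renews it,
// and eg-1 takes over once the TTL has elapsed.
func simulate() []string {
	l := &lease{}
	start := time.Unix(0, 0)
	ttl := 15 * time.Second // assumed value, chosen only for the example

	var out []string
	out = append(out, role(l, "eg-0", start, ttl))
	out = append(out, role(l, "eg-1", start, ttl))
	out = append(out, role(l, "eg-1", start.Add(10*time.Second), ttl))
	out = append(out, role(l, "eg-1", start.Add(20*time.Second), ttl))
	return out
}

func main() {
	for _, r := range simulate() {
		fmt.Println(r)
	}
}
```

Passing `now` in explicitly keeps the timeline deterministic; a real implementation would read the clock itself and renew the lease on a timer.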

Which issue(s) this PR fixes:

This enhancement facilitates the scaling of xDS services and enables leader election support.

https://github.com/envoyproxy/gateway/issues/1953

!draft changes!

alexwo avatar Feb 25 '24 21:02 alexwo

Codecov Report

Attention: Patch coverage is 66.66667%, with 32 lines in your changes missing coverage. Please review.

Project coverage is 66.46%. Comparing base (ec0f31b) to head (29da944).

| Files | Patch % | Lines |
|---|---|---|
| internal/infrastructure/runner/runner.go | 0.00% | 17 Missing :warning: |
| internal/provider/kubernetes/kubernetes.go | 73.52% | 6 Missing and 3 partials :warning: |
| api/v1alpha1/envoygateway_helpers.go | 85.71% | 3 Missing :warning: |
| internal/provider/kubernetes/controller.go | 62.50% | 2 Missing and 1 partial :warning: |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2694   +/-   ##
=======================================
  Coverage   66.45%   66.46%           
=======================================
  Files         160      161    +1     
  Lines       22553    22632   +79     
=======================================
+ Hits        14988    15042   +54     
- Misses       6696     6717   +21     
- Partials      869      873    +4     


codecov[bot] avatar Feb 26 '24 02:02 codecov[bot]

Should this PR also include some work in the infrastructure controller?

guydc avatar Feb 29 '24 13:02 guydc

> Should this PR also include some work in the infrastructure controller?

Hi @guydc, we can attempt to ensure that only a single controller at a time can create or update resources such as the infra-related ones.

Yes, the side effect at this point is that infra updates result in concurrent updates and extra revision conflicts and retries.

alexwo avatar Feb 29 '24 13:02 alexwo

> Should this PR also include some work in the infrastructure controller?

Hi @guydc, we can attempt to ensure that only a single controller at a time can create or update resources such as the infra-related ones.

The side effect at this point is that infra updates result in concurrent updates and extra revision conflicts and retries.

Added changes to ensure that infra will be created only by the elected EG instance.
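One way to picture this gating, as a sketch rather than the code in this PR: the infra step waits on a channel that is closed once the instance is elected. controller-runtime's manager exposes an `Elected()` channel with that behavior; whether this PR relies on it is an assumption here. The names `startInfraWhenElected` and `demo` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// startInfraWhenElected blocks until elected is closed, then calls apply
// once and returns true. If ctx is cancelled first, it returns false and
// apply is never called. Hypothetical helper, for illustration only.
func startInfraWhenElected(ctx context.Context, elected <-chan struct{}, apply func()) bool {
	select {
	case <-elected:
		apply()
		return true
	case <-ctx.Done():
		return false
	}
}

// demo runs one instance that has been elected and one that shuts down
// without ever being elected, and reports whether each applied infra.
func demo() (leaderApplied, standbyApplied bool) {
	// Elected instance: the channel is already closed.
	elected := make(chan struct{})
	close(elected)
	leaderApplied = startInfraWhenElected(context.Background(), elected, func() {})

	// Standby instance: never elected, and its context is cancelled.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	standbyApplied = startInfraWhenElected(ctx, make(chan struct{}), func() {})
	return leaderApplied, standbyApplied
}

func main() {
	leader, standby := demo()
	fmt.Println("leader applied infra:", leader)
	fmt.Println("standby applied infra:", standby)
}
```

With this shape, standby instances never issue infra writes, which avoids the concurrent updates and revision conflicts mentioned above.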

alexwo avatar Mar 01 '24 00:03 alexwo

/retest

alexwo avatar Mar 06 '24 15:03 alexwo

/retest

alexwo avatar Mar 27 '24 19:03 alexwo

finally got to this, thanks for building this out ! adding some comments

arkodg avatar Mar 29 '24 12:03 arkodg

> finally got to this, thanks for building this out ! adding some comments

Thanks!

alexwo avatar Mar 29 '24 14:03 alexwo

/retest

alexwo avatar Mar 29 '24 19:03 alexwo

/retest

alexwo avatar Apr 02 '24 06:04 alexwo

/retest

alexwo avatar Apr 03 '24 19:04 alexwo

/retest

alexwo avatar Apr 04 '24 07:04 alexwo