Replace GRPC server with envoy's go-control-plane

Open jpeach opened this issue 5 years ago • 16 comments

I looked at go-control-plane a bit and we ought to be able to use it to replace our custom xDS code. The interfaces are a bit different, but we should be able to bind the DAG in without a lot of trouble. This likely gives us ADS support for free.

Related #1286

jpeach avatar Jan 22 '20 03:01 jpeach

Note that this doesn't solve configuration snapshot consistency issues (see various issues in the go-control-plane repo).

jpeach avatar Jan 22 '20 03:01 jpeach

@stevesloka what do we have left to do here? I guess we need to let the go-control-plane impl bake for a while, and maybe do some perf/load testing of it before sunsetting the Contour impl?

skriss avatar Jan 07 '21 22:01 skriss

Yup, all of that. We should also update the feature tests to use this along with the integration tests. There was some work on versioning separate caches here that needs to be finished up: https://github.com/projectcontour/contour/pull/2917

stevesloka avatar Jan 07 '21 22:01 stevesloka

Action items:

  • Use in feature tests
  • Use knative tests to load-test go-control-plane xDS server

Not blocking for this issue, but worth doing:

  • Turn on ADS mode?
  • Investigate if go-control-plane already has support for incremental xDS (https://github.com/envoyproxy/go-control-plane/pull/387)

sunjayBhatia avatar Feb 04 '21 00:02 sunjayBhatia

Currently using the envoy xDS server in a daily build in CI to get confidence before we switch over, at least for SOTW mode

sunjayBhatia avatar Jan 07 '22 16:01 sunjayBhatia

Still flaky in CI e2e tests. Trying to run the tests locally to find the sources of the flakiness; in particular, Envoys are often slower to become healthy.

sunjayBhatia avatar Jan 28 '22 20:01 sunjayBhatia

Adding to 1.26.0 milestone to make some forward progress on resolving issues.

skriss avatar May 15 '23 18:05 skriss

Bumping to 1.27

skriss avatar Aug 16 '23 15:08 skriss

@davinci26 @clayton-gonsalves @izturn do any of you have non-production environments where you could try switching to the go-control-plane xDS server instead of the legacy Contour impl and see if you encounter any problems? We've been running the E2E tests daily with it enabled, with success, but some more real-world testing (ideally looking at performance/scale in addition to correctness) would be great too before we consider flipping the default in Contour.

Specifically, this involves setting the following in the Contour config file:

server:
  xds-server-type: envoy

For reference here is a PR that changes the default to be envoy: https://github.com/projectcontour/contour/pull/6146

skriss avatar Jan 30 '24 17:01 skriss

@davinci26 @clayton-gonsalves @izturn (or anyone else) just a gentle nudge here, is this change something you could test in a non-prod environment?

skriss avatar Feb 13 '24 15:02 skriss

@skriss sorry had this message on draft.

We are working on a bunch of items to improve the operational stability of Contour so we are not taking many upstream changes but I think we should be able to take it and test it out in a couple of weeks from now.

Does this work?

davinci26 avatar Feb 13 '24 15:02 davinci26

> @skriss sorry had this message on draft.
>
> We are working on a bunch of items to improve the operational stability of Contour so we are not taking many upstream changes but I think we should be able to take it and test it out in a couple of weeks from now.
>
> Does this work?

That'd be great, thanks! We may make the change upstream soon-ish anyway to let CI start running regularly on it. It has already been running in our nightly tests and seems pretty stable.

skriss avatar Feb 13 '24 15:02 skriss

Note to self: consider effects of Endpoint updates

skriss avatar Feb 13 '24 15:02 skriss

@skriss, we have some non-prod environments, but we don't put much load on them. We will try it later.

izturn avatar Feb 22 '24 01:02 izturn

@skriss Based on our limited testing, everything is fine

izturn avatar Mar 01 '24 02:03 izturn

Remaining work here is to fully remove the Contour xDS server option and implementation. We can plan to do this for the 1.31 release, assuming no major issues after the 1.29 release.

skriss avatar May 06 '24 16:05 skriss