cilium icon indicating copy to clipboard operation
cilium copied to clipboard

Add an experimental xDS client

Open AwesomePatrol opened this issue 1 year ago • 5 comments

Part of https://github.com/cilium/design-cfps/pull/14

Added an experimental xDS client library

AwesomePatrol avatar Aug 21 '24 08:08 AwesomePatrol

Please remove internal ticket numbers from commit descriptions. I also don't expect Add a simple e2e test to work in OSS, so this needs to be reworked or skipped.

DamianSawicki avatar Aug 21 '24 11:08 DamianSawicki

Please remove internal ticket numbers from commit descriptions.

I will keep them. They make backports and tracking all relevant changes much easier.

I also don't expect Add a simple e2e test to work in OSS, so this needs to be reworked or skipped.

This works and is useful. There are other tests which rely on a particular environment. We don't require any internal behavior so anyone can verify these changes using this test. I removed it for now (as it also pulls new stuff into vendor), but I hope to add it later (maybe in pkg/gke, pkg/google path).

AwesomePatrol avatar Aug 21 '24 11:08 AwesomePatrol

CC: @bowei

bowei avatar Aug 21 '24 15:08 bowei

Commit adbaca107ae4de97fb1e6cd2a0768f7b4e7a3768 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

/assign @bowei

bowei avatar Sep 09 '24 16:09 bowei

@sayboras

Seems like there is some discussion pending based on the label, so just comment instead of approving :)

I also noticed the label, but I am out of the loop/not aware of any discussion. Can someone link it to me?

I appreciate the comment even if it is not an approval yet :)

@jrajahalme

It looks like there are no NACKs for SOTW protocol

Good catch! I added NACKs in SoTW now. I didn't do it in the first place as I was still considering dropping SoTW support completely.

There is a hardcoded list of TypeURLs that was not updated on the commit that added route config support?

Another great catch! I wanted to force an order of processing of different resource types, but in practice it doesn't make sense (there is only one type per message). I changed the code to just go over all resources it received.

@youngnick

I do think that making this code a little bit more generic would be good before moving ahead

I moved non-generic (GCP-specific) parts to a separate package. If this is insufficient, please outline the possible steps to make it more generic

@robscott

somewhere in your PR description it would be useful to reference the only other xDS client written in go

Done. It is unfortunate that we can't use it.

@bowei

If the code is really the same, I suggest abstracting the delta vs state of the world using an interface

That would be great, but I believe that it will make the code much harder to read. We can explore this possibility later. Making the underlying implementation slightly more generic will not change its behavior in any way.

all

Thanks for all the comments, from the many (6?) reviewers. It seems that no issues were found with the code, so in order to avoid back-and-forth, please leave LGTM (comment) if you have only minor/nitpicky comments. I must resolve them before merging anyways.

Thanks again and I appreciate all your inputs. I love to see this PR gaining traction :star_struck:

AwesomePatrol avatar Sep 12 '24 06:09 AwesomePatrol

I added the don't merge/discussion label because I thought there might be some extra discussion coming. Any relevant discussion is here, on this PR.

I'm just finishing up for the day now, I will take a look tomorrow morning my time.

youngnick avatar Sep 12 '24 07:09 youngnick

Thank you, @ovidiutirla, for the most thorough review so far. I kept some comments open so others could chime in.

all

Please feel free to review it now. I won't make any updates until tomorrow morning (or even Monday).

AwesomePatrol avatar Sep 12 '24 12:09 AwesomePatrol

/cc @kl52752

kl52752 avatar Sep 12 '24 14:09 kl52752

Commit 1f9cd65ac3a142b482c1684e01031d62d3305e2f does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Commit 1f9cd65ac3a142b482c1684e01031d62d3305e2f does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Reviews quieted down a bit, so I started an experiment with generics (to abstract away sotw/delta differences). Code is available here. Some parts are simpler, but there are also some if sotw left. Please let me know what approach do you prefer

AwesomePatrol avatar Sep 16 '24 14:09 AwesomePatrol

It's worth considering that there isn't that much that is shared as the bodies of a lot of the functions are completely enclosed by a if c.opts.UseSOTW.

I can see the two flows diverging further given the differences between the protocols (delta vs sotw). I think long-term it's not going to be a win to keep the two glued together.

It is also confusing that only one of {delta, stream} variables will be non-nil for many of the funcs.

I think you can simplify and avoid the generics by doing the following:

type getter interface {
  sendInitialDiscoverRequest() error
  startFetchResponse() chan error
  startLoop() chan error
}

type sotwStream struct {...}
// Implementation of sotw stream

type deltaStream struct {...}
// implementation of delta stream

if you want to keep process() the same:

func (c *XDSClient) process(ctx context.Context, g getter) error {
	if err := g.sendInitialDiscoveryRequests(); err != nil {
		return fmt.Errorf("start request routine: %w", err)
	}

        errRespCh := g.startFetchResponse()
        errLoopCh := g.startLoop()

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case err, ok := <-errRespCh: ...
		case err, ok := <-errLoopCh: ...
		}
	}
}

My recommendation is to split this into different files for easier comprehension:

sotw_stream.go : contains sotw stream delta_stream.go : contains delta stream code client.go : contains client + common routines

bowei avatar Sep 18 '24 05:09 bowei

Can you also check that instances where <-ctx.Done() is checked is not redundant?

Example:

// From fetchResponses()
	select {
		case <-ctx.Done(): /// <<< is this redundant?
			return
		default:
		}
		if c.opts.UseSOTW { ...

bowei avatar Sep 18 '24 05:09 bowei

Can you also check that instances where <-ctx.Done() is checked is not redundant?

Yes it is redundant. I will clean up checks around Send/Recv. I will also add docstrings to better outline running goroutines and their interactions.

I think long-term it's not going to be a win to keep the two glued together.

I agree. I will try to eliminate all if sotw parts.

you can simplify and avoid the generics

I will try this approach, but I think at this point it may be better to just go with generics (please tell me if the downsides outweigh the benefits).

Thanks for your comments. They helped me a lot and I believe the code will look much better thanks to your inputs :rocket:

AwesomePatrol avatar Sep 18 '24 06:09 AwesomePatrol

I don't think the complexity of generics is justified in this case...

bowei avatar Sep 18 '24 06:09 bowei

I don't think the complexity of generics is justified in this case...

Unfortunately not doing it with generics made it quite difficult to share the business logic.

With generics I managed to eliminate all instances of if sotw from the code. I believe that behavior specific to sotw/delta is now clearly separated in client_sotw.go and client_delta.go files. Both implement a common generic interface.

Thanks for the review. The code is in a much better state now and I believe it will a breeze to maintain and extend when compared to the initial draft

AwesomePatrol avatar Sep 18 '24 14:09 AwesomePatrol

All conversations are resolved! :tada:

I won't update this PR for at least another 18 hours, so this a great time to review :rocket:

AwesomePatrol avatar Sep 19 '24 11:09 AwesomePatrol

hi All, Please hold on on further reviewing this PR because I am working on small refactoring of the interface. I think it will be ready within 2 weeks.

kl52752 avatar Oct 31 '24 17:10 kl52752

/test

DamianSawicki avatar Nov 19 '24 15:11 DamianSawicki

The PR was updated and now is ready for review.

kl52752 avatar Nov 22 '24 11:11 kl52752

Commit b1be51e7fec6c6210e0a20b52e7253cdc95f8685 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Commit ccc91360e14b20fddeaed8bbfee0e5c656095e47 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Discussed offline -- I think we are ok to go ahead with this for now -- it is experimental and can always be improved.

/lgtm /approve

bowei avatar Jan 22 '25 22:01 bowei

Agreed that this looks fine for experimental to me now.

@youngnick Thank you for taking another pass on the PR and for your approval. Can you also delete needs-discussion label?

Who should I asked for approval to merge this review?

kl52752 avatar Jan 28 '25 10:01 kl52752

Who should I asked for approval to merge this review?

Judging from

@AwesomePatrol AwesomePatrol requested review from jrajahalme (assigned from cilium/envoy) and nebril (assigned from cilium/contributing) 5 months ago

Code owner review required Waiting on code owner review from cilium/envoy.

and https://github.com/cilium/community/blob/main/ladder/teams/envoy.yaml, you're only missing an approval from @jrajahalme. EDIT: Indeed, sayboras is another option.

DamianSawicki avatar Jan 28 '25 10:01 DamianSawicki

@kl52752 I believe you still need an approval from the @cilium/envoy team. I re-requested a review from Tam who is in that team and apparently already had a look.

pchaigno avatar Jan 28 '25 11:01 pchaigno

/test

pchaigno avatar Jan 28 '25 11:01 pchaigno