grpc-go icon indicating copy to clipboard operation
grpc-go copied to clipboard

xdsclient: flush any unsent loads when the LRS stream is closed

Open easwars opened this issue 1 year ago • 11 comments

The LRS implementation in the xDS client transport can be found here: https://github.com/grpc/grpc-go/blob/55d8783479c7cc1a72a435340432702a67d809aa/xds/internal/xdsclient/transport/loadreport.go#L47

The ReportLoad() API returns a cancel func which is plumbed all the way to the user of the XDSClient API here: https://github.com/grpc/grpc-go/blob/55d8783479c7cc1a72a435340432702a67d809aa/xds/internal/xdsclient/client.go#L56

The current implementation of this cancel function simply cancels the context passed to the LRS stream, which results in the stream getting closed. But what we really want is for unsent loads to be flushed on the stream before it is closed.

The xDS client transport spawns the lrsRunner goroutine https://github.com/grpc/grpc-go/blob/55d8783479c7cc1a72a435340432702a67d809aa/xds/internal/xdsclient/transport/loadreport.go#L95 which (in a backoff loop) sends the first LRS request, receives the first response and then calls sendLoads to periodically send out load reports https://github.com/grpc/grpc-go/blob/55d8783479c7cc1a72a435340432702a67d809aa/xds/internal/xdsclient/transport/loadreport.go#L153

What we want is for the cancel function to synchronize with sendLoads so that when the cancel function is invoked, it results in unsent loads being flushed before the stream context is cancelled.

easwars avatar Mar 09 '23 19:03 easwars

Hi @easwars, I'd like to work on it.

erni27 avatar Apr 22 '23 07:04 erni27

hi @easwars @erni27 is the fix in progress?

Aditya-Sood avatar May 05 '23 16:05 Aditya-Sood

@Aditya-Sood: Yes, it is.

erni27 avatar May 05 '23 18:05 erni27

hi @erni27, do you need help with this?

cc @arvindbr8

Aditya-Sood avatar Aug 20 '23 16:08 Aditya-Sood

@Aditya-Sood: I've been extremely busy recently. This is still on my agenda but I'm not sure when exactly I'll be able to get back to it (probably within a month). If you have more capacity right now then feel free to work on it.

I took a look at it in May and it doesn't seem so hard. I was able to reproduce broken behaviour by playing with the following test https://github.com/grpc/grpc-go/blob/master/xds/internal/xdsclient/transport/loadreport_test.go#L38. The main problem is choosing the appropriate way to synchronise the stream cancellation and flushing the loads.

erni27 avatar Aug 21 '23 21:08 erni27

@erni27 : Thanks for the update.

@Aditya-Sood : Let me know what you think.

easwars avatar Aug 21 '23 21:08 easwars

thanks @erni27, I'll get started on it

Aditya-Sood avatar Aug 23 '23 04:08 Aditya-Sood

@Aditya-Sood -- Just to check if assignment is right. We have Hacktoberfest right around the corner. Please be sure to register before sending your contributions for review.

arvindbr8 avatar Sep 27 '23 20:09 arvindbr8

Yes @arvindbr8, I was working on #6288 so hadn't picked this up yet

Thanks for the reminder, I've registered now :)

Aditya-Sood avatar Sep 28 '23 05:09 Aditya-Sood

hi @zasweq, @easwars could you please clarify this query https://github.com/grpc/grpc-go/pull/6723#issuecomment-1801171237?

Aditya-Sood avatar Nov 27 '23 02:11 Aditya-Sood

Hi @Aditya-Sood I just returned from leave and have a bunch of things on my plate which are higher priority than this one. So, please expect a bit more delay on this one. Thanks.

easwars avatar Nov 30 '23 22:11 easwars