go-control-plane
go-control-plane copied to clipboard
[#583][Sotw] Ensure resources are properly sent again if envoy unsubscribes then subscribes again to a resource
This PR is part of the fix for the issue related to resources being unsubscribed from then subscribed to again This one applies to both simple and linear cache when using the sotw protocol. Actionable cases would likely only impact sotw-ADS as EDS outside of ADS does not share streams and will therefore usually not have multiple resources this way
This PR is built on top of https://github.com/envoyproxy/go-control-plane/pull/584 as it's relying on the client state being provided to the cache
It is also fixing a few bugs found during testing:
- in linear cache, the request set in the response was not the actual request. It was breaking some sotw parts as this was actually relied upon for some behavior (though not the bug of the issue)
- in linear cache, when using sotw, if a request has multiple resources (at least three) and receive at least two subsequent updates of subsets of those resources, there is a deadlock as the watch is leaked in the watches of resources requested but not yet updated. As this channel won't be listened after the first reply, the second update will fill the buffer and any subsequent cache update impacting a resource requested and not yet updated will deadlock
It is updating a few other aspects in order to implement the fix:
- Adds a staged resources map within the streamState. In sotw it is storing resources returned but not yet ACKed. This behavior should be ported to delta, but is out of the scope here. It allows removing the bespoke lastResponse value set only in sotw and parallel to streamState
- Adds a new returned attribute in the rawResponse, providing the list of resources included in the opaque response. This is needed to know which resources are actually known to the client. The previous implementation based on the request was actually opening edge cases as there is no guarantee a requested resource has ever been replied
- Adds cache logging in linear_test.go (helping while writing tests)
Finally, reasoning on the fixing through proper tracking of resources rather than basic resend of everything on doubts is explained within #584, and is even more applicable in the context of this issue (EDS through sotw-ads) when a single client can track thousands of endpoints within hundreds of CLAs
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
@valerian-roche Is this one still needed in upstream or should it remain stale/closed?
@valerian-roche Is this one still needed in upstream or should it remain stale/closed?
Those PRs are addressing issues which are still present. There has been no feedback on it or https://github.com/envoyproxy/go-control-plane/pull/584 which is a pre-requisite of those PRs Maybe @alecholmez can provide some feedback on when it could be reviewed
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Thanks for working on this, this is great.
This would fix https://github.com/envoyproxy/go-control-plane/issues/431 which makes go-control-plane pretty much incompatible with grpc clients, as those typically unsubscribe to resources after a channel reaches its idle timeout.
Thanks for working on this, this is great.
This would fix #431 which makes go-control-plane pretty much incompatible with grpc clients, as those typically unsubscribe to resources after a channel reaches its idle timeout.
@atollena the patch here fixes the go client idle issue for me. Hopefully that it can help you too