apiserver-network-proxy
Handle agent disconnects for PendingDial
When an agent disconnects, https://github.com/kubernetes-sigs/apiserver-network-proxy/pull/125 closes all client-side connections that use the corresponding agent. However, PendingDial requests may still be in flight and may not have been added to the list of clients yet. We should either fail them or retry with a different agent instead of letting the client hit its dial timeout.
Original context from @cheftako:
Most of the time I would expect pending dial to be empty. However, if there is something in there, there is a chance its request went out via this backend. If so, we will never get the response, and that also needs to be dealt with.
The issue is that we do not record in the pending data structure which backend it used, so we cannot tell if anything on the pending list would be affected by a given backend breaking. We also need to work out how to deal with it. One option would be to just fail, which is probably easiest. However, as the connection has not yet been established, we should be able to switch to using a different backend.
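To make the "just fail" option concrete, here is a minimal sketch, assuming each pending dial entry records the agent it was sent through. All names here (`pendingDials`, `pendingDial`, `agentID`, `failForAgent`, `ErrBackendGone`) are hypothetical and chosen for illustration; they are not the project's actual types or API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrBackendGone is a hypothetical error handed to a client whose pending
// dial was routed through a backend that disconnected before responding.
var ErrBackendGone = errors.New("backend disconnected before dial completed")

// pendingDial is a hypothetical record of one in-flight dial request.
type pendingDial struct {
	agentID string     // backend (agent) the dial request was sent through
	result  chan error // receives the dial outcome; buffered so sends never block
}

// pendingDials tracks in-flight dials keyed by a dial ID (assumed unique).
type pendingDials struct {
	mu    sync.Mutex
	dials map[int64]*pendingDial
}

func newPendingDials() *pendingDials {
	return &pendingDials{dials: make(map[int64]*pendingDial)}
}

// add records a pending dial together with the agent it used and returns
// the channel on which the caller can wait for the outcome.
func (p *pendingDials) add(dialID int64, agentID string) <-chan error {
	d := &pendingDial{agentID: agentID, result: make(chan error, 1)}
	p.mu.Lock()
	p.dials[dialID] = d
	p.mu.Unlock()
	return d.result
}

// failForAgent fails and removes every pending dial that used agentID,
// returning how many were affected. It would be called from the agent
// disconnect path, alongside closing established connections.
func (p *pendingDials) failForAgent(agentID string) int {
	p.mu.Lock()
	defer p.mu.Unlock()
	failed := 0
	for id, d := range p.dials {
		if d.agentID == agentID {
			d.result <- ErrBackendGone
			delete(p.dials, id) // deleting during range is safe in Go
			failed++
		}
	}
	return failed
}

func main() {
	p := newPendingDials()
	viaA := p.add(1, "agent-a")
	p.add(2, "agent-b")

	n := p.failForAgent("agent-a")
	fmt.Println("failed pending dials:", n)
	fmt.Println("dial 1 outcome:", <-viaA)
}
```

The retry option would use the same bookkeeping: instead of sending an error, the disconnect path could pick another backend, update `agentID`, and resend the dial request, falling back to failure only when no other backend is available.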
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen