
Handle agent disconnects for PendingDial

Open Jefftree opened this issue 5 years ago • 2 comments

When an agent disconnects, https://github.com/kubernetes-sigs/apiserver-network-proxy/pull/125 closes all client-side connections that use the corresponding agent. However, PendingDial requests may still be in flight and not yet added to the list of clients. We should either fail them or retry with a different agent instead of letting the client hit its dial timeout.

Original context from @cheftako:

Most of the time I would expect pending dial to be empty. However, if there is something in there, there is a chance its request went out via this backend. If so, we will never get the response, and that also needs to be dealt with.

The issue is that we do not record in the pending data structure which backend it used, so we cannot tell if anything on the pending list would be affected by a given backend breaking. We also need to work out how to deal with it. One option would be to just fail, which is probably easiest. However, as the connection has not yet been established, we should be able to switch to using a different backend.
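
To make the two options concrete, here is a rough Go sketch of a pending-dial table that records which backend each request went out on. It is an illustration only, not the proxy server's actual code: `PendingDialTracker`, `OnAgentDisconnect`, `ErrAgentDisconnected`, and the `pick` callback are all hypothetical names, and actually re-sending the dial request to the replacement backend is left to the caller.

```go
package main

import (
	"errors"
	"sort"
	"sync"
)

// ErrAgentDisconnected (hypothetical) is delivered to a pending dial whose
// agent went away and for which no replacement agent was available.
var ErrAgentDisconnected = errors.New("agent disconnected before dial response")

// pendingDial records which agent a dial request was sent through.
type pendingDial struct {
	agentID string
	result  chan error // buffered (size 1) so failing a dial never blocks
}

// PendingDialTracker is a hypothetical stand-in for the pending-dial table.
type PendingDialTracker struct {
	mu      sync.Mutex
	pending map[int64]*pendingDial
}

func NewPendingDialTracker() *PendingDialTracker {
	return &PendingDialTracker{pending: make(map[int64]*pendingDial)}
}

// Add registers an in-flight dial and the agent it was sent through.
// The returned channel receives an error if the dial is failed early.
func (t *PendingDialTracker) Add(dialID int64, agentID string) <-chan error {
	t.mu.Lock()
	defer t.mu.Unlock()
	pd := &pendingDial{agentID: agentID, result: make(chan error, 1)}
	t.pending[dialID] = pd
	return pd.result
}

// Agent reports which agent a pending dial is currently assigned to.
func (t *PendingDialTracker) Agent(dialID int64) (string, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	pd, ok := t.pending[dialID]
	if !ok {
		return "", false
	}
	return pd.agentID, true
}

// OnAgentDisconnect visits every pending dial that used agentID. If pick
// returns a replacement agent, the dial is reassigned (the caller must
// re-send the request); otherwise the dial is failed immediately.
// pick runs with the tracker lock held, so it must not call back into t.
func (t *PendingDialTracker) OnAgentDisconnect(agentID string, pick func(exclude string) (string, bool)) (retried, failed []int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for id, pd := range t.pending {
		if pd.agentID != agentID {
			continue
		}
		if next, ok := pick(agentID); ok {
			pd.agentID = next
			retried = append(retried, id)
			continue
		}
		pd.result <- ErrAgentDisconnected
		delete(t.pending, id)
		failed = append(failed, id)
	}
	// Map iteration order is random; sort for deterministic results.
	sort.Slice(retried, func(i, j int) bool { return retried[i] < retried[j] })
	sort.Slice(failed, func(i, j int) bool { return failed[i] < failed[j] })
	return retried, failed
}
```

With something like this, the "just fail" option is a `pick` that always returns `false`, and the retry option is a `pick` that returns any other healthy backend. Either way the client finds out right away instead of waiting for its dial timeout.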

Jefftree, Jul 15 '20 16:07

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot, Oct 13 '20 17:10

/lifecycle frozen

Jefftree, Oct 13 '20 17:10