
DIAL_CLS / DIAL_RSP race leading to connection leak

Open · tallclair opened this issue 3 years ago · 5 comments

There is a potential race condition in which a DIAL_CLS packet from the frontend is received at the same time as a DIAL_RSP from the backend, leading to the backend connection being leaked.

This can happen if the following events occur in this order:

  1. A DIAL_RSP is received from the backend.
  2. The pending dial is still present at the check in https://github.com/kubernetes-sigs/apiserver-network-proxy/blob/b5e5436b2fbeaa03657ff2381cef0d46f18ce267/pkg/server/server.go#L755
  3. The frontend starts shutting down and sends a DIAL_CLS (prior to https://github.com/kubernetes-sigs/apiserver-network-proxy/pull/398 it wouldn't even send a close request).
  4. The server sends the dial response to the frontend. The frontend's gRPC stream is still open, so the packet is received, but the frontend doesn't process it: https://github.com/kubernetes-sigs/apiserver-network-proxy/blob/b5e5436b2fbeaa03657ff2381cef0d46f18ce267/pkg/server/server.go#L767
  5. At this point, the server thinks the connection is established, but the frontend is not aware of that and is in the process of shutting down, leaving a leaked backend connection.

This seems fairly unlikely (at least once https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/403 is fixed), but worth tracking.
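
To make the interleaving concrete, here is a minimal, self-contained Go sketch of the two racing handlers. Everything in it (the `server` struct, `pending`, `conns`, `frontendOpen`, and the handler names) is a hypothetical stand-in for the server's internal dial-tracking state, not the actual code at the links above:

```go
// A minimal sketch of the DIAL_RSP / DIAL_CLS race, NOT the actual server code.
// All names here are hypothetical stand-ins for the proxy server's state.
package main

import (
	"fmt"
	"sync"
)

type server struct {
	mu           sync.Mutex
	pending      map[int64]bool // dials awaiting DIAL_RSP, keyed by dial random ID
	conns        map[int64]bool // backend connections the server considers established
	frontendOpen bool           // whether the frontend gRPC stream is still open
}

// onDialResponse models steps 1, 2, and 4: the DIAL_RSP handler finds the
// pending dial, records the backend connection as established, and forwards
// the response to the frontend.
func (s *server) onDialResponse(random, connID int64) {
	s.mu.Lock()
	if !s.pending[random] {
		// Dial was already cancelled; the backend connection can be closed here.
		s.mu.Unlock()
		return
	}
	delete(s.pending, random)
	s.conns[connID] = true // server now believes the connection is established
	s.mu.Unlock()

	// Step 4: the frontend stream is still open, so this "succeeds", but a
	// frontend that is shutting down never processes the packet.
	if s.frontendOpen {
		fmt.Printf("DIAL_RSP for conn %d forwarded; frontend may silently drop it\n", connID)
	}
}

// onDialClose models step 3: the frontend cancels the dial. If onDialResponse
// already removed the pending entry, nothing here cleans up the established
// backend connection, so it leaks (step 5).
func (s *server) onDialClose(random int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.pending[random] {
		delete(s.pending, random) // normal cancellation: dial never completed
		return
	}
	// Pending dial was already promoted to an established connection keyed
	// by connID, which this handler doesn't know about -- the leak.
}

func main() {
	s := &server{
		pending:      map[int64]bool{42: true},
		conns:        map[int64]bool{},
		frontendOpen: true,
	}
	// The interleaving that triggers the leak:
	s.onDialResponse(42, 7) // DIAL_RSP wins the race; conn 7 marked established
	s.onDialClose(42)       // DIAL_CLS finds no pending dial; conn 7 is orphaned
	fmt.Printf("established conns after frontend shutdown: %v\n", s.conns)
}
```

In this sketch, the crux is that the close path is keyed by the dial's random ID while an established connection is keyed by its connID, so once the DIAL_RSP handler wins the race, the DIAL_CLS handler has no handle on the backend connection.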

tallclair · Sep 16 '22 21:09

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Dec 15 '22 22:12

/lifecycle frozen

tallclair · Dec 16 '22 00:12

/assign @jkh52

jkh52 · Feb 14 '23 19:02

/unassign @jkh52 /assign @azimjohn

tallclair · May 16 '24 16:05

@tallclair: GitHub didn't allow me to assign the following users: azimjohn.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/unassign @jkh52 /assign @azimjohn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · May 16 '24 16:05