DIAL_CLS / DIAL_RSP race leading to connection leak
A DIAL_CLS packet from the frontend could race with a DIAL_RSP from the backend, which could lead to the backend connection being leaked.
This could happen if the following events occur in this order:
- DIAL_RSP received from the backend
- The pending dial is still present in https://github.com/kubernetes-sigs/apiserver-network-proxy/blob/b5e5436b2fbeaa03657ff2381cef0d46f18ce267/pkg/server/server.go#L755
- Frontend starts shutting down, sends a DIAL_CLS (prior to https://github.com/kubernetes-sigs/apiserver-network-proxy/pull/398 it wouldn't even send a close request)
- Server sends the dial response to the frontend. The FE gRPC stream is still open, so the packet is received, but the frontend doesn't process it: https://github.com/kubernetes-sigs/apiserver-network-proxy/blob/b5e5436b2fbeaa03657ff2381cef0d46f18ce267/pkg/server/server.go#L767
- At this point, the server thinks the connection is established, but the frontend is not aware of that and is in the process of shutting down, leading to a leaked backend connection.
This seems fairly unlikely (at least once https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/403 is fixed), but it is worth tracking.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
/assign @jkh52
/unassign @jkh52
/assign @azimjohn
@tallclair: GitHub didn't allow me to assign the following users: azimjohn.
Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide