apiserver-network-proxy

Protocol: simplify identifiers

Open • jkh52 opened this issue 1 year ago • 10 comments

Proposal: have the client generate a globally unique identifier, and use this for the lifetime of the proxy connection.

Summary of the connection dial and cleanup flows we have today:

  • client sends short-lived dialID (a.k.a. "random")
  • server passes dialID through to agent
  • agent generates connectionID as part of dial flow
  • agent mainly keys by connectionID
    • state while pending is not well modeled
  • server keeps two data structures (sketched in code after this list)
    • PendingDials (by dialID)
    • frontends (by agentID+connectionID) after dial success
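
For reference, a minimal Go sketch of that server-side bookkeeping, using illustrative types (these are not the actual server.go structures, just the shape of the two maps and the hand-off between them):

```go
package sketch

import "sync"

// ProxyClientConnection stands in for the server's per-frontend record;
// the fields here are illustrative only.
type ProxyClientConnection struct {
	dialID       int64 // the short-lived "random" sent by the client
	agentID      string
	connectionID int64 // assigned by the agent and returned in DIAL_RSP
}

type connKey struct {
	agentID      string
	connectionID int64
}

// serverState mirrors the two maps described above: pending dials keyed
// by dialID, established frontends keyed by agentID+connectionID.
type serverState struct {
	mu           sync.Mutex
	pendingDials map[int64]*ProxyClientConnection
	frontends    map[connKey]*ProxyClientConnection
}

// onDialResponse is the awkward hand-off between the two maps. If a
// DIAL_CLS removes the pending entry just before this runs, or arrives
// just after, cleanup has to reason about two different keys.
func (s *serverState) onDialResponse(dialID int64, agentID string, connectionID int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	conn, ok := s.pendingDials[dialID]
	if !ok {
		return // pending dial already cancelled or expired
	}
	delete(s.pendingDials, dialID)
	conn.agentID, conn.connectionID = agentID, connectionID
	s.frontends[connKey{agentID, connectionID}] = conn
}
```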

Problems: Since proxy connections may be closed/cancelled by either end, it is difficult to reliably clean up in all cases. In particular:

  • https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/403 is difficult to fix properly because PendingDial and frontends (established ProxyClientConnection) are not tracked by any identifier associated with a given serveRecvFrontend + readFrontendToChannel pair of goroutines.
  • Even if we fix #403, the fundamental race in #404 remains: the client sends DIAL_CLS to cancel a pending dial, but the server may already have moved the connection out of the pending state. When this happens the mapping from dialID has been lost, and cleanup is difficult. (I could imagine keeping a "last 30 seconds of dialID + connectionID" association, but it feels very hacky and there are other problems.)

The end goal is much simpler (a sketch follows this list):

  • CLOSE_REQ and DIAL_CLS semantics converge
  • server can avoid the above state transition
  • agent can more easily support client cancel
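
To make that end state concrete, here is a hedged sketch of what the server side could collapse to, with one table for both pending and established connections. All type and field names are placeholders:

```go
package sketch

import "sync"

// tunnelConn is an illustrative record for one proxied connection,
// pending or established, keyed only by the client-generated identifier.
type tunnelConn struct {
	agentID      string
	connectionID int64 // still assigned by the agent, but no longer a key
	established  bool
}

type serverState struct {
	mu      sync.Mutex
	tunnels map[string]*tunnelConn // keyed by the client-generated ID
}

// closeTunnel serves CLOSE_REQ and DIAL_CLS identically: both name the
// same identifier, so there is no pending-to-established key hand-off
// left to race against.
func (s *serverState) closeTunnel(uid string) (agentID string, connectionID int64, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, found := s.tunnels[uid]
	if !found {
		return "", 0, false
	}
	delete(s.tunnels, uid)
	return c.agentID, c.connectionID, true
}
```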

Migration

The hardest part seems to be a backward compatible migration (and eventual code deletion + cleanup). Care must also be taken to make sure HTTPConnect (tunnel.go) is given parity treatment.

I think the migration is tractable, and would like to hear what others think.

jkh52 avatar Feb 14 '23 22:02 jkh52

I agree with this direction. A lot of memory leaks we've seen have to do with the extra state transition between pending dial & connected. Using a consistent ID across the connection lifetime would resolve the race conditions @jkh52 mentioned, and also let us (eventually) clean up a lot of code.

I think the migration is actually pretty straightforward. We just need to make sure all 3 components (client, server, agent) can understand both the old ("random" + connectionID) and the new (UID) IDs, and translate between them. The requests would continue to carry both sets of IDs, until we were confident that no version skew exists (at which point we can clean up handling the old IDs). I do think we should have a version-skew test framework in place before implementing such a change.

Risks:

Security: 128-bit UUIDs should be sufficient to all but guarantee that no accidental collisions happen. From a security perspective, the only concern is if a malicious agent were able to reuse a connection UUID to intercept connection traffic. For this reason, the server should validate that requests with a given UUID are coming from the expected frontend / agent.
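
A sketch of that ownership check, assuming the server records which frontend and which agent a tunnel UUID was bound to at dial time (all names here are hypothetical):

```go
package sketch

import (
	"fmt"
	"sync"
)

// tunnelOwner records which frontend and which agent a given tunnel ID
// was bound to at dial time. All names are illustrative.
type tunnelOwner struct {
	frontendID string
	agentID    string
}

type ownershipTable struct {
	mu     sync.Mutex
	owners map[string]tunnelOwner // tunnel UUID -> owner
}

// checkAgent rejects packets that carry a known tunnel UUID but arrive
// from a different agent than the one the tunnel was dialed through; a
// symmetric check would apply on the frontend side.
func (t *ownershipTable) checkAgent(uid, agentID string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	o, ok := t.owners[uid]
	if !ok {
		return fmt.Errorf("unknown tunnel UUID %q", uid)
	}
	if o.agentID != agentID {
		return fmt.Errorf("tunnel UUID %q belongs to agent %q, not %q", uid, o.agentID, agentID)
	}
	return nil
}
```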

Entropy exhaustion: with the above security considerations, I don't think we need to use a secure random generator, which mitigates this concern. As a minor optimization, we could use the lower 64 bits as the dial ID.
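
As an illustration of that optimization (not a committed format), a sketch that derives the legacy dial ID from the low 64 bits of a non-cryptographically generated 128-bit identifier:

```go
package sketch

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

// newTunnelUID returns a 128-bit identifier as a hex string plus a
// legacy-compatible 64-bit dial ID taken from its low half. math/rand
// (not crypto/rand) is used deliberately: with the ownership validation
// described above, collisions are an availability concern rather than a
// security boundary, so entropy exhaustion is not a worry.
func newTunnelUID() (uid string, dialID int64) {
	hi, lo := rand.Uint64(), rand.Uint64()
	var b [16]byte
	binary.BigEndian.PutUint64(b[:8], hi)
	binary.BigEndian.PutUint64(b[8:], lo)
	return fmt.Sprintf("%x", b), int64(lo)
}
```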

tallclair avatar Feb 15 '23 00:02 tallclair

We (AKS) use http-connect mode. Would be delighted to help.

ipochi avatar Feb 16 '23 14:02 ipochi

> We (AKS) use http-connect mode. Would be delighted to help.

Glad to hear it! Does the idea make sense and seem worth pursuing?

jkh52 avatar Feb 16 '23 18:02 jkh52

A rough migration plan idea:

  • client continues to emit random (dialID) and begins emitting a new DIAL_REQ field, tunnelUID (globally unique; the name is up for debate)
    • only gRPC mode uses the client; for http-connect mode, this is done in proxy-server tunnel.go
  • proxy-agent begins supporting the new key
    • dual mode: all relevant messages are extended to carry both the old keys (random and connectionID) and the new key (see the sketch after this list)
  • proxy-agent begins advertising support for the new key
    • maybe via a new field on the Connect RPC, or via a header, for example "SupportsTunnelUID"
  • proxy-server can decide per tunnel / dial whether to use the new flow or the old flow
    • the old flow continues using PendingDials + frontends; the new flow uses a simpler lookup table
    • when the new key is present in a message, it takes precedence over the old keys
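
A rough sketch of the dual-key DIAL_REQ described in this plan, using illustrative Go types in place of the generated proto messages (the SupportsTunnelUID name is just the placeholder from the list above):

```go
package sketch

import "math/rand"

// dialRequest stands in for the DIAL_REQ payload during migration: it
// carries both the legacy key and the proposed one. These are
// illustrative Go types, not the generated proto messages.
type dialRequest struct {
	Random    int64  // legacy short-lived dialID
	TunnelUID string // proposed client-generated identifier
	Address   string
	Protocol  string
}

// agentCapabilities is one way the proxy-server could learn, per
// backend, whether the new key is understood (e.g. via a new Connect
// RPC field or a header such as "SupportsTunnelUID").
type agentCapabilities struct {
	SupportsTunnelUID bool
}

// newDialRequest always emits both identifiers; a receiver uses the new
// key when present and falls back to the legacy one otherwise.
func newDialRequest(tunnelUID, address, protocol string) dialRequest {
	return dialRequest{
		Random:    rand.Int63(),
		TunnelUID: tunnelUID,
		Address:   address,
		Protocol:  protocol,
	}
}
```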

jkh52 avatar Feb 16 '23 18:02 jkh52

I'm probably getting ahead of myself, but we might want to add the new TunnelUID field to the top-level Packet message, since every packet type should have it. That would let us extract some common logic before switching on packet type (although it will still need an exception for DIAL_REQ).
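
For example, a hedged sketch of that common extraction step, with simplified stand-ins for the generated proto types:

```go
package sketch

// Simplified stand-ins for the generated proto types, with the proposed
// TunnelUID field hoisted to the top-level Packet.
type packetType int

const (
	packetDialReq packetType = iota
	packetDialRsp
	packetData
	packetCloseReq
)

type packet struct {
	Type      packetType
	TunnelUID string // proposed: carried by every packet type
}

type tunnelConn struct {
	connectionID int64
}

type tunnelTable struct {
	tunnels map[string]*tunnelConn
}

// lookupOrCreate is the common logic that could run before switching on
// packet type: every packet is indexed by TunnelUID, with DIAL_REQ as
// the one exception because it names a tunnel that does not exist yet.
func (t *tunnelTable) lookupOrCreate(p *packet) *tunnelConn {
	if p.Type == packetDialReq {
		c := &tunnelConn{}
		t.tunnels[p.TunnelUID] = c
		return c
	}
	return t.tunnels[p.TunnelUID] // nil if the tunnel is already gone
}
```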

> proxy-server can make a decision per tunnel / dial whether to use new flow or old flow

It will be easier to tell with a prototype, but I'm not convinced it's necessary to maintain two separate flows. I think the proxy-server should have look-up tables to translate between the ID types, so it can look up the dial/connection ID from the tunnelUID, and the tunnelUID from the dial/connectionID. Then the server just fills in whatever IDs are missing and executes common logic. In practice, I think this might mean indexing the tunnel by tunnelUID, and just looking up the tunnelUID if it's missing.
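
A minimal sketch of that translation layer (all names hypothetical), where the server fills in whichever identifiers a packet is missing before running common logic:

```go
package sketch

import "sync"

// idIndex translates between the legacy identifiers and the proposed
// tunnelUID in both directions; all names are illustrative.
type idIndex struct {
	mu          sync.Mutex
	byDialID    map[int64]string // dialID ("random") -> tunnelUID
	byConnID    map[int64]string // connectionID -> tunnelUID
	dialIDByUID map[string]int64
	connIDByUID map[string]int64
}

// resolve returns the tunnelUID for a packet that may carry only legacy
// identifiers, and fills in whichever IDs are missing so the caller can
// run common logic and emit packets carrying both sets of IDs.
func (i *idIndex) resolve(tunnelUID string, dialID, connID int64) (uid string, outDialID, outConnID int64, ok bool) {
	i.mu.Lock()
	defer i.mu.Unlock()
	switch {
	case tunnelUID != "":
		return tunnelUID, i.dialIDByUID[tunnelUID], i.connIDByUID[tunnelUID], true
	case connID != 0:
		uid, ok = i.byConnID[connID]
		return uid, i.dialIDByUID[uid], connID, ok
	case dialID != 0:
		uid, ok = i.byDialID[dialID]
		return uid, dialID, i.connIDByUID[uid], ok
	default:
		return "", 0, 0, false
	}
}
```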

tallclair avatar Feb 16 '23 19:02 tallclair

> I think the proxy-server should have look-up tables to translate between the ID types, so it can look up the dial/connection ID from tunnelUID,

A server should be able to tell an agent to cancel a pending dial (this feature is missing today). The new key would support this pretty easily. But if the server needs to map the new key to the old keys, then it cannot tell the agent to cancel (since it doesn't know the connectionID yet).

jkh52 avatar Feb 16 '23 20:02 jkh52

Capturing an idea from @tallclair via chat: this proposal could reduce server responsibility down to message pass-through (minimally: keeping track of frontend and backend associations). As a thought experiment, if there is a single frontend and a single backend, it is even stateless.

The idea might be applicable in unit tests.

jkh52 avatar Feb 16 '23 22:02 jkh52

/cc @cheftako

jkh52 avatar Feb 23 '23 01:02 jkh52

Alternative proposal for migration plan

Summary: Each component handles incoming packets with only the old identifiers (random/connectionID), only the new identifier (tunnelUID), or both. Each component outputs packets that always include both sets of identifiers (random/connectionID AND tunnelUID). This way, in the intermediate compatibility phase, each component is both fully backwards compatible and forwards compatible.

Dial flow:

  1. Konnectivity-Client (frontend): Start dial
    • Generate a random & a tunnelUID. Store conn keyed off tunnelUID, with pendingDial merged into conn; include random in the pendingDial. Store the random --> tunnelUID mapping. Send a DIAL_REQ with random and tunnelUID.
  2. Proxy-Server: Receives DIAL_REQ and forwards it to the agent, handling it depending on which IDs the request carries (a condensed sketch follows this flow):
    • only random: Generate a tunnelUID, store the connection info (pending dial) in a map indexed by tunnelUID. Insert mapping of dialID --> tunnelUID into pending dials map. Forward request with tunnelUID & dialID
    • only tunnelUID: Generate a dialID (random), store the connection info in a map indexed by tunnelUID, add the dialID --> tunnelUID mapping. Forward request with tunnelUID & dialID
    • both: Store the request info by tunnelUID, insert the dialID-->tunnelUID mapping, forward the request.
  3. Agent (backend): Receives DIAL_REQ, sends a dial response with all IDs (random, connectionID, tunnelUID). Depending on IDs in the DIAL_REQ, behaves similarly to the server:
    • only random: Generate a tunnelUID, store connContext keyed off tunnelUID. Generate connectionID. Store mapping of connectionID --> tunnelUID. Send response with random, connectionID, tunnelUID.
    • only tunnelUID: Generate a connection ID. Store connContext keyed off tunnelUID. Store connectionID-->tunnelUID. No need to generate random. Send DIAL_RSP with connectionID, tunnelUID.
    • both: Store connContext keyed off tunnelUID. Generate connectionID & store mapping to tunnelUID. Send RSP with random, connectionID, tunnelUID.
  4. Proxy-Server: Receives DIAL_RSP, always forward with all IDs.
    • only random & connectionID: Look-up tunnelUID by random (if not found, handle like a missing pending dial). Look up ProxyClientConnection by tunnelUID. Store connectionID --> tunnelUID. Insert connectionID into connection info. Send DIAL_RSP with random, connectionID, tunnelUID. Delete random --> tunnelUID mapping.
    • only tunnelUID: Look up ProxyClientConnection by tunnelUID. Generate a connectionID [1], insert connectionID into connection info, store connectionID --> tunnelUID mapping. Send DIAL_RSP with random, connectionID, tunnelUID. Delete random --> tunnelUID mapping.
    • both: Look up ProxyClientConnection by tunnelUID. Insert connectionID into connection info, store connectionID --> tunnelUID mapping. Send DIAL_RSP with random, connectionID, tunnelUID. Delete random --> tunnelUID mapping.
  5. Konnectivity-Client: Receives DIAL_RSP
    • only random & connectionID: Look-up tunnelUID by random (if not found, handle like a missing pending dial). Look up conn by tunnelUID. Store connectionID --> tunnelUID. Store connectionID with connection info. Delete random --> tunnelUID mapping. Enter steady-state.
    • only tunnelUID: Look up conn by tunnelUID, get random from conn. Delete random --> tunnelUID mapping. Enter steady-state.
    • both: Look up conn by tunnelUID. Store connectionID --> tunnelUID. Store connectionID with connection info. Delete random --> tunnelUID mapping. Enter steady-state.
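
A condensed sketch of step 2 above (the proxy-server's DIAL_REQ handling); the types are illustrative stand-ins and the branches mirror the only-random / only-tunnelUID / both cases:

```go
package sketch

import (
	"fmt"
	"math/rand"
)

// Illustrative stand-ins for the proto message and server state.
type dialReq struct {
	Random    int64  // legacy dialID; 0 means absent
	TunnelUID string // proposed key; "" means absent
}

type proxyClientConnection struct {
	dialID    int64
	tunnelUID string
}

type server struct {
	connections  map[string]*proxyClientConnection // keyed by tunnelUID
	pendingDials map[int64]string                  // dialID -> tunnelUID
}

// handleDialReq normalizes whichever identifiers are present, stores the
// pending connection keyed by tunnelUID, and forwards the request with
// both identifiers filled in.
func (s *server) handleDialReq(req dialReq) dialReq {
	switch {
	case req.TunnelUID == "" && req.Random != 0: // only random (old client)
		req.TunnelUID = fmt.Sprintf("%016x%016x", rand.Uint64(), rand.Uint64())
	case req.TunnelUID != "" && req.Random == 0: // only tunnelUID (new client)
		req.Random = rand.Int63()
	}
	s.connections[req.TunnelUID] = &proxyClientConnection{dialID: req.Random, tunnelUID: req.TunnelUID}
	s.pendingDials[req.Random] = req.TunnelUID
	return req // forwarded to the agent with both identifiers
}
```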

Steady state: All 3 components now have the following:

  • Main connection map: connection info (including a connectionID) keyed off of tunnelUID
  • Established connections map: connectionID --> tunnelUID (exception: client if it got a DIAL_RSP without connectionID)

They handle incoming DATA and CLOSE packets accordingly:

  • only connectionID: look up the tunnelUID from the established connections map, then the connection info by tunnelUID. Send packets with both.
  • only tunnelUID: look up the connection info from the connection map, and get the connectionID from the connection info. Send packets with both.
  • both: look up the connection info from the connection map. Send packets with both.

Footnotes: [1] Generating the connectionID in the server may not be strictly necessary, but handles the case of an older client that doesn't understand tunnelUID, and a newer agent that doesn't understand old IDs (and a server that knows both). This scenario shouldn't happen with proper rollout sequencing.

Data structures:

Client:

  • Main connection map: tunnelUID --> conn; pendingDial struct merged into conn struct (sketched in code after these lists)
  • Pending dials maps: dialID --> tunnelUID
  • Established connections map: connectionID --> tunnelUID

Server:

  • Main connection map: tunnelUID --> ProxyClientConnection struct (including dialID & connectionID as fields)
  • Pending dials map: dialID (random) --> tunnelUID
  • Established connections map: connectionID --> tunnelUID

Agent:

  • Main connection map: tunnelUID --> connContext (with connectionID field)
  • Established connections map: connectionID --> tunnelUID
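
As a concrete example, a sketch of the client-side tables above, with pendingDial folded into conn as described. Field names are illustrative; the server and agent mirror the same shape around ProxyClientConnection and connContext:

```go
package sketch

import "sync"

// conn is the client-side record described above, with the former
// pendingDial fields folded in rather than kept in a separate struct.
type conn struct {
	random       int64 // legacy dialID, kept only for compatibility
	connectionID int64 // filled in from DIAL_RSP; may stay unset (see the exception above)
	established  bool
}

type clientTables struct {
	mu           sync.Mutex
	conns        map[string]*conn // tunnelUID -> conn
	pendingDials map[int64]string // dialID ("random") -> tunnelUID
	established  map[int64]string // connectionID -> tunnelUID
}
```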

Rollout:

Phase 0 (current state): no components understand tunnelUID.
Phase 1: Roll out the design described above. This handles version skew with Phase 0, so it doesn't matter which order the components are updated in.
Phase 2: Once we're confident that everything is updated and working in Phase 1, make a new KNP release branch and clean up handling of the old IDs. This should let us significantly simplify the code. This can be safely rolled out in any order, since Phase 1 components are forwards compatible.

tallclair avatar Mar 03 '23 20:03 tallclair

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 01 '23 20:06 k8s-triage-robot