
Destroy idle node-to-node channels automatically

Open jumaffre opened this issue 4 years ago • 2 comments

Our original policy for closing node-to-node channels was too strict: a channel to a node is destroyed as soon as we have (globally) committed that node's retirement. #2654 relaxes this, at the cost of keeping channels open forever, which causes relatively small memory growth (nodes should be recycled frequently enough that this doesn't cause any issues in practice).

There isn't a clear point in time when a node-to-node channel can safely be destroyed. For example, as a retired primary, I may want to keep channels open to other nodes to forward client requests to the new configuration (see https://github.com/microsoft/CCF/issues/1713). As a retired backup, I may also receive the response to a forwarded RPC from the primary, which should still be returned to the client.

Instead, we should periodically destroy channels that have been idle (on both send and receive) for a while (frequency TBC, but probably a multiple of the election timeout). If a node wants to send a message over a channel that has been destroyed, the message will be queued (*) and establishment of a new channel will start. On receiving a message from a node whose channel was previously destroyed, we simply create a new channel.
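
The on-demand re-establishment described above could look roughly like the following minimal C++17 sketch. It is not CCF's actual `ChannelManager`: `ChannelTable`, `send`, `on_receive`, `on_established`, `handshakes_started` and `wire` (a stand-in for the network) are hypothetical names, and the key-exchange handshake itself is elided. It uses a single-slot queue per channel, and it assumes (this is not specified in the issue) that a newer message overwrites an older queued one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not CCF's real classes.
using NodeId = std::string;
using Message = std::vector<std::uint8_t>;

enum class ChannelStatus
{
  Establishing,
  Established
};

struct Channel
{
  ChannelStatus status = ChannelStatus::Establishing;
  // Single-slot queue: at most one message waits for the handshake.
  std::optional<Message> pending;
};

class ChannelTable
{
public:
  std::map<NodeId, Channel> channels;
  std::vector<std::pair<NodeId, Message>> wire; // stand-in for the network
  std::vector<NodeId> handshakes_started; // peers we initiated a handshake with

  // Returns true if the message went out immediately, false if it was queued.
  bool send(const NodeId& peer, const Message& msg)
  {
    auto [it, created] = channels.try_emplace(peer);
    if (created)
    {
      // No channel (never created, or destroyed while idle): start a new one.
      handshakes_started.push_back(peer);
    }
    Channel& ch = it->second;
    if (ch.status == ChannelStatus::Established)
    {
      wire.emplace_back(peer, msg);
      return true;
    }
    // Assumption: a newer message replaces any older queued one.
    ch.pending = msg;
    return false;
  }

  // A message arrived from a peer whose channel may have been destroyed:
  // recreate the channel entry if it is missing (handshake elided).
  void on_receive(const NodeId& peer)
  {
    channels.try_emplace(peer);
  }

  // Handshake finished: mark the channel usable and flush the queued message.
  void on_established(const NodeId& peer)
  {
    Channel& ch = channels.at(peer);
    assert(ch.status == ChannelStatus::Establishing); // completes only once
    ch.status = ChannelStatus::Established;
    if (ch.pending)
    {
      wire.emplace_back(peer, *ch.pending);
      ch.pending.reset();
    }
  }
};
```

With a one-message cap, memory per destroyed-then-revived channel stays bounded. The trade-off is that earlier messages sent during re-establishment may be lost, so callers would need to tolerate that (or retry).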

While channel re-establishment has already been implemented in #2092, we'll need to periodically tick() the ChannelManager and destroy channels that have been idle for a while.
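
A periodic idle sweep of this kind might look like the following minimal sketch. It assumes each channel records the time of its last send and last receive, and that the caller passes the current time to a hypothetical `tick(now)`. `IdleChannelSweeper`, `ChannelActivity`, `record_send` and `record_receive` are illustrative names, not part of CCF. A channel is destroyed only when it has been idle in both directions, i.e. its most recent send or receive is at least `idle_timeout` old.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch; names are illustrative, not CCF's real API.
using NodeId = std::string;
using Clock = std::chrono::steady_clock;

struct ChannelActivity
{
  Clock::time_point last_sent{};
  Clock::time_point last_received{};
};

class IdleChannelSweeper
{
public:
  std::map<NodeId, ChannelActivity> channels;

  explicit IdleChannelSweeper(Clock::duration idle_timeout_) :
    idle_timeout(idle_timeout_)
  {
    assert(idle_timeout > Clock::duration::zero());
  }

  void record_send(const NodeId& peer, Clock::time_point now)
  {
    channels[peer].last_sent = now;
  }

  void record_receive(const NodeId& peer, Clock::time_point now)
  {
    channels[peer].last_received = now;
  }

  // Destroys every channel with no send and no receive for at least
  // idle_timeout; returns how many channels were destroyed.
  std::size_t tick(Clock::time_point now)
  {
    std::size_t destroyed = 0;
    for (auto it = channels.begin(); it != channels.end();)
    {
      const ChannelActivity& a = it->second;
      const Clock::time_point last_activity =
        std::max(a.last_sent, a.last_received);
      if (now - last_activity >= idle_timeout)
      {
        it = channels.erase(it); // erase returns the next valid iterator
        ++destroyed;
      }
      else
      {
        ++it;
      }
    }
    return destroyed;
  }

private:
  Clock::duration idle_timeout;
};
```

For example, the sweeper could be constructed with `4 * election_timeout`, where the factor 4 is an arbitrary placeholder for the "multiple of the election timeout" still to be decided. Passing `now` explicitly rather than reading the clock inside `tick` keeps the sweep deterministic and easy to test. A real implementation would probably also need to skip channels that are mid-handshake or still hold a queued message. This sketch does not.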

(*) To cap memory growth, we only queue one message for now.

jumaffre, Jun 15 '21 14:06

@eddyashton was this done in #2801 or is it still outstanding?

achamayou, Nov 08 '21 09:11

This is still outstanding.

eddyashton, Nov 08 '21 09:11