
CCF node should reply with a timeout error if a request takes too long to execute

Open jumaffre opened this issue 6 years ago • 4 comments

In the case of write requests executed directly on the leader, or read requests executed on any node, RPC requests execute and return synchronously to the caller. However, forwarded write requests (and requests in general when using PBFT) can take a long time to execute or, in a failure case, never execute at all (e.g. if the node the request was forwarded to dies).

As of now, the session hangs until the client decides to close it. To address this, we should make sure that an RPC session that has been open in the enclave for too long times out and returns an error message to the caller (e.g. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408).

This could make use of https://github.com/microsoft/CCF/blob/master/src/node/timer.h.
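To make the proposal concrete, the following is a minimal sketch of how a forwarding node might time out pending requests. It assumes a hypothetical `PendingForwards` table keyed by session ID that records when each request was forwarded, plus a periodic `tick()` that something like the timer in `timer.h` could drive. None of these names are CCF APIs. Any entry older than the timeout is answered with HTTP 408 and removed.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical reply type: an HTTP status code plus a body.
struct Reply {
  int status;
  std::string body;
};

// Hypothetical bookkeeping for requests forwarded to another node.
class PendingForwards {
public:
  using Responder = std::function<void(const Reply&)>;

  explicit PendingForwards(std::chrono::milliseconds timeout) : timeout_(timeout) {}

  // Record that the request for this session was forwarded at sent_at.
  void add(std::uint64_t session_id, Clock::time_point sent_at, Responder respond) {
    pending_.insert_or_assign(session_id, Entry{sent_at, std::move(respond)});
  }

  // Called when the forwardee's response arrives. Returns false if the
  // entry is unknown (e.g. it already timed out), so late responses are dropped.
  bool complete(std::uint64_t session_id, const Reply& reply) {
    auto it = pending_.find(session_id);
    if (it == pending_.end()) return false;
    Responder respond = std::move(it->second.respond);
    pending_.erase(it);
    respond(reply);
    return true;
  }

  // Called periodically: answers every entry older than the timeout with a
  // 408 and removes it. Returns the number of requests timed out.
  std::size_t tick(Clock::time_point now) {
    std::vector<std::uint64_t> expired;
    for (const auto& [id, entry] : pending_) {
      if (now - entry.sent_at >= timeout_) expired.push_back(id);
    }
    for (auto id : expired) {
      auto node = pending_.extract(id);
      node.mapped().respond(Reply{408, "Request Timeout"});
    }
    return expired.size();
  }

  std::size_t size() const { return pending_.size(); }

private:
  struct Entry {
    Clock::time_point sent_at;
    Responder respond;
  };
  std::chrono::milliseconds timeout_;
  std::unordered_map<std::uint64_t, Entry> pending_;
};
```

Removing the entry before invoking the responder means a response that arrives after the timeout finds nothing in the table and is discarded, so the client never receives two replies for the same request.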

jumaffre, Oct 28 '19 11:10

Do the enabled tcp_keepalives resolve this issue? They won't return an error message to the client, but it will be obvious that the connection has dropped.

olgavrou, Nov 21 '19 09:11

@olgavrou no it doesn't, the node that does the forwarding still needs to decide the response isn't coming back and tell the client.

achamayou, Nov 21 '19 10:11

From #196

Another approach would be to mark the corresponding node-to-node channel as CLOSED when the TCP connection between two nodes is closed on the host (this will probably require a new ring buffer message type). That way, when a command is forwarded by a follower to the (now unreachable) leader, the forwarding will fail and return an RPC_NOT_FORWARDED JSON-RPC error.
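A minimal sketch of this fail-fast approach, assuming a hypothetical `Channels` class inside the enclave. The `on_tcp_closed()` method stands in for the handler of the new ring buffer message the host would send, and `ForwardResult::NotForwarded` stands in for the RPC_NOT_FORWARDED error. None of these names come from CCF.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

enum class ChannelStatus { Established, Closed };

// Hypothetical outcome of a forwarding attempt; NotForwarded plays the role
// of the RPC_NOT_FORWARDED error described in the issue.
enum class ForwardResult { Sent, NotForwarded };

class Channels {
public:
  void on_established(const std::string& node_id) {
    status_[node_id] = ChannelStatus::Established;
  }

  // Hypothetical handler for a host -> enclave message reporting that the
  // TCP connection to node_id was closed.
  void on_tcp_closed(const std::string& node_id) {
    status_[node_id] = ChannelStatus::Closed;
  }

  // Fails immediately if the channel to the leader is unknown or CLOSED,
  // instead of leaving the client session hanging.
  ForwardResult forward(const std::string& leader_id, const std::string& /*request*/) {
    auto it = status_.find(leader_id);
    if (it == status_.end() || it->second == ChannelStatus::Closed) {
      return ForwardResult::NotForwarded;
    }
    ++sent_;  // a real node would write the request to the channel here
    return ForwardResult::Sent;
  }

  std::uint64_t sent() const { return sent_; }

private:
  std::unordered_map<std::string, ChannelStatus> status_;
  std::uint64_t sent_ = 0;
};
```

This only catches a leader already known to be unreachable when the request is forwarded. A request that is in flight when the connection drops would still need a timeout of the kind the issue originally proposes.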

jumaffre, Jun 25 '20 14:06

This is an important quality of life issue to be solved in 3.x, and depends on the re-thinking of forwarding.

achamayou, May 23 '22 09:05