CCF
CCF node should reply with a timeout error if a request takes too long to execute
In the case of Write requests executed directly on the leader or Read on any node, RPC requests execute and return synchronously to the caller. However, in the case of forwarded Write requests (or when using PBFT), requests could take a long time to execute or, in a failure case, not execute at all (e.g. if the forwardee node dies).
As of now, the session will hang until the client decides to close it. To address this, we should make sure that an RPC session that has been open for too long in the enclave times out and returns an error message (e.g. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408) to the caller.
This could make use of https://github.com/microsoft/CCF/blob/master/src/node/timer.h.
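To make the proposal concrete, here is a rough sketch of the bookkeeping such a timeout would need. It is not based on the actual contents of `timer.h`: `PendingSessions`, `SessionId`, `start`, `complete`, `tick` and `HTTP_REQUEST_TIMEOUT` are all hypothetical names, and the sketch assumes something in the enclave calls `tick` periodically with the elapsed time.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch: none of these names come from the CCF codebase.
using SessionId = std::uint64_t;

// Status code the node would send back for an expired session.
constexpr int HTTP_REQUEST_TIMEOUT = 408;

class PendingSessions
{
  std::chrono::milliseconds limit;
  // Time each session has spent waiting for a response so far.
  std::map<SessionId, std::chrono::milliseconds> age;

public:
  explicit PendingSessions(std::chrono::milliseconds limit_) : limit(limit_) {}

  // Called when a request is forwarded and the session starts waiting.
  void start(SessionId id)
  {
    age[id] = std::chrono::milliseconds(0);
  }

  // Called when the forwarded response arrives in time.
  void complete(SessionId id)
  {
    age.erase(id);
  }

  // Called periodically with the time elapsed since the previous call.
  // Returns the sessions that have waited at least `limit` and forgets
  // them; the caller would then reply with HTTP_REQUEST_TIMEOUT to each.
  std::vector<SessionId> tick(std::chrono::milliseconds elapsed)
  {
    std::vector<SessionId> expired;
    for (auto it = age.begin(); it != age.end();)
    {
      it->second += elapsed;
      if (it->second >= limit)
      {
        expired.push_back(it->first);
        it = age.erase(it);
      }
      else
      {
        ++it;
      }
    }
    return expired;
  }
};
```

Driving this from a periodic tick rather than one timer per session keeps the state to a single map, at the cost of timeouts being accurate only to the tick interval.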
Does the enabled tcp_keepalives resolve this issue? It won't return an error message to the client, but it will be obvious that the connection has dropped.
@olgavrou no it doesn't, the node that does the forwarding still needs to decide the response isn't coming back and tell the client.
From #196
Another approach would be to mark the corresponding node-to-node channel as CLOSED when the TCP connection between two nodes is closed on the host (it will probably require a new ring buffer message type). That way, when a command is forwarded by a follower to the (now-unreachable) leader, the forwarding will fail and return an RPC_NOT_FORWARDED JSON-RPC error.
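A minimal sketch of that channel-status idea follows, assuming the host notifies the enclave when a peer connection drops. Only the CLOSED status and the RPC_NOT_FORWARDED error come from the description above; `Channels`, `NodeId`, `ChannelStatus::ESTABLISHED`, `ForwardResult` and the method names are hypothetical and do not reflect CCF's real channel code.

```cpp
#include <cstdint>
#include <map>

// Hypothetical sketch: type and method names are illustrative only.
using NodeId = std::uint64_t;

enum class ChannelStatus
{
  ESTABLISHED,
  CLOSED
};

enum class ForwardResult
{
  FORWARDED,
  RPC_NOT_FORWARDED
};

class Channels
{
  std::map<NodeId, ChannelStatus> status;

public:
  // Called once the node-to-node channel to `peer` is usable.
  void on_connected(NodeId peer)
  {
    status[peer] = ChannelStatus::ESTABLISHED;
  }

  // Called when the host reports (e.g. via a new ring buffer message)
  // that the TCP connection to `peer` has been closed.
  void on_host_connection_closed(NodeId peer)
  {
    status[peer] = ChannelStatus::CLOSED;
  }

  // Refuse to forward when the channel is unknown or closed, so the
  // caller can answer the client immediately instead of hanging.
  ForwardResult forward(NodeId leader) const
  {
    const auto it = status.find(leader);
    if (it == status.end() || it->second == ChannelStatus::CLOSED)
    {
      return ForwardResult::RPC_NOT_FORWARDED;
    }
    return ForwardResult::FORWARDED;
  }
};
```

Note that this only fails fast for requests forwarded after the connection is known to be closed; requests already in flight when the leader dies would still need a timeout.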
This is an important quality of life issue to be solved in 3.x, and depends on the re-thinking of forwarding.