rusk

Handle unknown block requests at both the requestor and the supplier sides

Open · autholykos opened this issue 11 months ago · 3 comments

Summary

💡 In the current block-request mechanism of the Dusk network, a node sends a request_block to a peer and then waits idly for either a handle_inv reply or a 30-second timeout. This introduces unnecessary delays in the synchronization process.

This RFC proposes an enhancement to how nodes respond to request_block queries and how requesting nodes handle those responses, in order to minimize block-retrieval latency and to improve the network's resilience against nodes with outdated or incomplete blockchain data.

Possible solution design or implementation

💡 The proposed solution involves two key changes to the existing protocol:

  1. Modification to Peer Response Behavior: Currently, a peer ignores a request_block if it does not have the requested block. We propose that instead of ignoring the request, the peer should reply with a handle_inv message indicating unknown_block. This explicit indication of absence will allow the requesting node to immediately seek other peers for the block, rather than waiting for a timeout period.

  2. Modification to Requesting Node Behavior: Upon receiving an unknown_block response, the requesting node should immediately proceed to request the block from other peers in its list, rather than waiting for a timeout. This proactive approach ensures that the node can quickly find and synchronize the block, improving the overall efficiency of the network.
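The two changes above can be sketched as a small in-process simulation. This is a minimal sketch, not rusk code: `Peer`, `on_request_block`, `fetch_block`, the `legacy` flag and the `BlockHash` alias are all hypothetical names, and network waits are modelled as simulated `Duration` costs (one round trip per reply, the full timeout for a peer that never answers).

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical block hash type for the sketch.
type BlockHash = [u8; 32];

/// Replies to `request_block`; `UnknownBlock` is the proposed explicit "not here".
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    Block(Vec<u8>),
    UnknownBlock,
}

#[derive(Clone)]
struct Peer {
    store: HashMap<BlockHash, Vec<u8>>,
    /// `true` models the current behaviour: an unknown request is silently ignored.
    legacy: bool,
}

impl Peer {
    /// Supplier side. `None` means no reply is ever sent.
    fn on_request_block(&self, hash: &BlockHash) -> Option<Reply> {
        match self.store.get(hash) {
            Some(block) => Some(Reply::Block(block.clone())),
            None if self.legacy => None,
            None => Some(Reply::UnknownBlock),
        }
    }
}

/// Requester side: ask peers in order and return the block (if any) together
/// with the simulated time spent waiting. An `UnknownBlock` reply costs one
/// round trip and moves on at once; silence costs the full timeout.
fn fetch_block(
    peers: &[Peer],
    hash: &BlockHash,
    rtt: Duration,
    timeout: Duration,
) -> (Option<Vec<u8>>, Duration) {
    let mut waited = Duration::ZERO;
    for peer in peers {
        match peer.on_request_block(hash) {
            Some(Reply::Block(block)) => return (Some(block), waited + rtt),
            Some(Reply::UnknownBlock) => waited += rtt, // try the next peer immediately
            None => waited += timeout,                 // waited for a reply that never came
        }
    }
    (None, waited)
}
```

With one non-holder listed before the holder, a 200 ms round trip and the 30-second timeout, the sketch reports 400 ms of waiting when the non-holder answers unknown_block and 30.2 s when it stays silent.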

Additional context

To further enhance the protocol, the requesting node could implement a heuristic to blacklist or deprioritize peers that consistently respond with unknown_block, under the assumption that such peers are out of sync with the network.
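One possible shape for this heuristic is a per-peer tally of request_block outcomes that skips peers after a run of unknown_block replies and orders the rest by their (smoothed) miss rate. Everything here is hypothetical: `PeerBook`, `PeerStats`, the `max_streak` cut-off and the Laplace smoothing are illustrative choices, and this sketch has no expiry, so a real version would presumably need a way to let a blacklisted peer back in.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier.
type PeerId = u32;

/// Per-peer record of how request_block queries were answered.
#[derive(Default, Debug)]
struct PeerStats {
    served: u32,
    unknown: u32,
    /// Consecutive unknown_block replies since the last served block.
    unknown_streak: u32,
}

struct PeerBook {
    stats: HashMap<PeerId, PeerStats>,
    /// Hypothetical cut-off: this many unknown_block replies in a row => skip the peer.
    max_streak: u32,
}

impl PeerBook {
    fn new(max_streak: u32) -> Self {
        Self { stats: HashMap::new(), max_streak }
    }

    /// Record whether `peer` had the requested block.
    fn record(&mut self, peer: PeerId, had_block: bool) {
        let s = self.stats.entry(peer).or_default();
        if had_block {
            s.served += 1;
            s.unknown_streak = 0;
        } else {
            s.unknown += 1;
            s.unknown_streak += 1;
        }
    }

    fn is_blacklisted(&self, peer: PeerId) -> bool {
        self.stats
            .get(&peer)
            .map_or(false, |s| s.unknown_streak >= self.max_streak)
    }

    /// Order candidates for the next request: blacklisted peers are dropped,
    /// the rest are sorted by miss rate (per mille, Laplace-smoothed so that
    /// a peer with no history scores 500).
    fn rank(&self, candidates: &[PeerId]) -> Vec<PeerId> {
        let mut ranked: Vec<PeerId> = candidates
            .iter()
            .copied()
            .filter(|p| !self.is_blacklisted(*p))
            .collect();
        ranked.sort_by_key(|p| {
            self.stats
                .get(p)
                .map_or(500, |s| (s.unknown + 1) * 1000 / (s.served + s.unknown + 2))
        });
        ranked
    }
}
```

For example, after peer 1 answers unknown_block three times in a row, peer 2 serves one block and peer 3 misses once, `PeerBook::new(3).rank(&[1, 2, 3, 4])` (with those outcomes recorded) yields `[2, 4, 3]`: peer 1 is skipped and the never-queried peer 4 sits between the good and the bad record.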

Possible drawbacks of this approach include increased network traffic from the additional handle_inv messages and the complexity of managing a dynamic blacklist of peers. However, these costs should be outweighed by faster block synchronization and the adaptive nature of the peer selection algorithm.

autholykos, Mar 11 '24 16:03

Modification to Peer Response Behavior: Currently, a peer ignores a request_block if it does not have the requested block. We propose that instead of ignoring the request, the peer should reply with a handle_inv message indicating unknown_block. This explicit indication of absence will allow the requesting node to immediately seek other peers for the block, rather than waiting for a timeout period.

From a technical standpoint, this is doable, since it mirrors the conventional request-response message exchange. However, on receiving an unknown_block response, the node would have to blindly select another active peer, which may lack the requested block as well. In a network of more than 10,000 nodes, this approach might prove inefficient.

Modification to Requesting Node Behavior: Upon receiving an unknown_block response, the requesting node should immediately proceed to request the block from other peers in its list, rather than waiting for a timeout. This proactive approach ensures that the node can quickly find and synchronize the block, improving the overall efficiency of the network.

Depending on network latency, the round-trip time of each request is between 100 ms and 400 ms, so every peer that lacks the block adds at least one round trip. In addition, the receiver may be busy, offline, or simply unwilling to provide the resource.
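To put rough numbers on the two points above, assume (purely for illustration, not a measurement) that each contacted peer independently holds the block with probability q, that peers are asked one at a time, that each answered request costs one round trip r, and that a silent peer costs the full 30-second timeout τ. The number N of peers asked is then geometric, which gives the expected retrieval time T with explicit unknown_block replies and with silent peers:

```latex
\Pr[N = k] = (1-q)^{k-1}\,q, \qquad \mathbb{E}[N] = \frac{1}{q}

\mathbb{E}[T_{\mathrm{reply}}] = \frac{r}{q},
\qquad
\mathbb{E}[T_{\mathrm{silent}}] = r + \frac{1-q}{q}\,\tau
```

With a hypothetical q = 0.2 (most peers lacking the block) and r between 100 ms and 400 ms, explicit replies give an expected 0.5 s to 2 s, while silent peers give 4τ + r, roughly 120 s. The explicit reply removes the timeout term, but the 1/q factor remains, and it grows as the share of peers holding the block shrinks.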

Another way to address the problem could be to replicate flooding with random walk over Kadcast, combined with replying with msg.Inv (an inventory message) once the block is found.

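function broadcastGetData(BlockHash, TTL):
    // Flood: send the request to every peer in the Kadcast bucket
    for each peer in kadcast_bucket:
        send GetData message to peer with payload (BlockHash, TTL)

function handleGetDataRequest(request):
    if resourceExists(request.BlockHash):
        // The node holding the block answers with an inventory message
        send Inv message to requester with resource information
    else if request.TTL > 0:
        // Random walk: forward the request with one hop fewer
        request.TTL = request.TTL - 1
        forwardGetDataRequestToNextBucket(request)
    // otherwise the TTL is exhausted and the request is dropped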
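The pseudocode can be exercised with a small single-process simulation. This is a sketch under several assumptions, none taken from rusk: node ids are 8 bits, every node knows every other node, buckets are formed Kademlia-style from the highest differing bit of the XOR distance, the initial flood goes to one random peer per non-empty bucket, and "next bucket" is read as a random peer from a random non-empty bucket. The names `locate_block`, `buckets`, `bucket_index` and the xorshift PRNG (used only to avoid external crates) are hypothetical.

```rust
/// Hypothetical 8-bit node id; real Kadcast ids are far wider.
type NodeId = u8;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // xorshift must not start from 0
    }
    /// Pseudo-random index in 0..len (len must be > 0).
    fn index(&mut self, len: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % len as u64) as usize
    }
}

/// Bucket of `b` as seen from `a`: position of the highest differing bit (a != b).
fn bucket_index(a: NodeId, b: NodeId) -> usize {
    7 - (a ^ b).leading_zeros() as usize
}

/// The non-empty buckets of `me`'s routing table, lowest index first.
fn buckets(me: NodeId, nodes: &[NodeId]) -> Vec<Vec<NodeId>> {
    let mut table: Vec<Vec<NodeId>> = vec![Vec::new(); 8];
    for &p in nodes.iter().filter(|&&p| p != me) {
        table[bucket_index(me, p)].push(p);
    }
    table.into_iter().filter(|b| !b.is_empty()).collect()
}

/// broadcastGetData + handleGetDataRequest, simulated in one process.
/// Returns (responder, hops) for every Inv that reaches `origin`.
fn locate_block(nodes: &[NodeId], holders: &[NodeId], origin: NodeId, ttl: u8, seed: u64) -> Vec<(NodeId, u32)> {
    let mut rng = XorShift::new(seed);
    let mut invs = Vec::new();
    // Flooding: one GetData to a random peer in each of the origin's buckets.
    for bucket in buckets(origin, nodes) {
        let mut at = bucket[rng.index(bucket.len())];
        let (mut ttl_left, mut hops) = (ttl, 1u32);
        loop {
            if holders.contains(&at) {
                invs.push((at, hops)); // holder sends Inv back to the origin
                break;
            }
            if ttl_left == 0 {
                break; // TTL exhausted: this walk is dropped
            }
            // Random walk: decrement TTL, forward to a random peer
            // in a random non-empty bucket of the current node.
            ttl_left -= 1;
            let next = buckets(at, nodes);
            if next.is_empty() {
                break;
            }
            let b = &next[rng.index(next.len())];
            at = b[rng.index(b.len())];
            hops += 1;
        }
    }
    invs
}
```

In a 16-node network (ids 0 to 15), node 0 has four non-empty buckets, so it launches four walks; each returns at most one Inv, after at most TTL + 1 hops. Launching one walk per bucket spreads the search across the whole id space instead of retrying peers one by one.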

goshawk-3, Apr 30 '24 11:04

Additional context: in case of a consensus split, the number of nodes that do not know the requested block (which belongs to the main branch) is usually high.

goshawk-3, Apr 30 '24 11:04

Flooding with random walk over Kadcast has been deemed the best approach.

autholykos, May 02 '24 17:05