
Schedule downloads and buffer blocks more intelligently (during initial sync)

Open bitjson opened this issue 3 years ago • 0 comments

Chaingraph currently maintains a single, undifferentiated "block buffer" of blocks which are waiting to be saved to the database. When a block is saved successfully, it is removed from the buffer and the space is made available for new blocks. When the buffer is using less memory than its target size, the agent attempts to download a new block from the least-synced chain (measured by block height):

https://github.com/bitauth/chaingraph/blob/cbebedefea908957b0373d77a60ec17fdff2050b/src/agent.ts#L882-L910

Before requesting a new block for download, the agent "reserves" space in the block buffer equal to the average size of all other blocks currently in the block buffer. There are also a couple of limits:

https://github.com/bitauth/chaingraph/blob/cbebedefea908957b0373d77a60ec17fdff2050b/src/agent.ts#L65-L85
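To make the reservation rule concrete, here is a minimal sketch of it, not the code behind the permalinks above. The `BufferedBlock` shape, the function name, and the use of a fallback size when the buffer is empty are all assumptions for illustration.

```typescript
// Hypothetical shape: the real agent tracks more state per buffered block.
interface BufferedBlock {
  nodeName: string;
  sizeBytes: number;
}

/**
 * Space to reserve before requesting the next block: the mean size of the
 * blocks currently in the buffer. When the buffer is empty there is nothing
 * to average, so this sketch assumes a caller-supplied fallback is used.
 */
function reservationBytes(
  buffer: readonly BufferedBlock[],
  fallbackBytes: number
): number {
  if (buffer.length === 0) return fallbackBytes;
  const total = buffer.reduce((sum, block) => sum + block.sizeBytes, 0);
  return Math.ceil(total / buffer.length);
}
```

Because the average is taken over every buffered block regardless of source, a buffer dominated by small blocks yields a small reservation even when the next request targets a node that serves large blocks.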

This download strategy has several issues:

  • Block sizes can differ wildly between nodes, especially when syncing both mainnet and testnet. If most blocks in the buffer are from testnet, the agent might reserve <1MB each for several 32MB blocks.
    • TODO: the block buffer should differentiate between nodes, reserving space based on the average for the node from which the next block is being requested.
  • Blocks are never dumped from the block buffer, so a series of large blocks risks exceeding the agent's memory limits. (With a target of 200MB, I've seen block buffer usage exceed 350MB when many large blocks are requested using poorly-estimated reservation sizes.)
    • TODO: if the block buffer exceeds the target size by too much (10%?) we should drop new blocks until the buffer shrinks below the target size.
  • Chaingraph always focuses on downloading the next block from the least-synced node, but if that node is itself syncing (a.k.a. Initial Block Download, IBD), Chaingraph could be wasting time waiting for those blocks. We can assume that all the nodes will eventually give us blocks quickly, so any slow nodes should be skipped until they speed up.
    • TODO: download blocks from whichever node is expected to return a block fastest. (We already measure download speed and divide it by pending downloads to estimate the next speed and choose the fastest expected node; we just need to use this information to select which blocks to download.)
  • Some nodes may be pruned, and therefore can't send us blocks older than ~48 hours.
    • TODO: It should be possible to specify via CHAINGRAPH_TRUSTED_NODES that a node is pruned, e.g. bchn:127.0.0.1:8333:mainnet-pruned (actually, could this just be automatically detected?). The agent should only attempt to download recent blocks from pruned nodes.
  • When we receive an unexpected block, the agent just treats it as a new header for simplicity. (It even logs that the node unexpectedly sent a set of already-known headers.)
    • TODO: save the block, and don't log it as an unexpected headers message.
  • Download timeouts are an afterthought, but Satoshi-based C++ implementations frequently hit them during IBD.
    • TODO: take current block downloads into account when calculating expected download time for nodes, rather than only completed downloads – if existing downloads have been waiting for longer than a few seconds, other downloads probably shouldn't be scheduled against that node unless all other nodes are fully synced. If available, the slow-downloading blocks should also be requested from another connected node (so the downloads aren't left waiting for the 5-minute timeout before being requested from a responsive node).
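Several of the TODOs above could be sketched roughly as follows. This is only an illustration: `NodeStats`, every function name, the 5-second stall threshold, and dividing by `pendingDownloads + 1` (to count the download being scheduled and avoid dividing by zero) are assumptions, not the agent's existing code.

```typescript
// Hypothetical per-node statistics; all field names are illustrative.
interface NodeStats {
  name: string;
  bytesPerSecond: number; // measured from completed downloads
  pendingDownloads: number; // downloads currently in flight
  oldestPendingMs: number; // age of the longest in-flight download, 0 if none
  averageBlockBytes?: number; // undefined until a block has been received
}

// Assumed value for "longer than a few seconds".
const stallThresholdMs = 5000;

/** Expected speed of one more download from this node. */
function expectedSpeed(node: NodeStats): number {
  return node.bytesPerSecond / (node.pendingDownloads + 1);
}

/**
 * Prefer the responsive node with the best expected speed. If every node is
 * stalled, this sketch falls back to considering all of them.
 */
function pickNode(nodes: readonly NodeStats[]): NodeStats | undefined {
  const responsive = nodes.filter(
    (node) => node.oldestPendingMs < stallThresholdMs
  );
  const candidates = responsive.length > 0 ? responsive : nodes;
  return candidates.reduce<NodeStats | undefined>(
    (best, node) =>
      best === undefined || expectedSpeed(node) > expectedSpeed(best)
        ? node
        : best,
    undefined
  );
}

/** Reserve space using the chosen node's own average block size. */
function reservationFor(node: NodeStats, assumedMaxBlockSize: number): number {
  return node.averageBlockBytes ?? assumedMaxBlockSize;
}

/** True once usage exceeds the target by more than `tolerance` (10% here). */
function shouldDropNewBlocks(
  usedBytes: number,
  targetBytes: number,
  tolerance = 0.1
): boolean {
  return usedBytes > targetBytes * (1 + tolerance);
}
```

The sketch leaves out sync state and pruning: a fuller version would also skip nodes with nothing left to download and restrict pruned nodes to recent blocks.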

Finally, we should adjust assumedMaxBlockSize (used when the agent has yet to receive any blocks from a particular node) based on the node's network (e.g. 2MB for testnet4, 32MB for mainnet, 256MB for scalenet, etc.):

https://github.com/bitauth/chaingraph/blob/cbebedefea908957b0373d77a60ec17fdff2050b/src/components/block-buffer.ts#L87-L107

These values should probably be provided as a CHAINGRAPH_MAX_BLOCK_SIZES environment variable much like CHAINGRAPH_GENESIS_BLOCKS.
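As a rough sketch of how such a variable might be parsed: the `network:bytes` comma-separated format below is purely hypothetical (it is not taken from how CHAINGRAPH_GENESIS_BLOCKS is encoded), as are the function names.

```typescript
/**
 * Parse a hypothetical CHAINGRAPH_MAX_BLOCK_SIZES value such as
 * "mainnet:32000000,testnet4:2000000,scalenet:256000000" into a lookup of
 * network name to maximum block size in bytes.
 */
function parseMaxBlockSizes(raw: string): Record<string, number> {
  const sizes: Record<string, number> = {};
  for (const entry of raw.split(',')) {
    const trimmed = entry.trim();
    if (trimmed === '') continue; // tolerate empty input and trailing commas
    const parts = trimmed.split(':');
    const network = parts[0];
    const bytes = Number(parts[1]);
    if (
      parts.length !== 2 ||
      network === '' ||
      !Number.isSafeInteger(bytes) ||
      bytes <= 0
    ) {
      throw new Error(`Invalid max block size entry: "${trimmed}"`);
    }
    sizes[network] = bytes;
  }
  return sizes;
}

/** Look up a network's limit, falling back to a default when it is unlisted. */
function maxBlockSizeFor(
  network: string,
  sizes: Record<string, number>,
  fallbackBytes: number
): number {
  return sizes[network] ?? fallbackBytes;
}
```

The agent would then call something like `parseMaxBlockSizes(process.env.CHAINGRAPH_MAX_BLOCK_SIZES ?? '')` once at startup and use `maxBlockSizeFor` wherever `assumedMaxBlockSize` is read today.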

This issue should be addressed alongside https://github.com/bitauth/chaingraph/issues/5.

bitjson, Nov 19 '21 23:11