iotex-core

improve blocksync speed

Open dustinxie opened this issue 2 years ago • 3 comments

What would you like to be added

as title

Why is this needed

Currently a mainnet node syncs at a rate of 3-5 blocks per second, which is not satisfactory. When running a full node, the following can sometimes be observed:

  1. The log line "dispatcher/dispatcher.go:372 dispatcher block channel is full, drop an event." is printed, showing that the receiving channel fills up faster than blocks are drained (committed).
  2. Sometimes on testnet, the node commits a batch of about 20 blocks (visible in the log), pauses for a couple of seconds, then commits the next batch. The blocksync goroutine seems to have a bottleneck or hiccup somewhere.
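
To illustrate observation 1, here is a minimal sketch of how a full buffered channel leads to dropped events. It assumes the dispatcher hands blocks over with a non-blocking send, which is what the log message suggests; `trySend` and `blockChan` are hypothetical names for illustration, not the actual iotex-core code.

```go
package main

import "fmt"

// trySend attempts a non-blocking send on a buffered channel.
// It returns false when the channel is full, meaning the event is dropped.
// Hypothetical helper; the real dispatcher may be structured differently.
func trySend(ch chan<- uint64, height uint64) bool {
	select {
	case ch <- height:
		return true
	default:
		// Channel is full: nothing is draining it fast enough.
		return false
	}
}

func main() {
	blockChan := make(chan uint64, 3) // tiny capacity, for illustration only
	dropped := 0
	for h := uint64(1); h <= 5; h++ {
		if !trySend(blockChan, h) {
			dropped++
		}
	}
	fmt.Println("queued:", len(blockChan), "dropped:", dropped)
}
```

With no consumer draining the channel, the first three sends succeed and the remaining two are dropped.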

How important you think this is for IoTeX

  • [x] must have
  • [] should have
  • [] nice to have

Additional information

Let us know any background or context that would help us better understand the request (for example the particular use-case that prompted this request)

dustinxie avatar Jan 31 '23 04:01 dustinxie

The pause of a couple of seconds may be caused by this code:

// blocksync/blocksync.go:179
func (bs *blockSyncer) sync() {
	updateTime, targetHeight := bs.flushInfo()
	if updateTime.Add(bs.cfg.Interval).After(time.Now()) {
		return
	}
	...
}

It requests blocks only after the block buffer is empty, regardless of the blocksync interval config. Between the moment the block buffer becomes empty and the next sync, there are no blocks to commit.

The actual blocksync timeline may look like:

assume: sync_interval=30s, block_buffer_size=200, speed=5 blocks/s

00:00:00    sync
                    commit blocks
00:00:30    sync skipped because the block buffer is not empty
                    commit blocks
00:00:40    block buffer empty
                    no blocks to commit and no sync (pause)
00:01:00    sync

envestcc avatar Jan 31 '23 14:01 envestcc

A full block channel may be caused by the block at the next height not being received for a long time during the blocksync process.

As we know, the mainnet config is:

dispatcher:
  blockChanSize: 1000

blockSync:
  interval: 10s
  bufferSize: 400
  maxRepeat: 3
  repeatDecayStep: 3
  intervalSize: 20

According to the following code, the node requests about 860 (400x2+20x3) blocks from its neighbours in a single sync. The blockChanSize of 1000 is enough at this point.

// blocksync/blocksync.go:193
bs.requestBlock(context.Background(), interval.Start, interval.End, bs.cfg.MaxRepeat-i/bs.cfg.RepeatDecayStep)
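
The 860 figure can be reproduced with a short sketch. It assumes the buffer is split into bufferSize/intervalSize intervals and that each interval is requested at least twice; that floor of 2 is an assumption inferred from the "400x2" term, not something shown in the snippet above. `requestedBlocks` and `minRepeat` are hypothetical names.

```go
package main

import "fmt"

// requestedBlocks estimates how many blocks one sync round asks peers for.
// Each interval i is requested (maxRepeat - i/repeatDecayStep) times,
// clamped from below by minRepeat (an assumed floor, see the lead-in).
func requestedBlocks(bufferSize, intervalSize, maxRepeat, repeatDecayStep, minRepeat int) int {
	if intervalSize <= 0 || repeatDecayStep <= 0 {
		return 0
	}
	total := 0
	intervals := bufferSize / intervalSize
	for i := 0; i < intervals; i++ {
		repeat := maxRepeat - i/repeatDecayStep
		if repeat < minRepeat {
			repeat = minRepeat
		}
		total += repeat * intervalSize
	}
	return total
}

func main() {
	// Mainnet values: bufferSize=400, intervalSize=20, maxRepeat=3, repeatDecayStep=3.
	fmt.Println(requestedBlocks(400, 20, 3, 3, 2)) // 860
}
```

The first three intervals are requested three times (3x60) and the remaining seventeen twice (17x40), giving 180+680=860, which is the same total as 400x2+20x3.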

The next sync must wait one interval after the last block commit. So there are two possible situations in which the next sync starts:

  • Situation 1: All blocks received from the last sync have been committed. The next sync will not fill up the block channel.
  • Situation 2: Some blocks in the block channel are still waiting to be committed, but the block at the next height has not been received for a full interval. The next sync may then fill up the block channel.

envestcc avatar Feb 01 '23 13:02 envestcc

Two principles we want to adhere to when syncing blocks are:

  1. New blocks are continuously put by the statedb
  2. The channel for blocks in the dispatcher should not be full when syncing. (The current request strategy with fixed time intervals might need to be improved.)
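
One possible way to respect principle 2 is to size each request from the free space in the dispatcher channel instead of always requesting a full buffer. The sketch below is only an illustration under that assumption; `requestBudget` and its parameters are hypothetical and do not describe the current iotex-core strategy.

```go
package main

import "fmt"

// requestBudget returns how many distinct blocks could be requested so that,
// even if every block is delivered `repeat` times, the responses still fit
// into the free space of the dispatcher channel. The result is also capped
// by the block buffer size.
func requestBudget(chanCap, chanLen, repeat, bufferSize int) int {
	free := chanCap - chanLen
	if free <= 0 || repeat <= 0 {
		return 0
	}
	budget := free / repeat
	if budget > bufferSize {
		budget = bufferSize
	}
	return budget
}

func main() {
	// blockChanSize=1000, bufferSize=400, each block requested from 2 peers.
	fmt.Println(requestBudget(1000, 0, 2, 400))   // empty channel: 400
	fmt.Println(requestBudget(1000, 600, 2, 400)) // 400 slots free: 200
}
```

With an empty channel the full buffer of 400 blocks can be requested; once 600 slots are occupied, the budget shrinks to 200 so that duplicate responses cannot overflow the channel.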

Liuhaai avatar Feb 05 '23 08:02 Liuhaai