
core, eth, trie: write nodebuffer asynchronously to disk

Open joeylichang opened this issue 1 year ago • 11 comments

Pathdb writes trie nodes to disk in one batch once the nodebuffer is full. This disk write blocks block execution, which shows up as performance glitches in monitoring. This PR proposes asyncnodebuffer to eliminate that glitch. asyncnodebuffer is an optimization of nodebuffer that flushes to disk asynchronously. It consists of two nodebuffers: when the current (front) nodebuffer is full, it becomes immutable and is handed off to the background to be written to disk, while a new front nodebuffer takes the writes produced by subsequent block execution (block chasing). The immutable nodebuffer, of course, remains readable during the flush.

joeylichang · Nov 07 '23 02:11
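A minimal sketch of the double-buffer scheme described above, written for illustration rather than taken from the PR; the type and method names (asyncNodeBuffer, nodeBuffer, commit, node), the map-based buffer, and the flush callback are all assumptions:

```go
// Sketch only: a front buffer accepts writes while a frozen buffer is
// flushed to disk in the background; the frozen buffer stays readable.
package sketch

import "sync"

type nodeBuffer struct {
	nodes map[string][]byte // owner+path -> RLP-encoded trie node (simplified)
	size  int
	limit int
}

func (b *nodeBuffer) full() bool { return b.size >= b.limit }

type asyncNodeBuffer struct {
	mu        sync.RWMutex
	current   *nodeBuffer // front buffer, accepts new writes
	immutable *nodeBuffer // frozen buffer, being written to disk
}

// commit adds nodes to the front buffer; when it fills up, it is frozen and
// flushed in the background while a fresh front buffer takes over.
func (a *asyncNodeBuffer) commit(nodes map[string][]byte, flush func(*nodeBuffer) error) {
	a.mu.Lock()
	defer a.mu.Unlock()

	for k, v := range nodes {
		a.current.nodes[k] = v
		a.current.size += len(v)
	}
	if !a.current.full() || a.immutable != nil {
		return // not full yet, or a background flush is still running
	}
	frozen := a.current
	a.immutable = frozen
	a.current = &nodeBuffer{nodes: make(map[string][]byte), limit: frozen.limit}

	go func() {
		_ = flush(frozen) // the disk write no longer blocks block execution
		a.mu.Lock()
		a.immutable = nil
		a.mu.Unlock()
	}()
}

// node serves reads from the front buffer first, then from the frozen one,
// so data that is still being flushed remains visible.
func (a *asyncNodeBuffer) node(key string) ([]byte, bool) {
	a.mu.RLock()
	defer a.mu.RUnlock()

	if v, ok := a.current.nodes[key]; ok {
		return v, true
	}
	if a.immutable != nil {
		if v, ok := a.immutable.nodes[key]; ok {
			return v, true
		}
	}
	return nil, false
}
```

In this simplified version, a second fill-up while a flush is still in flight just keeps growing the front buffer; a real implementation has to handle that case explicitly (for example by blocking until the previous flush completes).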

@rjl493456442 Can I invite you to help evaluate this PR?

joeylichang · Nov 07 '23 03:11

I once had a similar idea, but I abandoned it due to certain limitations in LevelDB. Specifically, subsequent write operations will remain blocked even if we flush the buffer in the background. Nevertheless, it's certainly worth considering this approach within the context of Pebble.

As you can see, it involves non-trivial complexity. Do you have any performance data before we dive into details?

rjl493456442 · Nov 07 '23 04:11
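To make the LevelDB limitation above concrete, here is a hedged illustration, not go-ethereum code; the goleveldb usage, key layout, and sizes are made up. Even when the large batch is written from a background goroutine, foreground writes go through the same serialized LevelDB write path and can stall while the database absorbs the batch and compacts.

```go
// Illustration only: a background flush of a large batch can still stall
// concurrent foreground writes, because LevelDB serializes writers and
// throttles them while compaction catches up.
package main

import (
	"encoding/binary"
	"log"
	"time"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	db, err := leveldb.OpenFile("ldb-stall-demo", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Stand-in for the frozen nodebuffer: roughly 256 MB in a single batch.
	batch := new(leveldb.Batch)
	for i := 0; i < 1_000_000; i++ {
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, uint64(i))
		batch.Put(key, make([]byte, 256))
	}

	done := make(chan struct{})
	go func() {
		defer close(done)
		if err := db.Write(batch, nil); err != nil { // background flush
			log.Print(err)
		}
	}()

	// Stand-in for writes issued by block import. The per-write latency
	// spikes while the batch is being written and compacted.
	for i := 0; i < 1000; i++ {
		start := time.Now()
		if err := db.Put([]byte("foreground-key"), []byte("value"), nil); err != nil {
			log.Fatal(err)
		}
		if d := time.Since(start); d > 100*time.Millisecond {
			log.Printf("foreground write stalled for %v", d)
		}
	}
	<-done
}
```

This is consistent with the later observation in this thread that the benefit of the background flush comes mainly from cache hits and smoother execution, while disk bandwidth remains the underlying bottleneck.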

I once had a similar idea, but I abandoned it due to certain limitations in LevelDB. Specifically, subsequent write operations will remain blocked even if we flush the buffer in the background. Nevertheless, it's certainly worth considering this approach within the context of Pebble.

As you can see, it involves non-trivial complexity. Do you have any performance data before we dive into details?

Yes, writes limited by LevelDB will still be blocked. But since the asynchronous flush does not block the main process, subsequent block-chasing performance is still much smoother thanks to cache hits, so we still recommend the approach.

Intel x86 EC2, 16 cores, 64 GB RAM, 3000 IOPS, 128 MB/s disk, ~1200 tx/block: flushing a 256 MB nodebuffer causes glitches of ~30 s during LevelDB compaction.

joeylichang · Nov 07 '23 06:11

@rjl493456442 What do you think? Is this PR necessary?

joeylichang · Nov 09 '23 03:11

@joeylichang I will try to do some benchmarks first.

rjl493456442 · Nov 09 '23 09:11

Deployed on our benchmark machine, will post the result once it's available.

rjl493456442 · Nov 18 '23 09:11

The first wave of benchmark data.

[Screenshot: first wave of benchmark data, 2023-11-19]

After running for a few hours, it turns out this pull request is slightly faster than master (note: I rebased your pull request against the latest master branch before deployment).

07 is the pull request; 08 is master.

[Screenshot: execution-time comparison, 2023-11-19]

The worst execution time drops from 3s to 1s


From the logs I can tell that, during flushing, overall execution performance is still affected even with the write moved to the background. We can see mgasps drop on both machines when flushing.

[Screenshots: mgasps during flushing on both machines, 2023-11-19]

Let's keep them running for a few more days then.

rjl493456442 · Nov 19 '23 06:11

Let's keep them running for a few more days then.

cool~, looking forward to the result.

Regarding "execution performance is still affected": from our observation, the main cause is disk bandwidth, which blocks transaction execution. We have run a comparative test of Pebble vs LevelDB, and Pebble performs better.

joeylichang · Nov 20 '23 09:11

From the logs I can tell that, during flushing, overall execution performance is still affected even with the write moved to the background. We can see mgasps drop on both machines when flushing.

@rjl493456442 Can you also show the disk read/write metrics here? Maybe disk bandwidth is the bottleneck.

fynnss · Nov 20 '23 09:11

After running a few more days, this pull request is consistently faster than master.

[Screenshot: multi-day benchmark comparison, 2023-11-20]

Since accumulating 256 megabytes of state in memory takes roughly 1.5 min in full sync, the theoretical performance speedup is ~1% (average_buffer_flush_time / 1.5 min), which is why the speedup is not that obvious in the metrics.

rjl493456442 · Nov 20 '23 13:11
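As a rough worked example of that estimate (the one-second figure is an assumption for illustration, not a measurement from these benchmarks): if the synchronous flush would have blocked execution for about 1 s out of every ~90 s accumulation window, the achievable saving is 1 s / 90 s ≈ 1.1%, in line with the ~1% theoretical speedup quoted above.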

@rjl493456442 Any further results?

joeylichang · Nov 30 '23 02:11