
core, eth, trie: write nodebuffer asynchronously to disk

Open joeylichang opened this issue 1 year ago • 11 comments

Pathdb writes trie nodes to disk in one batch once the nodebuffer is full. This disk write blocks block execution, which shows up as performance glitches in monitoring. This PR proposes asyncnodebuffer to eliminate that glitch. asyncnodebuffer is an optimization of nodebuffer that flushes to disk asynchronously. It consists of two nodebuffers: when the current (front) nodebuffer is full, it becomes immutable and is handed off to the background to be written to disk, while a new front nodebuffer takes the writes produced by subsequent block execution (block chasing). The immutable nodebuffer, of course, remains readable during the flush.

joeylichang · Nov 07 '23 02:11
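A minimal sketch of the double-buffer scheme described above, written for illustration rather than taken from the PR; the type and method names (asyncNodeBuffer, nodeBuffer, commit, node), the map-based buffer, and the flush callback are all assumptions:

```go
// Sketch only: a front buffer accepts writes while a frozen buffer is
// flushed to disk in the background; the frozen buffer stays readable.
package sketch

import "sync"

type nodeBuffer struct {
	nodes map[string][]byte // owner+path -> RLP-encoded trie node (simplified)
	size  int
	limit int
}

func (b *nodeBuffer) full() bool { return b.size >= b.limit }

type asyncNodeBuffer struct {
	mu        sync.RWMutex
	current   *nodeBuffer // front buffer, accepts new writes
	immutable *nodeBuffer // frozen buffer, being written to disk
}

// commit adds nodes to the front buffer; when it fills up, it is frozen and
// flushed in the background while a fresh front buffer takes over.
func (a *asyncNodeBuffer) commit(nodes map[string][]byte, flush func(*nodeBuffer) error) {
	a.mu.Lock()
	defer a.mu.Unlock()

	for k, v := range nodes {
		a.current.nodes[k] = v
		a.current.size += len(v)
	}
	if !a.current.full() || a.immutable != nil {
		return // not full yet, or a background flush is still running
	}
	frozen := a.current
	a.immutable = frozen
	a.current = &nodeBuffer{nodes: make(map[string][]byte), limit: frozen.limit}

	go func() {
		_ = flush(frozen) // the disk write no longer blocks block execution
		a.mu.Lock()
		a.immutable = nil
		a.mu.Unlock()
	}()
}

// node serves reads from the front buffer first, then from the frozen one,
// so data that is still being flushed remains visible.
func (a *asyncNodeBuffer) node(key string) ([]byte, bool) {
	a.mu.RLock()
	defer a.mu.RUnlock()

	if v, ok := a.current.nodes[key]; ok {
		return v, true
	}
	if a.immutable != nil {
		if v, ok := a.immutable.nodes[key]; ok {
			return v, true
		}
	}
	return nil, false
}
```

In this simplified version, a second fill-up while a flush is still in flight just keeps growing the front buffer; a real implementation has to handle that case explicitly (for example by blocking until the previous flush completes).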

@rjl493456442 Can I invite you to help evaluate this PR?

joeylichang · Nov 07 '23 03:11

I once had a similar idea, but I abandoned it due to certain limitations in LevelDB. Specifically, subsequent write operations will remain blocked even if we flush the buffer in the background. Nevertheless, it's certainly worth considering this approach within the context of Pebble.

As you can see, it involves non-trivial complexity. Do you have any performance data before we dive into details?

rjl493456442 · Nov 07 '23 04:11
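To make the LevelDB limitation above concrete, here is a hedged illustration, not go-ethereum code; the goleveldb usage, key layout, and sizes are made up. Even when the large batch is written from a background goroutine, foreground writes go through the same serialized LevelDB write path and can stall while the database absorbs the batch and compacts.

```go
// Illustration only: a background flush of a large batch can still stall
// concurrent foreground writes, because LevelDB serializes writers and
// throttles them while compaction catches up.
package main

import (
	"encoding/binary"
	"log"
	"time"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	db, err := leveldb.OpenFile("ldb-stall-demo", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Stand-in for the frozen nodebuffer: roughly 256 MB in a single batch.
	batch := new(leveldb.Batch)
	for i := 0; i < 1_000_000; i++ {
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, uint64(i))
		batch.Put(key, make([]byte, 256))
	}

	done := make(chan struct{})
	go func() {
		defer close(done)
		if err := db.Write(batch, nil); err != nil { // background flush
			log.Print(err)
		}
	}()

	// Stand-in for writes issued by block import. The per-write latency
	// spikes while the batch is being written and compacted.
	for i := 0; i < 1000; i++ {
		start := time.Now()
		if err := db.Put([]byte("foreground-key"), []byte("value"), nil); err != nil {
			log.Fatal(err)
		}
		if d := time.Since(start); d > 100*time.Millisecond {
			log.Printf("foreground write stalled for %v", d)
		}
	}
	<-done
}
```

This is consistent with the later observation in this thread that the benefit of the background flush comes mainly from cache hits and smoother execution, while disk bandwidth remains the underlying bottleneck.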

I once had a similar idea, but I abandoned it due to certain limitations in LevelDB. Specifically, subsequent write operations will remain blocked even if we flush the buffer in the background. Nevertheless, it's certainly worth considering this approach within the context of Pebble.

As you can see, it involves non-trivial complexity. Do you have any performance data before we dive into details?

Yes, writes limited by LevelDB will still be blocked. But since the asynchronous flush does not block the main process, subsequent block-chasing performance is still much smoother thanks to cache hits, so we still recommend the approach.

Intel x86 EC2, 16 cores, 64 GB RAM, 3000 IOPS, 128 MB/s disk, ~1200 tx/block: flushing a 256 MB nodebuffer causes glitches of ~30 s during LevelDB compaction.

joeylichang · Nov 07 '23 06:11

@rjl493456442 What do you think? Is this PR necessary?

joeylichang · Nov 09 '23 03:11

@joeylichang I will try to do some benchmarks first.

rjl493456442 · Nov 09 '23 09:11

Deployed on our benchmark machine, will post the result once it's available.

rjl493456442 · Nov 18 '23 09:11

The first wave of benchmark data.

[Screenshot: first wave of benchmark data, 2023-11-19]

After running for a few hours, it turns out this pull request is slightly faster than master (note: I rebased your pull request against the latest master branch before deployment).

07 is the pull request; 08 is master.

[Screenshot: execution-time comparison, 2023-11-19]

The worst execution time drops from 3s to 1s


From the logs I can tell that, during flushing, overall execution performance is still affected even with the write moved to the background. We can see mgasps drop on both machines when flushing.

[Screenshots: mgasps during flushing on both machines, 2023-11-19]

Let's keep them running for a few more days then.

rjl493456442 · Nov 19 '23 06:11

Let's keep them running for a few more days then.

cool~, looking forward to the result.

Regarding "execution performance is still affected": from our observation, the main cause is disk bandwidth, which blocks transaction execution. We have run a comparative test of Pebble vs LevelDB, and Pebble performs better.

joeylichang · Nov 20 '23 09:11

From the logs I can tell that, during flushing, overall execution performance is still affected even with the write moved to the background. We can see mgasps drop on both machines when flushing.

@rjl493456442 Can you also show the disk read/write metrics here? Maybe disk bandwidth is the bottleneck.

fynnss · Nov 20 '23 09:11

After running a few more days, this pull request is consistently faster than master.

[Screenshot: multi-day benchmark comparison, 2023-11-20]

Since accumulating 256 megabytes of state in memory takes roughly 1.5 min in full sync, the theoretical performance speedup is ~1% (average_buffer_flush_time / 1.5 min), which is why the speedup is not that obvious in the metrics.

rjl493456442 · Nov 20 '23 13:11
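As a rough worked example of that estimate (the one-second figure is an assumption for illustration, not a measurement from these benchmarks): if the synchronous flush would have blocked execution for about 1 s out of every ~90 s accumulation window, the achievable saving is 1 s / 90 s ≈ 1.1%, in line with the ~1% theoretical speedup quoted above.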

@rjl493456442 Any further results?

joeylichang · Nov 30 '23 02:11