go-ethereum
core, eth, trie: write nodebuffer asynchronously to disk
Pathdb writes trie nodes to disk in one batch once the nodebuffer is full. This disk write blocks block execution, which shows up as performance glitches in monitoring. This PR proposes asyncnodebuffer to address that glitch. asyncnodebuffer is an optimization of nodebuffer that writes to disk asynchronously: it consists of two nodebuffers. When the current nodebuffer is full, it becomes immutable and is moved to the background to be flushed to disk, while a new front nodebuffer handles subsequent writes as the node chases blocks. The immutable nodebuffer of course remains readable.
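To make the scheme concrete, here is a minimal Go sketch of the double-buffer idea. The types (kvStore, nodeBuffer, string paths) are simplified stand-ins rather than the actual pathdb code in this PR; it only illustrates the front/background swap and the read path over both buffers.

```go
// Package asyncbuffer sketches a double-buffered node store: writes land in a
// mutable front buffer; when it is full the buffer is frozen and flushed to
// disk in the background while a fresh front buffer keeps accepting writes.
package asyncbuffer

import "sync"

// kvStore stands in for the backing key-value database (leveldb/pebble).
type kvStore interface {
	Put(key, value []byte) error
}

// nodeBuffer accumulates dirty trie nodes keyed by path.
type nodeBuffer struct {
	nodes map[string][]byte
	size  int // accumulated value size in bytes
}

// asyncNodeBuffer pairs a mutable front buffer with an immutable background
// buffer that is being flushed to disk asynchronously.
type asyncNodeBuffer struct {
	mu    sync.RWMutex
	front *nodeBuffer // receives new writes
	back  *nodeBuffer // frozen and flushing; nil when no flush is running
	limit int
	db    kvStore
}

func newAsyncNodeBuffer(db kvStore, limit int) *asyncNodeBuffer {
	return &asyncNodeBuffer{
		front: &nodeBuffer{nodes: make(map[string][]byte)},
		limit: limit,
		db:    db,
	}
}

// Node consults the front buffer first, then the frozen background buffer,
// so data that is being flushed stays readable.
func (a *asyncNodeBuffer) Node(path string) ([]byte, bool) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	if blob, ok := a.front.nodes[path]; ok {
		return blob, true
	}
	if a.back != nil {
		if blob, ok := a.back.nodes[path]; ok {
			return blob, true
		}
	}
	return nil, false
}

// Commit stores a dirty node; once the front buffer exceeds the limit and no
// flush is in progress, it is frozen and written out in the background while
// a fresh front buffer keeps serving block execution.
func (a *asyncNodeBuffer) Commit(path string, blob []byte) {
	a.mu.Lock()
	a.front.nodes[path] = blob
	a.front.size += len(blob)
	if a.front.size < a.limit || a.back != nil {
		a.mu.Unlock()
		return
	}
	// Freeze the full buffer and swap in an empty front.
	a.back, a.front = a.front, &nodeBuffer{nodes: make(map[string][]byte)}
	frozen := a.back
	a.mu.Unlock()

	go func() {
		for p, b := range frozen.nodes {
			_ = a.db.Put([]byte(p), b) // error handling omitted in this sketch
		}
		a.mu.Lock()
		a.back = nil // flushed; reads now fall back to the database
		a.mu.Unlock()
	}()
}
```

The point of the sketch is that Commit never waits on the disk: the frozen buffer is flushed by a goroutine while new writes land in the fresh front buffer, and Node keeps serving reads from the frozen data until the flush completes.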
@rjl493456442 Can I invite you to help evaluate this PR?
I once had a similar idea, but I abandoned it due to certain limitations in LevelDB. Specifically, the following write operations will remain blocked even if we flush the buffer in the background. Nevertheless, it's certainly worth considering this approach within the context of Pebble.
As you can see, it involves a non-trivial complexity. Do you have any performance data before we dive into details?
Yes, writes will still be blocked by leveldb's limits. But since the asynchronous flush does not block the main process, and cache hits keep serving reads, subsequent block chasing is still much smoother, so we still recommend it.
Environment: Intel x86 EC2, 16 cores, 64 GB RAM, 3000 IOPS, 128 MB/s, ~1200 tx/block. Flushing a 256 MB nodebuffer causes glitches of ~30 s during leveldb compaction.
@rjl493456442 Do you think this PR is necessary?
@joeylichang I will try to do some benchmarks first.
Deployed on our benchmark machine, will post the result once it's available.
The first wave of benchmark data.
After running for a few hours, this pull request turns out to be slightly faster than master (note: I rebased your pull request against the latest master branch before deployment).
07 is the pull request, 08 is master.
The worst execution time drops from 3s to 1s.
From the logs I can tell that during flushing the overall execution performance is still affected, even with the write in the background. We can see the mgasps drop on both machines when flushing.
Let's keep them running for a few more days then.
cool~, looking forward to the result.
Regarding "execution performance is still affected": according to our observation, the main factor is disk bandwidth, which blocks transaction execution.
We have run a comparative test of pebble vs leveldb, and pebble performs better.
@rjl493456442 Can you also show the disk read/write metrics here? Maybe disk bandwidth is the bottleneck.
After running for a few more days, this pull request is consistently faster than master.
Since accumulating 256 MB of state in memory takes roughly 1.5 min in full sync, the theoretical performance speedup is ~1% (average_buffer_flush_time / 1.5 min), so the speedup is not that obvious from the metrics.
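As a rough sanity check of that number (assuming an average buffer flush takes on the order of one second, which is an assumed figure rather than one measured in this thread): speedup ≈ average_buffer_flush_time / accumulation_time ≈ 1 s / 90 s ≈ 1.1%, in line with the ~1% estimate above.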
@rjl493456442 Any further results?