Investigate the effect of different RocksDB options.
### Issue summary
Importing a mainnet snapshot into RocksDB is slower than expected: the import speed starts high but then drops to as low as 20 MB/s. This is a known issue for RocksDB, and there are several options specifically aimed at bulk data import. We should identify the options that are relevant for improving bulk-import performance and then run benchmarks.
### Task summary
- [ ] Identify which RocksDB options affect bulk importing. Things like temporarily disabling compaction, etc.
- [ ] Add those options to the configuration file.
- [ ] Do rough benchmarks (remove the db, import a snapshot, quit). We're looking for 2x to 10x improvements, not 5-10%.
- [ ] Report findings in this issue.
### Acceptance Criteria
- [ ] A recommendation of which RocksDB configuration to use by default.
### Other information and links
- https://rockset.com/blog/optimizing-bulk-load-in-rocksdb/
- https://docs.rs/rocksdb/latest/rocksdb/struct.Options.html
- https://github.com/EighteenZi/rocksdb_wiki/blob/master/RocksDB-Tuning-Guide.md
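For orientation, here is a minimal sketch of how the bulk-load-related knobs investigated below can be set through the rust-rocksdb `Options` API linked above. The particular values and the database path are placeholders, not a recommendation:

```rust
use rocksdb::{DBCompressionType, Options, DB};

fn bulk_load_options() -> Options {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // Raise (or lift) the open-file limit; -1 means "unlimited".
    opts.set_max_open_files(-1);
    // Skip block compression during the import.
    opts.set_compression_type(DBCompressionType::None);
    // Pause background compaction while bulk-writing.
    opts.set_disable_auto_compactions(true);
    // Size of a single memtable before it is flushed to an L0 file.
    opts.set_write_buffer_size(64 * 1024 * 1024);
    opts
}

fn main() -> Result<(), rocksdb::Error> {
    // Placeholder path; the real DB location comes from Forest's config.
    let db = DB::open(&bulk_load_options(), "/tmp/forest-bulk-load-test")?;
    db.put(b"key", b"value")?;
    Ok(())
}
```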
Change | Import Time (min) |
---|---|
baseline (no change) | ~60 |
`max_open_files` set to 5000 or -1 | 21 |
set `compression_type` to none | ~60 |
`max_open_files` set to 5000 and `enable_index_compression` set to false | 17 |
Remember that compression is different from compaction, and that you'll have to edit the code to disable compaction.
Also, importing the calibnet snapshot has a much lower turnaround time. That makes it easier to do lots of tests.
I did some experiments using both a calibnet snapshot and a mainnet one:
calibnet:
Change | Import Time |
---|---|
baseline (no change) | ~33s |
set `compaction_style` to None | ~33s |
mainnet:
Change | Import Time |
---|---|
baseline (no change) | ~1507s |
set `compaction_style` to None | ~529s |
set `disable_auto_compactions` to true and `max_open_files` to 4096 | ~543s |
set `disable_auto_compactions` to true and `write_buffer_size` to 64MB | ~507s |
set `disable_auto_compactions` to true and `write_buffer_size` to 32MB and `batch_size` to 100 | ~467s* |
set `prepare_for_bulk_load` to true and set `write_buffer_size` to 64MB | ~509s** |
set `optimize_filters_for_hits` to true | ~1368s |
set `set_unordered_write` to true | failed*** |
set `optimize_for_point_lookup` to 4096 | ~1385s |
set `prepare_for_bulk_load` to true and set `optimize_filters_for_hits` to true and set `optimize_for_point_lookup` to 4096 | ~525s |
The impact is more visible on the mainnet snapshot (I don't know exactly why). With compaction, throughput rapidly drops from 180 MB/s to 100 MB/s within a few seconds and ends at 58 MB/s. Without compaction, we start at 180 MB/s, hold at around 173 MB/s, and only drop to 167 MB/s by the end.
Having a smaller write buffer during this phase looks like a win.
Note that for now I'm measuring this change by setting compaction to `None` when creating the `RocksDbConfig` struct.
What we really want to do is disable compaction in code, do the bulk writes, then re-enable compaction and/or perform the final compaction manually. That should be the next step; a rough sketch of that flow follows the footnotes below.
\*this later results in a `Too many open files` error; bumping `max_open_files` doesn't help. `batch_size` is hardcoded to 1000 in the `fvm_ipld_car` crate.
\*\*while this seems like a win during import, the performance of the next step (Scanning Blockchain) is much worse.
\*\*\*error was `CAR error: IO error: No space left on device: While appending to file: 002175.log: No space left on device`
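As a starting point for that follow-up, here is a rough sketch of the disable-compaction → bulk-write → manual-compaction flow using the rust-rocksdb API. The path, the placeholder data, and the `import_blocks` helper are illustrative only and not Forest's actual import code; whether automatic compaction should also be re-enabled afterwards (e.g. via `DB::set_options`) still needs to be checked against the crate docs.

```rust
use rocksdb::{Options, DB};

// Illustrative stand-in for the bulk-write phase of a snapshot import.
fn import_blocks(
    db: &DB,
    blocks: impl IntoIterator<Item = (Vec<u8>, Vec<u8>)>,
) -> Result<(), rocksdb::Error> {
    // With auto-compaction off, writes only pay for memtable flushes to L0,
    // not for background compaction work.
    for (key, value) in blocks {
        db.put(key, value)?;
    }
    Ok(())
}

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // 1. Disable automatic compaction for the duration of the import.
    opts.set_disable_auto_compactions(true);

    let db = DB::open(&opts, "/tmp/forest-import-test")?;

    // 2. Do the bulk writes (placeholder key/value data here).
    import_blocks(&db, (0u32..1_000).map(|i| (i.to_be_bytes().to_vec(), vec![0u8; 32])))?;

    // 3. Run one manual compaction over the whole key range before
    //    normal operation resumes.
    db.compact_range(None::<&[u8]>, None::<&[u8]>);
    Ok(())
}
```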
Test env:
- OS: macOS 11.5.2, 6-core, 16 GB of RAM
- Snapshots:
  - forest_snapshot_2022-August-23_height_1238709.car
  - minimal_finality_stateroots_2215681_2022-10-03_06-00-30.car
calibnet:
cmdline: `./target/release/forest --chain=calibnet --target-peer-count 50 --encrypt-keystore false --import-snapshot forest_snapshot_calibnet_2022-10-05_height_1362990.car --halt-after-import -c calibnet.toml`
Change | Import Time | Total Time |
---|---|---|
baseline (no change) | ~35s | ~480s |
set `disable_auto_compactions` to true, compacting manually | ~33s | ~263s |
Manually compacting all L0 files down to the minimum level seems to work for calibnet. There are just a few weird errors afterwards (but syncing seems to work):
```
2022-10-06T13:44:49.254Z INFO forest_genesis > Accepting [Cid(bafy2bzacedloowbhwjsjpqrbw7wy3tkp7hc2vjld2zrmg3qgqubtmeq4cuppw), Cid(bafy2bzaced3qkukyapz3rdg2uxrpf3lnb6w7epxknks7llkbdzoke3ht7cp2k), Cid(bafy2bzacedn2cbclp2sbnmcsl5rpdd35zq3ayojdvznyqt72naj67v6fwyzem), Cid(bafy2bzaced5pu4sak2lswn75qpkb65crnibf7jjbmdrxdxnpfblujrgoyvasu), Cid(bafy2bzaceb4szhxdcrvnyw43w3z2xglfgyyyijqdak64qlh4unsknfdyzr532)] as new head.
2022-10-06T13:44:49.254Z INFO forest::daemon > Imported snapshot in: 263s
2022-10-06T13:44:49.254Z INFO forest::daemon > Forest finish shutdown
2022-10-06T13:44:49.255Z ERROR forest_chain_sync::chain_muxer > Evaluating the network head failed, retrying. Error = P2PEventStreamReceive("receiving from an empty and closed channel")
```
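For reference, a hedged sketch of what this manual compaction step could look like using rust-rocksdb's `CompactOptions` (asking the compaction to move its output down from L0 to a lower level); the exact option combination Forest should use still needs to be validated against the crate docs:

```rust
use rocksdb::{CompactOptions, Options, DB};

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.set_disable_auto_compactions(true);

    let db = DB::open(&opts, "/tmp/forest-manual-compaction-test")?;
    // ... bulk import would happen here ...

    // Move the compaction output down to a lower level instead of
    // leaving all of the imported data sitting in L0.
    let mut compact_opts = CompactOptions::default();
    compact_opts.set_change_level(true);
    compact_opts.set_exclusive_manual_compaction(true);

    // Compact the whole key range once, after the import has finished.
    db.compact_range_opt(None::<&[u8]>, None::<&[u8]>, &compact_opts);
    Ok(())
}
```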
Starting new tests on mainnet:
cmdline: `./target/release/forest --target-peer-count 50 --encrypt-keystore false --import-snapshot minimal_finality_stateroots_2215681_2022-10-03_06-00-30.car --halt-after-import -c config.toml`
Change | Import Time | Total Time |
---|---|---|
baseline (no change) | ~1367s | ~3120s |
set `optimize_filters_for_hits` to true | ~1264s | ~3077s |
set `compaction_style` to None and `optimize_for_point_lookup` to 256 | ~512s | ~1644s* |
set `compaction_style` to None and `optimize_for_point_lookup` to 256 and `write_buffer_size` to 64 | ~478s | ~2616s |
set `optimize_filters_for_hits` to true and `optimize_for_point_lookup` to 8 | ~1307s | ~2776s |
\*the process reached 1.77 GB, compared to 1.33 GB for the baseline, so it's definitely a tradeoff
Performance degrades quite a lot with the number of `.sst` files when compaction is disabled. Forcing a 64 MB memtable results in 4x more files in L0 and emulates a bigger DB over time.
There are also some worst-case scenarios to consider (e.g. a malicious payload requesting very old keys, forcing iteration over all `.sst` bloom filters).
More results: `baseline` is the set of defaults from PR #1998.
Change | Import Time | Total Time |
---|---|---|
droplet (baseline) | 2739s | 6813s |
lemmih (baseline) | 5124s | 6678s |
hubert (baseline) | 928s | 1451s |
droplet (no compaction) | 1601s | 3326s |
lemmih (no compaction) | 2500s | 2764s |
hubert (no compaction) | 393s | 574s |
Settings for `no compaction`:

```toml
[rocks_db]
max_open_files = -1
compression_type = "none"
compaction_style = "none"
optimize_for_point_lookup = 256
```
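For completeness, a hypothetical sketch of how a `[rocks_db]` section like the one above could be deserialized and mapped onto rocksdb `Options`. The struct and field names simply mirror the TOML keys and are not Forest's actual `RocksDbConfig`; in particular, treating `compaction_style = "none"` as "disable automatic compactions" is an assumption:

```rust
use rocksdb::{DBCompressionType, Options};
use serde::Deserialize;

// Hypothetical mirror of the [rocks_db] section above.
#[derive(Deserialize)]
struct RocksDbSettings {
    max_open_files: i32,
    compression_type: String,
    compaction_style: String,
    optimize_for_point_lookup: u64,
}

impl RocksDbSettings {
    fn to_options(&self) -> Options {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        opts.set_max_open_files(self.max_open_files);
        if self.compression_type == "none" {
            opts.set_compression_type(DBCompressionType::None);
        }
        if self.compaction_style == "none" {
            // rust-rocksdb has no "off" compaction style, so interpret
            // "none" as disabling automatic compactions.
            opts.set_disable_auto_compactions(true);
        }
        opts.optimize_for_point_lookup(self.optimize_for_point_lookup);
        opts
    }
}

fn main() {
    let toml_str = r#"
        max_open_files = -1
        compression_type = "none"
        compaction_style = "none"
        optimize_for_point_lookup = 256
    "#;
    let settings: RocksDbSettings = toml::from_str(toml_str).expect("valid TOML");
    let _opts = settings.to_options();
}
```

The sketch assumes the `serde` (with the derive feature) and `toml` crates are available in addition to `rocksdb`.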