
Investigate the effect of different RocksDB options.

lemmih opened this issue 2 years ago

Issue summary

Importing a mainnet snapshot into RocksDB is slower than expected: the import speed starts high but then drops to as low as 20 MB/s. This is a known issue with RocksDB, and several options deal specifically with bulk data imports. We should identify the options relevant to improving bulk-import performance and then run benchmarks.

Task summary

  • [ ] Identify which RocksDB options affect bulk importing, e.g. temporarily disabling compaction.
  • [ ] Add those options to the configuration file.
  • [ ] Do rough benchmarks (remove the db, import a snapshot, quit). We're looking for 2x to 10x improvements, not 5-10%.
  • [ ] Report findings in this issue.

Acceptance Criteria

  • [ ] A recommendation of which RocksDB configuration to use by default.

Other information and links

  • https://rockset.com/blog/optimizing-bulk-load-in-rocksdb/
  • https://docs.rs/rocksdb/latest/rocksdb/struct.Options.html
  • https://github.com/EighteenZi/rocksdb_wiki/blob/master/RocksDB-Tuning-Guide.md
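For reference, these knobs are typically set through the rocksdb crate's `Options` builder. A minimal configuration sketch (the values are illustrative, not recommendations):

```rust
use rocksdb::{DBCompressionType, Options};

// Sketch: options that plausibly matter for bulk imports.
fn bulk_import_options() -> Options {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // Lift the open-file limit so SST reads don't keep reopening files.
    opts.set_max_open_files(-1);
    // Chain data compresses poorly; skipping compression saves CPU.
    opts.set_compression_type(DBCompressionType::None);
    // Postpone compaction work until after the import finishes.
    opts.set_disable_auto_compactions(true);
    // Explicit memtable size; one of the knobs benchmarked below.
    opts.set_write_buffer_size(64 << 20); // 64 MB
    opts
}
```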

lemmih avatar Aug 18 '22 12:08 lemmih

| Change | Import Time (min) |
| --- | --- |
| baseline (no change) | ~60 |
| max_open_files set to 5000 or -1 | 21 |
| set compression_type to none | ~60 |
| max_open_files set to 5000 and enable_index_compression set to false | 17 |

jdjaustin avatar Sep 14 '22 21:09 jdjaustin

Remember that compression is different from compaction and that you'll have to edit the code to disable the compaction.

lemmih avatar Sep 15 '22 15:09 lemmih

Also, importing the calibnet snapshot has a much lower turnaround time. That makes it easier to do lots of tests.

lemmih avatar Sep 15 '22 15:09 lemmih

I did some experiments using both a calibnet snapshot and a mainnet one:

calibnet:

| Change | Import Time |
| --- | --- |
| baseline (no change) | ~33s |
| set compaction_style to None | ~33s |

mainnet:

| Change | Import Time |
| --- | --- |
| baseline (no change) | ~1507s |
| set compaction_style to None | ~529s |
| set disable_auto_compactions to true and max_open_files to 4096 | ~543s |
| set disable_auto_compactions to true and write_buffer_size to 64MB | ~507s |
| set disable_auto_compactions to true and write_buffer_size to 32MB and batch_size to 100 | ~467s* |
| set prepare_for_bulk_load to true and write_buffer_size to 64MB | ~509s** |
| set optimize_filters_for_hits to true | ~1368s |
| set unordered_write to true | failed*** |
| set optimize_for_point_lookup to 4096 | ~1385s |
| set prepare_for_bulk_load to true, optimize_filters_for_hits to true, and optimize_for_point_lookup to 4096 | ~525s |

The impact is more visible on the mainnet snapshot (I don't know exactly why). With compaction enabled, throughput rapidly drops from 180 MB/s to 100 MB/s within a few seconds and ends at 58 MB/s. Without compaction, throughput starts at 180 MB/s, holds up well at around 173 MB/s, and only drops to 167 MB/s by the end.

Having a smaller write buffer during this phase looks like a win.

Note that, as a first step, I'm measuring this change by setting compaction to None when creating the RocksDbConfig struct.

What we really want is to disable compaction in code, perform the bulk writes, then re-enable compaction and run a final manual compaction. That should be the next step.

*this later results in a Too many open files error; bumping max_open_files doesn't help. batch_size is hardcoded to 1000 in the fvm_ipld_car crate.

**while this seems like a win during import, the performance of the next step (Scanning Blockchain) is much worse.

***error was CAR error: IO error: No space left on device: While appending to file: 002175.log: No space left on device
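The disable-then-recompact flow described above could look roughly like this with the rocksdb crate (a sketch, not forest's actual code; `write_snapshot` is a hypothetical placeholder for the real bulk-write step):

```rust
use rocksdb::{Options, DB};

fn import_with_deferred_compaction(path: &str) -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // 1. Turn off automatic background compactions for the import.
    opts.set_disable_auto_compactions(true);
    let db = DB::open(&opts, path)?;

    // 2. Bulk-write the snapshot (hypothetical placeholder).
    // write_snapshot(&db)?;

    // 3. One manual full-range compaction once all the data is in.
    db.compact_range::<&[u8], &[u8]>(None, None);
    Ok(())
}
```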

Test env:

  • OS: macOS 11.5.2, 6-core, 16 GB of RAM
  • Snapshots:
    • forest_snapshot_2022-August-23_height_1238709.car
    • minimal_finality_stateroots_2215681_2022-10-03_06-00-30.car

elmattic avatar Oct 03 '22 14:10 elmattic

calibnet:

cmdline: ./target/release/forest --chain=calibnet --target-peer-count 50 --encrypt-keystore false --import-snapshot forest_snapshot_calibnet_2022-10-05_height_1362990.car --halt-after-import -c calibnet.toml

| Change | Import Time | Total Time |
| --- | --- | --- |
| baseline (no change) | ~35s | ~480s |
| set disable_auto_compactions to true, compacting manually | ~33s | ~263s |

Manually compacting all L0 files to the minimum level seems to work for calibnet. There are just a few weird errors afterwards (but syncing seems to work):

2022-10-06T13:44:49.254Z INFO  forest_genesis                   > Accepting [Cid(bafy2bzacedloowbhwjsjpqrbw7wy3tkp7hc2vjld2zrmg3qgqubtmeq4cuppw), Cid(bafy2bzaced3qkukyapz3rdg2uxrpf3lnb6w7epxknks7llkbdzoke3ht7cp2k), Cid(bafy2bzacedn2cbclp2sbnmcsl5rpdd35zq3ayojdvznyqt72naj67v6fwyzem), Cid(bafy2bzaced5pu4sak2lswn75qpkb65crnibf7jjbmdrxdxnpfblujrgoyvasu), Cid(bafy2bzaceb4szhxdcrvnyw43w3z2xglfgyyyijqdak64qlh4unsknfdyzr532)] as new head.
2022-10-06T13:44:49.254Z INFO  forest::daemon                   > Imported snapshot in: 263s
2022-10-06T13:44:49.254Z INFO  forest::daemon                   > Forest finish shutdown
2022-10-06T13:44:49.255Z ERROR forest_chain_sync::chain_muxer   > Evaluating the network head failed, retrying. Error = P2PEventStreamReceive("receiving from an empty and closed channel")

elmattic avatar Oct 06 '22 14:10 elmattic

Starting new tests on mainnet:

cmdline: ./target/release/forest --target-peer-count 50 --encrypt-keystore false --import-snapshot minimal_finality_stateroots_2215681_2022-10-03_06-00-30.car --halt-after-import -c config.toml

| Change | Import Time | Total Time |
| --- | --- | --- |
| baseline (no change) | ~1367s | ~3120s |
| set optimize_filters_for_hits to true | ~1264s | ~3077s |
| set compaction_style to None and optimize_for_point_lookup to 256 | ~512s | ~1644s* |
| set compaction_style to None and optimize_for_point_lookup to 256 and write_buffer_size to 64 | ~478s | ~2616s |
| set optimize_filters_for_hits to true and optimize_for_point_lookup to 8 | ~1307s | ~2776s |

*the process reached 1.77GB, compared to 1.33GB for the baseline, so it's definitely a tradeoff

elmattic avatar Oct 14 '22 10:10 elmattic

Performance degrades significantly as the number of .sst files grows when compaction is disabled. Forcing a 64MB memtable results in 4x more files in L0, emulating a bigger DB over time.

There are some worst-case scenarios to consider (i.e., a malicious payload that asks for very old keys, forcing iteration over all .sst bloom filters).

elmattic avatar Oct 17 '22 09:10 elmattic

More results:

The baseline is the defaults in PR #1998.

| Change | Import Time | Total Time |
| --- | --- | --- |
| droplet (baseline) | 2739s | 6813s |
| lemmih (baseline) | 5124s | 6678s |
| hubert (baseline) | 928s | 1451s |
| droplet (no compaction) | 1601s | 3326s |
| lemmih (no compaction) | 2500s | 2764s |
| hubert (no compaction) | 393s | 574s |

Settings for no compaction:

```toml
[rocks_db]
max_open_files            = -1
compression_type          = "none"
compaction_style          = "none"
optimize_for_point_lookup = 256
```
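For context, those config keys presumably translate into rocksdb crate calls along these lines (a configuration sketch; forest's actual mapping may differ, and compaction_style = "none" is assumed to mean disabling automatic compaction):

```rust
use rocksdb::{DBCompressionType, Options};

// Hypothetical translation of the [rocks_db] TOML section into Options.
fn no_compaction_options() -> Options {
    let mut opts = Options::default();
    opts.set_max_open_files(-1);                        // max_open_files = -1
    opts.set_compression_type(DBCompressionType::None); // compression_type = "none"
    opts.set_disable_auto_compactions(true);            // compaction_style = "none" (assumed)
    opts.optimize_for_point_lookup(256);                // optimize_for_point_lookup = 256
    opts
}
```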

lemmih avatar Oct 18 '22 09:10 lemmih