Investigate the effect of different RocksDB options.
### Issue summary
Importing a mainnet snapshot into RocksDB is slower than expected: the import speed starts high but then drops to as low as 20 MB/s. This is a known issue for RocksDB, and there are several options specifically aimed at bulk data import. We should identify the options that are relevant for improving bulk-import performance and then run benchmarks.
### Task summary
- [ ] Identify which RocksDB options affect bulk importing. Things like temporarily disabling compaction, etc.
- [ ] Add those options to the configuration file.
- [ ] Do rough benchmarks (remove the db, import a snapshot, quit). We're looking for 2x to 10x improvements, not 5-10%.
- [ ] Report findings in this issue.
### Acceptance Criteria
- [ ] A recommendation of which RocksDB configuration to use by default.
### Other information and links
- https://rockset.com/blog/optimizing-bulk-load-in-rocksdb/
- https://docs.rs/rocksdb/latest/rocksdb/struct.Options.html
- https://github.com/EighteenZi/rocksdb_wiki/blob/master/RocksDB-Tuning-Guide.md
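For orientation, here is a minimal sketch of how the bulk-load-related knobs investigated below can be set through the rust-rocksdb `Options` API linked above. The particular values and the database path are placeholders, not a recommendation:

```rust
use rocksdb::{DBCompressionType, Options, DB};

fn bulk_load_options() -> Options {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // Raise (or lift) the open-file limit; -1 means "unlimited".
    opts.set_max_open_files(-1);
    // Skip block compression during the import.
    opts.set_compression_type(DBCompressionType::None);
    // Pause background compaction while bulk-writing.
    opts.set_disable_auto_compactions(true);
    // Size of a single memtable before it is flushed to an L0 file.
    opts.set_write_buffer_size(64 * 1024 * 1024);
    opts
}

fn main() -> Result<(), rocksdb::Error> {
    // Placeholder path; the real DB location comes from Forest's config.
    let db = DB::open(&bulk_load_options(), "/tmp/forest-bulk-load-test")?;
    db.put(b"key", b"value")?;
    Ok(())
}
```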
Change | Import Time (min) |
---|---|
baseline (no change) | ~60 |
`max_open_files` set to 5000 or -1 | 21 |
set `compression_type` to none | ~60 |
`max_open_files` set to 5000 and `enable_index_compression` set to false | 17 |
Remember that compression is different from compaction, and that you'll have to edit the code to disable compaction.
Also, importing the calibnet snapshot has a much lower turnaround time. That makes it easier to do lots of tests.
I did some experiments using both a calibnet snapshot and a mainnet one:
calibnet:
Change | Import Time |
---|---|
baseline (no change) | ~33s |
set `compaction_style` to None | ~33s |
mainnet:
Change | Import Time |
---|---|
baseline (no change) | ~1507s |
set `compaction_style` to None | ~529s |
set `disable_auto_compactions` to true and `max_open_files` to 4096 | ~543s |
set `disable_auto_compactions` to true and `write_buffer_size` to 64MB | ~507s |
set `disable_auto_compactions` to true and `write_buffer_size` to 32MB and `batch_size` to 100 | ~467s* |
set `prepare_for_bulk_load` to true and set `write_buffer_size` to 64MB | ~509s** |
set `optimize_filters_for_hits` to true | ~1368s |
set `set_unordered_write` to true | failed*** |
set `optimize_for_point_lookup` to 4096 | ~1385s |
set `prepare_for_bulk_load` to true and set `optimize_filters_for_hits` to true and set `optimize_for_point_lookup` to 4096 | ~525s |
The impact is more visible on the mainnet snapshot (I don't know exactly why). With compaction, throughput rapidly drops from 180 MB/s to 100 MB/s within a few seconds and ends at 58 MB/s. Without compaction, we start at 180 MB/s, hold at around 173 MB/s, and only drop to 167 MB/s by the end.
Having a smaller write buffer during this phase looks like a win.
Note that for now I'm measuring this change by setting compaction to `None` when creating the `RocksDbConfig` struct.
What we really want to do is disable compaction in code, do the bulk writes, then re-enable compaction and/or perform the final compaction manually. That should be the next step; a rough sketch of that flow follows the footnotes below.
\*this later results in a `Too many open files` error; bumping `max_open_files` doesn't help. `batch_size` is hardcoded to 1000 in the `fvm_ipld_car` crate.
\*\*while this seems like a win during import, the performance of the next step (Scanning Blockchain) is much worse.
\*\*\*error was `CAR error: IO error: No space left on device: While appending to file: 002175.log: No space left on device`
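As a starting point for that follow-up, here is a rough sketch of the disable-compaction → bulk-write → manual-compaction flow using the rust-rocksdb API. The path, the placeholder data, and the `import_blocks` helper are illustrative only and not Forest's actual import code; whether automatic compaction should also be re-enabled afterwards (e.g. via `DB::set_options`) still needs to be checked against the crate docs.

```rust
use rocksdb::{Options, DB};

// Illustrative stand-in for the bulk-write phase of a snapshot import.
fn import_blocks(
    db: &DB,
    blocks: impl IntoIterator<Item = (Vec<u8>, Vec<u8>)>,
) -> Result<(), rocksdb::Error> {
    // With auto-compaction off, writes only pay for memtable flushes to L0,
    // not for background compaction work.
    for (key, value) in blocks {
        db.put(key, value)?;
    }
    Ok(())
}

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // 1. Disable automatic compaction for the duration of the import.
    opts.set_disable_auto_compactions(true);

    let db = DB::open(&opts, "/tmp/forest-import-test")?;

    // 2. Do the bulk writes (placeholder key/value data here).
    import_blocks(&db, (0u32..1_000).map(|i| (i.to_be_bytes().to_vec(), vec![0u8; 32])))?;

    // 3. Run one manual compaction over the whole key range before
    //    normal operation resumes.
    db.compact_range(None::<&[u8]>, None::<&[u8]>);
    Ok(())
}
```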
Test env:
- OS: macOS 11.5.2, 6-core, 16 GB of RAM
- Snapshots:
  - forest_snapshot_2022-August-23_height_1238709.car
  - minimal_finality_stateroots_2215681_2022-10-03_06-00-30.car
calibnet:
cmdline: `./target/release/forest --chain=calibnet --target-peer-count 50 --encrypt-keystore false --import-snapshot forest_snapshot_calibnet_2022-10-05_height_1362990.car --halt-after-import -c calibnet.toml`
Change | Import Time | Total Time |
---|---|---|
baseline (no change) | ~35s | ~480s |
set `disable_auto_compactions` to true, compacting manually | ~33s | ~263s |
Manually compacting all L0 files down to the minimum level seems to work for calibnet. There are just a few weird errors afterwards (but syncing seems to work):
```
2022-10-06T13:44:49.254Z INFO forest_genesis > Accepting [Cid(bafy2bzacedloowbhwjsjpqrbw7wy3tkp7hc2vjld2zrmg3qgqubtmeq4cuppw), Cid(bafy2bzaced3qkukyapz3rdg2uxrpf3lnb6w7epxknks7llkbdzoke3ht7cp2k), Cid(bafy2bzacedn2cbclp2sbnmcsl5rpdd35zq3ayojdvznyqt72naj67v6fwyzem), Cid(bafy2bzaced5pu4sak2lswn75qpkb65crnibf7jjbmdrxdxnpfblujrgoyvasu), Cid(bafy2bzaceb4szhxdcrvnyw43w3z2xglfgyyyijqdak64qlh4unsknfdyzr532)] as new head.
2022-10-06T13:44:49.254Z INFO forest::daemon > Imported snapshot in: 263s
2022-10-06T13:44:49.254Z INFO forest::daemon > Forest finish shutdown
2022-10-06T13:44:49.255Z ERROR forest_chain_sync::chain_muxer > Evaluating the network head failed, retrying. Error = P2PEventStreamReceive("receiving from an empty and closed channel")
```
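For reference, a hedged sketch of what this manual compaction step could look like using rust-rocksdb's `CompactOptions` (asking the compaction to move its output down from L0 to a lower level); the exact option combination Forest should use still needs to be validated against the crate docs:

```rust
use rocksdb::{CompactOptions, Options, DB};

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.set_disable_auto_compactions(true);

    let db = DB::open(&opts, "/tmp/forest-manual-compaction-test")?;
    // ... bulk import would happen here ...

    // Move the compaction output down to a lower level instead of
    // leaving all of the imported data sitting in L0.
    let mut compact_opts = CompactOptions::default();
    compact_opts.set_change_level(true);
    compact_opts.set_exclusive_manual_compaction(true);

    // Compact the whole key range once, after the import has finished.
    db.compact_range_opt(None::<&[u8]>, None::<&[u8]>, &compact_opts);
    Ok(())
}
```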
Starting new tests on mainnet:
cmdline: `./target/release/forest --target-peer-count 50 --encrypt-keystore false --import-snapshot minimal_finality_stateroots_2215681_2022-10-03_06-00-30.car --halt-after-import -c config.toml`
Change | Import Time | Total Time |
---|---|---|
baseline (no change) | ~1367s | ~3120s |
set `optimize_filters_for_hits` to true | ~1264s | ~3077s |
set `compaction_style` to None and `optimize_for_point_lookup` to 256 | ~512s | ~1644s* |
set `compaction_style` to None and `optimize_for_point_lookup` to 256 and `write_buffer_size` to 64 | ~478s | ~2616s |
set `optimize_filters_for_hits` to true and `optimize_for_point_lookup` to 8 | ~1307s | ~2776s |
\*the process reached 1.77 GB, compared to 1.33 GB for the baseline, so it's definitely a tradeoff
Performance degrades quite a lot with the number of `.sst` files when compaction is disabled. Forcing a 64 MB memtable results in 4x more files in L0 and emulates a bigger DB over time.
There are also some worst-case scenarios to consider (e.g. a malicious payload requesting very old keys, forcing iteration over all `.sst` bloom filters).
More results: `baseline` is the set of defaults from PR #1998.
Change | Import Time | Total Time |
---|---|---|
droplet (baseline) | 2739s | 6813s |
lemmih (baseline) | 5124s | 6678s |
hubert (baseline) | 928s | 1451s |
droplet (no compaction) | 1601s | 3326s |
lemmih (no compaction) | 2500s | 2764s |
hubert (no compaction) | 393s | 574s |
Settings for `no compaction`:

```toml
[rocks_db]
max_open_files = -1
compression_type = "none"
compaction_style = "none"
optimize_for_point_lookup = 256
```
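For completeness, a hypothetical sketch of how a `[rocks_db]` section like the one above could be deserialized and mapped onto rocksdb `Options`. The struct and field names simply mirror the TOML keys and are not Forest's actual `RocksDbConfig`; in particular, treating `compaction_style = "none"` as "disable automatic compactions" is an assumption:

```rust
use rocksdb::{DBCompressionType, Options};
use serde::Deserialize;

// Hypothetical mirror of the [rocks_db] section above.
#[derive(Deserialize)]
struct RocksDbSettings {
    max_open_files: i32,
    compression_type: String,
    compaction_style: String,
    optimize_for_point_lookup: u64,
}

impl RocksDbSettings {
    fn to_options(&self) -> Options {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        opts.set_max_open_files(self.max_open_files);
        if self.compression_type == "none" {
            opts.set_compression_type(DBCompressionType::None);
        }
        if self.compaction_style == "none" {
            // rust-rocksdb has no "off" compaction style, so interpret
            // "none" as disabling automatic compactions.
            opts.set_disable_auto_compactions(true);
        }
        opts.optimize_for_point_lookup(self.optimize_for_point_lookup);
        opts
    }
}

fn main() {
    let toml_str = r#"
        max_open_files = -1
        compression_type = "none"
        compaction_style = "none"
        optimize_for_point_lookup = 256
    "#;
    let settings: RocksDbSettings = toml::from_str(toml_str).expect("valid TOML");
    let _opts = settings.to_options();
}
```

The sketch assumes the `serde` (with the derive feature) and `toml` crates are available in addition to `rocksdb`.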