tune default rocksdb options

Open yihuang opened this issue 2 years ago • 2 comments

Current Default

  • target_file_size_multiplier = 1
  • block_size = 4096
  • OptimizeLevelStyleCompaction(512M) implies
    • target_file_size_base = 64M
    • snappy/lz4 compression types
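As a rough check on where the 64M figure comes from, here is a minimal Go sketch of the size-related values that OptimizeLevelStyleCompaction is understood to derive from its memory budget. The ratios (budget/4, budget/8, budget) are an assumption based on the upstream RocksDB source and should be verified against the RocksDB version in use; the type and function names are hypothetical.

```go
package main

import "fmt"

// derivedSizes holds the size options assumed to be derived from the
// memtable memory budget passed to OptimizeLevelStyleCompaction.
type derivedSizes struct {
	WriteBufferSize      uint64
	TargetFileSizeBase   uint64
	MaxBytesForLevelBase uint64
}

// deriveFromBudget applies the assumed ratios: budget/4, budget/8, budget.
// These ratios are an assumption; check them against your RocksDB version.
func deriveFromBudget(budget uint64) derivedSizes {
	return derivedSizes{
		WriteBufferSize:      budget / 4,
		TargetFileSizeBase:   budget / 8,
		MaxBytesForLevelBase: budget,
	}
}

func main() {
	const MiB = 1 << 20
	d := deriveFromBudget(512 * MiB)
	fmt.Printf("write_buffer_size        = %dM\n", d.WriteBufferSize/MiB)
	fmt.Printf("target_file_size_base    = %dM\n", d.TargetFileSizeBase/MiB)
	fmt.Printf("max_bytes_for_level_base = %dM\n", d.MaxBytesForLevelBase/MiB)
}
```

With a 512M budget this gives target_file_size_base = 64M, matching the value listed above.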

Problem

  • SST file size is capped at 64M, which results in a large number of files
  • small block_size leads to bigger index/filter blocks and less efficient compression
  • we could use stronger compression at lower levels; zstd with a preset dictionary performs well according to our tests.
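To make the block_size point concrete, the following Go sketch estimates how many data blocks (and therefore, roughly, how many index entries) a given amount of data needs at two block sizes. It assumes about one index entry per data block and treats the 256G figure as uncompressed data, which is only an approximation since block_size applies before compression; the function name is hypothetical.

```go
package main

import "fmt"

// dataBlocks returns how many blocks of blockSize bytes are needed to hold
// dataBytes of data (rounded up). Assumes roughly one index entry per block.
func dataBlocks(dataBytes, blockSize uint64) uint64 {
	return (dataBytes + blockSize - 1) / blockSize
}

func main() {
	const GiB = 1 << 30
	data := uint64(256 * GiB) // illustrative size, treated as uncompressed

	small := dataBlocks(data, 4096)    // current default block_size
	large := dataBlocks(data, 32*1024) // a larger 32k block_size

	fmt.Println("blocks at 4k: ", small)
	fmt.Println("blocks at 32k:", large)
	fmt.Println("reduction factor:", small/large)
}
```

Under these assumptions the 32k setting needs 8x fewer data blocks, and so about 8x fewer index entries, than the 4k setting.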

Tuning For DB Size

  • Increase SST file sizes at lower levels to 300M+, e.g. by setting target_file_size_multiplier = 2?
  • Increase block_size to 32k.
  • Use higher compression at lower levels, zstd with preset dictionary.
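The effect of target_file_size_multiplier can be sketched as a small calculation. This Go example assumes the commonly described rule that L0 and L1 use target_file_size_base and each deeper level multiplies the previous level's target by the multiplier. It ignores how level_compaction_dynamic_level_bytes may shift the base level, so treat the numbers as illustrative; the function name is hypothetical.

```go
package main

import "fmt"

// targetFileSize returns the assumed target SST size for a level:
// base for L0 and L1, then multiplied once per level below L1.
func targetFileSize(base, multiplier uint64, level int) uint64 {
	size := base
	for l := 2; l <= level; l++ {
		size *= multiplier
	}
	return size
}

func main() {
	const MiB = 1 << 20
	for level := 1; level <= 6; level++ {
		size := targetFileSize(64*MiB, 2, level)
		fmt.Printf("L%d target file size: %dM\n", level, size/MiB)
	}
}
```

With base = 64M and multiplier = 2, this model passes the 300M mark at L4 (512M) and reaches 2048M at L6.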

We managed to reduce a testnet node's application.db from 256G to 174G by running a manual compaction with the new parameters.

Other Options

  • optimize_filters_for_hits = 1
  • level_compaction_dynamic_level_bytes = true
  • format_version = 4: more efficient index/filter blocks, see http://rocksdb.org/blog/2019/03/08/format-version-4.html
  • format_version = 5 with optimize_filters_for_memory = true and jemalloc: more efficient bloom filter.
  • ribbon filter seems a good trade-off, saving memory and disk space.
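One way to carry these settings around is as a RocksDB-style options string. The Go sketch below only builds such a string using the standard library. The `key=value;` syntax with a nested `block_based_table_factory={...}` section follows RocksDB's options-string convention, but the `filter_policy=ribbonfilter:10` value and the helper names are assumptions for illustration and should be checked against the RocksDB version and binding in use.

```go
package main

import (
	"fmt"
	"strings"
)

// kv is one option name and its value, kept in a slice so the
// output order is deterministic.
type kv struct{ key, value string }

// joinOptions renders options as "k1=v1;k2=v2".
func joinOptions(opts []kv) string {
	parts := make([]string, 0, len(opts))
	for _, o := range opts {
		parts = append(parts, o.key+"="+o.value)
	}
	return strings.Join(parts, ";")
}

// otherOptions builds an options string for the settings listed above.
// The ribbon filter spelling and its bits-per-key value are assumptions.
func otherOptions() string {
	table := joinOptions([]kv{
		{"format_version", "5"},
		{"optimize_filters_for_memory", "true"},
		{"filter_policy", "ribbonfilter:10"},
	})
	return joinOptions([]kv{
		{"optimize_filters_for_hits", "true"},
		{"level_compaction_dynamic_level_bytes", "true"},
		{"block_based_table_factory", "{" + table + "}"},
	})
}

func main() {
	fmt.Println(otherOptions())
}
```

Keeping the options in an ordered slice rather than a map makes the generated string stable, which helps when diffing configurations between nodes.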

MemTable Optimizations

  • memtable_whole_key_filtering
  • memtable_prefix_bloom_size_ratio

yihuang avatar Dec 18 '22 08:12 yihuang

@yihuang would you be okay with us implementing this using either speedb or pebble?

Reasoning: pebble is non-cgo and speedb simply outperforms rocksdb

faddat avatar Dec 28 '22 14:12 faddat

> @yihuang would you be okay with us implementing this using either speedb or pebble?
>
> Reasoning: pebble is non-cgo and speedb simply outperforms rocksdb

speedb is forked from rocksdb v7, so it seems to provide all the features of rocksdb itself, but do they plan to sync with future rocksdb updates? pebble lacks some rocksdb features that we may use: the FIFO compaction policy, which could be used with the new node key format of IAVL, and user-defined timestamps for the versiondb implementation^1.

yihuang avatar Dec 30 '22 00:12 yihuang