When I use the db_paths option, RocksDB writes data only to db_paths[0] and db_paths[3]

Open · xiaobiaozhao opened this issue 1 year ago · 7 comments

RocksDB options:

options_.db_paths = {{"/mnt/rocksdb/a", 1000 * 1000 * 1000},
                       {"/mnt/rocksdb/b", 1000 * 1000 * 1000},
                       {"/mnt/rocksdb/c", 1000 * 1000 * 1000},
                       {"/mnt/rocksdb/d", 1000 * 1000 * 1000}};

After putting some data, SST files are written to /mnt/rocksdb/a and /mnt/rocksdb/d, but /mnt/rocksdb/b and /mnt/rocksdb/c remain empty:

du -h /mnt/rocksdb/
4.0K    /mnt/rocksdb/b
4.0K    /mnt/rocksdb/c
958M    /mnt/rocksdb/a
944M    /mnt/rocksdb/d
1.9G    /mnt/rocksdb/

Expected behavior

According to the documentation at https://github.com/facebook/rocksdb/blob/main/include/rocksdb/options.h#L672, I expected RocksDB to fill the paths in order: /mnt/rocksdb/a => /mnt/rocksdb/b => /mnt/rocksdb/c => /mnt/rocksdb/d.

Actual behavior

RocksDB writes SST files to /mnt/rocksdb/a and then /mnt/rocksdb/d; /mnt/rocksdb/b and /mnt/rocksdb/c are skipped.
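A possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: in leveled compaction, the output path for a level appears to be chosen (in `LevelCompactionBuilder::GetPathId`) by walking `db_paths` in order and reserving each level's estimated size in the current path. L0 and L1 are both estimated at `max_bytes_for_level_base`, and each deeper level at `max_bytes_for_level_multiplier` times the previous one; a level that fits in no earlier path goes to the last path, as the options.h documentation says. The C++ below is a simplified model of that rule, not RocksDB code: `DbPath` and `PickPathForLevel` are hypothetical names, and the values used (256 MB base, multiplier 10) are assumed defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one db_paths entry (path + target size).
struct DbPath {
  std::string path;
  std::uint64_t target_size;
};

// Simplified model (assumption): returns the index into `paths` where files
// for `level` would be placed. Paths are filled in order; each shallower level
// reserves its estimated size, and the last path is the fallback.
std::size_t PickPathForLevel(const std::vector<DbPath>& paths, int level,
                             std::uint64_t max_bytes_for_level_base,
                             double multiplier) {
  assert(!paths.empty());
  std::size_t p = 0;
  std::uint64_t remaining = paths[0].target_size;       // room left in path p
  std::uint64_t level_size = max_bytes_for_level_base;  // L0 estimated as L1
  int cur_level = 0;
  while (p + 1 < paths.size()) {
    if (level_size <= remaining) {
      if (cur_level == level) return p;  // the requested level fits here
      remaining -= level_size;           // reserve room for this shallower level
      if (cur_level > 0) {
        // L1 -> L2 -> ... grow by the multiplier; L0 -> L1 does not.
        level_size = static_cast<std::uint64_t>(level_size * multiplier);
      }
      ++cur_level;
      continue;
    }
    // The current level's estimate does not fit: try the next path.
    ++p;
    remaining = paths[p].target_size;
  }
  return p;  // nothing earlier fit, so use the last path
}
```

With the four 1 GB paths from this issue, L0 and L1 (about 256 MB each) both fit in /mnt/rocksdb/a, but L2 is estimated at about 2.5 GB, larger than any single path, so L2 and every deeper level fall back to /mnt/rocksdb/d. Under this model /mnt/rocksdb/b and /mnt/rocksdb/c are never selected.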

Steps to reproduce the behavior

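// 1. Set db_paths: four directories, each with a target size of 1 GB (10^9 bytes)
options_.db_paths = {{"/mnt/rocksdb/a", 1000 * 1000 * 1000},
                       {"/mnt/rocksdb/b", 1000 * 1000 * 1000},
                       {"/mnt/rocksdb/c", 1000 * 1000 * 1000},
                       {"/mnt/rocksdb/d", 1000 * 1000 * 1000}};
// 2. Open the DB and check the returned status
rocksdb::Status s =
    rocksdb::DB::Open(options_, name, column_families, &cf_handles_, &db_);
assert(s.ok());

// 3. Put key-value pairs (repeated until enough data has been written)
s = db_->Put(rocksdb::WriteOptions(), key, value);
assert(s.ok());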

xiaobiaozhao, Oct 21 '22

@xiaobiaozhao Is it possible that all of your data has been compacted into the lowest level? Can you give an example of a complete test and execution showing this issue? I have not had a chance to modify db_bench to show this problem.

mrambacher, Oct 25 '22

All right, let me prepare a minimal reproduction case.

xiaobiaozhao, Oct 25 '22

The demo is here: https://gist.github.com/xiaobiaozhao/75a0f6d3d3b3f564e28eacd9b85d3c1a

xiaobiaozhao, Oct 25 '22

Hi, any updates? Has this bug perhaps been fixed in the current 8.3.x?

aleksraiden, Jun 27 '23

Here is the result with RocksDB v8.3.2:

du -h /mnt/rocksdb/
471M    /mnt/rocksdb/a
4.0K    /mnt/rocksdb/c
1.5G    /mnt/rocksdb/d
4.0K    /mnt/rocksdb/b

xiaobiaozhao, Jun 28 '23

@xiaobiaozhao

Please check the level_compaction_dynamic_level_bytes option: in RocksDB v8.3.2, level_compaction_dynamic_level_bytes may be true.

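If level_compaction_dynamic_level_bytes is enabled, the target level for L0 compactions changes from the default Level 1 to the bottommost level, Level 6; that is, initially Level 0 is compacted directly into Level 6. If after some compaction Level 6 grows beyond 256M (max_bytes_for_level_base), say to 300M, base_level moves up to Level 5, and the size limit of Level 5 becomes 300M/10 = 30M. From then on, Level 0 is compacted directly into Level 5. If Level 5 exceeds 30M, say 50M, it must be compacted with Level 6; afterwards Level 5 drops back below 30M and Level 6 grows slightly, say to 320M, so the limit of the base level (Level 5) is recomputed from 320M as 320M/10 = 32M. As writes continue, Level 5 eventually exceeds 256M (max_bytes_for_level_base), and base_level must move up again, to Level 4. Take the larger of the current sizes of Level 5 and Level 6 and call it MaxSize; then the size limit of Level 4 is MaxSize/100, the limit of Level 5 is Level 4's limit times 10, and so on. The relevant code is in VersionStorageInfo::CalculateBaseBytes.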
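The base-level adjustment described above can be modelled in a few lines. This is a simplified sketch, not the real `VersionStorageInfo::CalculateBaseBytes`: it assumes 7 levels, that only the size of the largest level matters, that the 256M threshold is `max_bytes_for_level_base`, and a level multiplier of 10. `DynamicLevels` and `ComputeDynamicLevels` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result of the simplified dynamic-level-bytes model (hypothetical type).
struct DynamicLevels {
  int base_level;                      // level that L0 is compacted into
  std::vector<std::uint64_t> targets;  // targets[l]; 0 means the level is unused
};

// Simplified model (assumption): start at the bottommost level and move the
// base level up, dividing by `multiplier`, while its target would still exceed
// max_bytes_for_level_base. Levels below the base grow by `multiplier`.
DynamicLevels ComputeDynamicLevels(std::uint64_t max_level_size, int num_levels,
                                   std::uint64_t max_bytes_for_level_base,
                                   std::uint64_t multiplier) {
  assert(num_levels >= 2 && multiplier > 1);
  int base_level = num_levels - 1;  // initially L0 compacts into the last level
  std::uint64_t size = max_level_size;
  while (base_level > 1 && size > max_bytes_for_level_base) {
    --base_level;
    size /= multiplier;
  }
  DynamicLevels out{base_level, std::vector<std::uint64_t>(num_levels, 0)};
  for (int l = base_level; l < num_levels; ++l) {
    out.targets[l] = size;  // base level gets the smallest target
    size *= multiplier;     // each deeper level is `multiplier` times larger
  }
  return out;
}
```

Feeding in 300M, 320M and 3000M as the size of the largest level gives base level 5 with a 30M target, base level 5 with a 32M target, and base level 4 with a target of MaxSize/100 = 30M, which matches the walk-through above.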

cxyxd, Mar 07 '24

I'll test it when I have time.

xiaobiaozhao, Mar 11 '24