mini-lsm icon indicating copy to clipboard operation
mini-lsm copied to clipboard

fix: universal compaction condition

Open skyzh opened this issue 1 year ago • 1 comments

close https://github.com/skyzh/mini-lsm/issues/96

the original condition was not very correct and would produce larger write amplification, so we make it exactly the same as RocksDB in this PR. TODO: fix doc and tests.

skyzh avatar Oct 01 '24 01:10 skyzh

I have another question. In Reduce Sorted Runs parts, RocksDB will compact at most max_merge_width tiers into one tier (docs and example). This also seems to reduce write amplification significantly.

cargo run --bin compaction-simulator tiered --dump-real-id --num-tiers 8 --iterations 200 (parameters same as RocksDB):

this PR

=== Iteration 199 ===
--- After Compaction ---
L1716 (42): [...]
L808 (6): [808, 809, 810, 811, 812, 813]
L787 (10): [787, 788, 789, 790, 791, 792, 793, 794, 795, 796]
L753 (15): [...]
L703 (21): [...]
L634 (28): [...]
L501 (78): [...]
--- Statistics ---
Write Amplification: 1757/200=8.785x
Maximum Space Usage: 242/200=1.210x
Read Amplification: 7x

Full Output: pr.txt

my impl similar to RocksDB:

=== Iteration 199 ===
--- After Flush ---
L753 (1): [753]
L748 (5): [748, 749, 750, 751, 752]
L722 (21): [...]
L653 (28): [...]
L453 (145): [...]
--- Statistics ---
Write Amplification: 753/200=3.765x
Maximum Space Usage: 290/200=1.450x
Read Amplification: 5x

Full Output: self.txt

silver-ymz avatar Oct 01 '24 03:10 silver-ymz

The reason I didn't merge this patch is that I don't fully understand why RocksDB produces that sequence -- I think we got the first few lines correct, but looking at the example,

https://github.com/facebook/rocksdb/wiki/Universal-Style-Compaction-Example

it allows > 8 tiers

1 2 3 4 5 6 7 55   (after flush)
1 1 2 3 4 5 6 7 55   (after flush)
29 55   (after compaction)
1 29 55   (after flush)

before doing the compaction, which gives 29; while in my implementation, it gives 28

Looking at https://github.com/facebook/rocksdb/wiki/universal-compaction,

1 16  (no compaction triggered)
1 1 16  (no compaction triggered)
1 1 1 16  (no compaction triggered)
1 1 1 1 16  => 4 16
1 4 16  (no compaction triggered)
1 1 4 16  (no compaction triggered)
1 1 1 4 16 => 3 4 16
1 3 4 16  (no compaction triggered)
1 1 3 4 16 => 2 3 4 16
1 2 3 4 16  (no compaction triggered)
1 1 2 3 4 16 => 11 16

it also indicates the same -- even if it triggers at tier == 5, the number of tiers can stay at 5 in some conditions.

so I'd like to reproduce their algorithm and check if it reduces write amplification (my intuition is yes b/c they do fewer compactions).

skyzh avatar Nov 12 '24 05:11 skyzh

after looking at it for a while, I assume it's a typo, though I suppose their example should be generated with some tools, but I couldn't figure out a reasonable logic for that 🤪

skyzh avatar Nov 13 '24 02:11 skyzh

I've updated my implementation to take max_merge_with for reducing sorted runs and I'll merge the patch; meanwhile, I'll update the docs in another patch. Thanks for the suggestion anyways :)

skyzh avatar Nov 13 '24 02:11 skyzh

The reason I didn't merge this patch is that I don't fully understand why RocksDB produces that sequence -- I think we got the first few lines correct, but looking at the example,

I also guess it's a typo. I asked it at mail channel, but no response.

silver-ymz avatar Nov 13 '24 03:11 silver-ymz