Tracking Issue: improve compaction

Open Rachelint opened this issue 1 year ago • 1 comments

Describe This Problem

The design of compaction in ceresdb is still rough, and we should put more effort into it. Improvements can be made in the following areas:

  • Compaction strategy. Currently we only implement TWCS (time-window compaction strategy); we define a level 1 but do nothing special for it.
  • How to do compaction more efficiently. The speed of compaction may be as important as the strategy.
  • Metrics and tests. We should have ways to check the correctness and effectiveness (especially for query improvement) of our compaction strategy.

Proposal

1. Compaction strategy

  • [ ] Introduce a score mechanism to integrate multiple rules.
  • [x] Consider the sequence (WAL) when picking files to compact, to ensure correctness.
  • [ ] Eliminate time range overlap between ssts in level 1.
  • [ ] Take the priority of each table into consideration.
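
To make the "score mechanism" item more concrete, here is a minimal sketch of how several rules could be combined into one weighted score. It is only an illustration under assumptions: `Candidate`, `Rule`, the three rule functions, the 16-file cap and the weights are all hypothetical names and values, not types or settings from the codebase.

```rust
/// Hypothetical summary of one candidate set of ssts (not a type from the codebase).
#[derive(Debug, Clone)]
pub struct Candidate {
    /// Number of ssts in the candidate set.
    pub file_count: usize,
    /// Number of sst pairs in the set whose time ranges overlap.
    pub overlapping_pairs: usize,
    /// Priority of the owning table, higher means more urgent (assumed scale 0..=255).
    pub table_priority: u8,
}

/// A rule maps a candidate to a score in [0.0, 1.0].
pub type Rule = fn(&Candidate) -> f64;

/// More files means more benefit from compacting; capped at an assumed 16 files.
pub fn file_count_rule(c: &Candidate) -> f64 {
    (c.file_count as f64 / 16.0).min(1.0)
}

/// Fraction of all possible sst pairs that overlap in time.
pub fn overlap_rule(c: &Candidate) -> f64 {
    if c.file_count < 2 {
        return 0.0;
    }
    let max_pairs = c.file_count * (c.file_count - 1) / 2;
    (c.overlapping_pairs as f64 / max_pairs as f64).min(1.0)
}

/// Table priority normalized to [0.0, 1.0].
pub fn priority_rule(c: &Candidate) -> f64 {
    c.table_priority as f64 / 255.0
}

/// Weighted average of all rule scores; 0.0 when the total weight is zero.
pub fn score(c: &Candidate, rules: &[(f64, Rule)]) -> f64 {
    let total_weight: f64 = rules.iter().map(|&(w, _)| w).sum();
    if total_weight == 0.0 {
        return 0.0;
    }
    rules.iter().map(|&(w, r)| w * r(c)).sum::<f64>() / total_weight
}

/// Pick the candidate with the highest combined score, if any.
pub fn pick_best<'a>(candidates: &'a [Candidate], rules: &[(f64, Rule)]) -> Option<&'a Candidate> {
    candidates
        .iter()
        .max_by(|a, b| score(a, rules).total_cmp(&score(b, rules)))
}
```

A picker could build a rule list such as `[(1.0, file_count_rule as Rule), (1.0, overlap_rule as Rule)]` and call `pick_best` on the candidate sets it already enumerates. Keeping each rule as a plain function returning a normalized value means new rules (for example one based on table priority) can be added without touching the combining logic.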

2. Performance of compaction

  • [ ] Keep more data in the memtable and flush larger L0 ssts. #1029
  • [x] Optimize sst iterator and filter build to consume less CPU. #975

3. Metrics and tests

  • [ ] Emulator for compaction strategy inspired by iox
  • [x] Add metrics (like read amplification, write amplification, space amplification) to check the effectiveness of the strategy.
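
For reference, a rough sketch of how the three amplification metrics are commonly defined. The struct, field and function names below are hypothetical and do not match the existing metrics code; the definitions are the usual LSM-style ones (bytes written to storage per byte ingested, bytes on disk per byte of live data, ssts touched per query), which is an assumption about what we want to measure here.

```rust
/// Hypothetical byte counters for one table (names are illustrative only).
#[derive(Debug, Default, Clone, Copy)]
pub struct CompactionCounters {
    /// Bytes written by users.
    pub bytes_ingested: u64,
    /// Bytes written to ssts by memtable flushes.
    pub bytes_flushed: u64,
    /// Bytes written to ssts as compaction output.
    pub bytes_compacted: u64,
    /// Total size of all ssts currently on disk.
    pub bytes_on_disk: u64,
    /// Size the data would have with no duplicated or obsolete rows.
    pub live_bytes: u64,
}

/// Returns `None` instead of dividing by zero.
fn ratio(numerator: u64, denominator: u64) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(numerator as f64 / denominator as f64)
    }
}

impl CompactionCounters {
    /// Bytes written to storage per byte written by users.
    pub fn write_amplification(&self) -> Option<f64> {
        ratio(self.bytes_flushed + self.bytes_compacted, self.bytes_ingested)
    }

    /// Bytes on disk per byte of live data.
    pub fn space_amplification(&self) -> Option<f64> {
        ratio(self.bytes_on_disk, self.live_bytes)
    }
}

/// Read amplification of a time-range query, approximated as the number of
/// ssts whose half-open time range `[start, end)` overlaps the query range.
pub fn read_amplification(sst_ranges: &[(i64, i64)], query: (i64, i64)) -> usize {
    sst_ranges
        .iter()
        .filter(|&&(start, end)| start < query.1 && query.0 < end)
        .count()
}
```

With these definitions, eliminating time range overlap in level 1 should show up directly as a lower `read_amplification` for queries that only touch level 1, which gives a simple way to compare strategies.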

Additional Context

No response

Rachelint, Jun 12 '23 02:06

Add metrics (like read amplification, write amplification, space amplification) to check the effectiveness of the strategy.

The current codebase already has basic metrics for compaction:

  1. Input sst size/row num
  2. Output sst size/row num

https://github.com/CeresDB/ceresdb/blob/f873980175e46eb436fb316cabaa6911985794ef/analytic_engine/src/table/metrics.rs#L62

jiacai2050, Jun 19 '23 04:06