horaedb
Tracking Issue: improve compaction
Describe This Problem
The design of compaction in ceresdb is still rough, and we should put more effort into it. Improvements can be made in the following areas:
- Compaction strategy. Currently only TWCS (time window compaction strategy) is actually implemented; level 1 is defined, but nothing special is done for it.
- How to do compaction more efficiently. The speed of compaction may be as important as the strategy.
- Metrics and tests. We should have ways to check the correctness and effectiveness (especially the improvement in query performance) of our compaction strategy.
Proposal
1. Compaction strategy
- [ ] Introduce score mechanism to integrate multiple rules.
- [x] Consider sequence(wal) when picking compacting files to ensure the correctness.
- [ ] Eliminate time range overlap of ssts in level 1.
- [ ] Take the priority of each table into consideration.
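To make the score mechanism idea more concrete, here is a minimal sketch of how several rules could be folded into one score per candidate file group. Everything in it is hypothetical: `Candidate`, `Rule`, `file_count_rule`, `overlap_rule`, the weights, and the normalization constant are illustration only and are not taken from the ceresdb codebase.

```rust
/// Hypothetical summary of a group of SST files that could be compacted
/// together. This is NOT a type from the real codebase.
#[derive(Debug, Clone)]
pub struct Candidate {
    /// Number of SST files in the group.
    pub file_count: usize,
    /// Fraction of the group's time range that overlaps other files, in [0, 1].
    pub overlap_ratio: f64,
}

/// A rule maps a candidate to a score in [0, 1]; higher means "more urgent".
pub type Rule = fn(&Candidate) -> f64;

/// Assumed rule: more files means more read amplification, saturating at 10 files.
pub fn file_count_rule(c: &Candidate) -> f64 {
    (c.file_count as f64 / 10.0).min(1.0)
}

/// Assumed rule: more time range overlap makes compaction more valuable.
pub fn overlap_rule(c: &Candidate) -> f64 {
    c.overlap_ratio.clamp(0.0, 1.0)
}

/// Weighted average of all rule scores. Returns 0.0 if the weights sum to
/// zero or less, so an empty rule set never selects anything by accident.
pub fn score(c: &Candidate, rules: &[(Rule, f64)]) -> f64 {
    let total_weight: f64 = rules.iter().map(|(_, w)| *w).sum();
    if total_weight <= 0.0 {
        return 0.0;
    }
    rules.iter().map(|(rule, w)| rule(c) * w).sum::<f64>() / total_weight
}

/// Pick the candidate with the highest combined score, if any.
pub fn pick_best<'a>(
    candidates: &'a [Candidate],
    rules: &[(Rule, f64)],
) -> Option<&'a Candidate> {
    candidates
        .iter()
        .max_by(|a, b| score(a, rules).total_cmp(&score(b, rules)))
}
```

With such a shape, table priority or WAL sequence constraints could be added as further rules (or as hard filters applied before scoring) without touching the picker itself.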
2. Performance of compaction
- [ ] Keep more data in the memtable and flush larger L0 SSTs. #1029
- [x] Optimize sst iterator and filter build to consume less CPU. #975
3. Metrics and tests
- [ ] Emulator for compaction strategy inspired by iox
- [x] Add metrics (like read amplification, write amplification, space amplification) to check the effectiveness of the strategy.
Additional Context
No response
Add metrics (like read amplification, write amplification, space amplification) to check the effectiveness of the strategy.
The current codebase already has basic metrics for compaction:
- Input sst size/row num
- Output sst size/row num
https://github.com/CeresDB/ceresdb/blob/f873980175e46eb436fb316cabaa6911985794ef/analytic_engine/src/table/metrics.rs#L62
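As a rough sketch of how the three amplification metrics could be derived from counters like the ones above, the following uses the commonly cited definitions: bytes written to storage per user byte, SSTs read per query, and bytes on disk per live byte. The struct `AmplificationCounters` and all of its fields are hypothetical names for illustration and do not exist in `metrics.rs`; the exact definitions the project wants may differ.

```rust
/// Hypothetical counters; the field names are assumptions, not real metrics.
#[derive(Debug, Default, Clone, Copy)]
pub struct AmplificationCounters {
    /// Bytes accepted from user writes.
    pub user_bytes_written: u64,
    /// Bytes written when flushing memtables to L0 SSTs.
    pub flush_bytes_written: u64,
    /// Bytes written as compaction output SSTs.
    pub compaction_output_bytes: u64,
    /// Total number of SST files touched by queries.
    pub ssts_read: u64,
    /// Number of queries served.
    pub queries: u64,
    /// Total SST bytes currently on disk.
    pub disk_bytes: u64,
    /// Bytes of live data (what a fully compacted table would occupy).
    pub live_bytes: u64,
}

/// Returns `None` when the denominator is zero, so callers can skip
/// reporting instead of emitting NaN or infinity.
fn ratio(numerator: u64, denominator: u64) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(numerator as f64 / denominator as f64)
    }
}

impl AmplificationCounters {
    /// Storage bytes written (flush + compaction) per user byte written.
    pub fn write_amplification(&self) -> Option<f64> {
        ratio(
            self.flush_bytes_written + self.compaction_output_bytes,
            self.user_bytes_written,
        )
    }

    /// Average number of SST files read per query.
    pub fn read_amplification(&self) -> Option<f64> {
        ratio(self.ssts_read, self.queries)
    }

    /// Bytes on disk per byte of live data.
    pub fn space_amplification(&self) -> Option<f64> {
        ratio(self.disk_bytes, self.live_bytes)
    }
}
```

The existing input and output SST sizes would feed `compaction_output_bytes` directly; the remaining counters would have to be collected on the write, flush, and query paths.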