horaedb
Use dictionary to store string in memtable
Describe This Problem
When I run tsbs with the default memtable size (30M), the flushed SST is too small (1.9M). After inspecting the Parquet file, I found that the high compression ratio is mainly due to strings being dictionary-encoded. I think we can also dictionary-encode strings in the memtable, so that more data stays in memory and the flushed SST becomes larger.
Proposal
- Use a dictionary to store strings in the memtable (I plan to do the PoC work first).
- Maybe we should collect some statistics with the existing sampling memtable to decide which columns should be dictionary-encoded (encoding high-cardinality string columns this way may waste more space instead).
- As suggested by @tanruixiang, we can keep the mutable memtables in their original form and dictionary-compress the strings when switching them to immutable.
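
To make the trade-off in the second point concrete, here is a minimal sketch in Python (not HoraeDB code). All function names and the fixed 4-byte code width are assumptions for illustration only. It dictionary-encodes a sampled string column and compares the estimated size against plain storage, ignoring per-string offsets and hash-map overhead.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes); each distinct string is stored once."""
    index: Dict[str, int] = {}
    dictionary: List[str] = []
    codes: List[int] = []
    for value in values:
        code = index.get(value)
        if code is None:
            # First occurrence: assign the next free code.
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def plain_size(values: List[str]) -> int:
    """Bytes needed to store every string as-is (UTF-8 payload only)."""
    return sum(len(v.encode("utf-8")) for v in values)


def dictionary_size(values: List[str], code_width: int = 4) -> int:
    """Bytes for the distinct strings plus one fixed-width code per row.

    code_width is a hypothetical choice, not taken from HoraeDB.
    """
    dictionary, codes = dictionary_encode(values)
    return plain_size(dictionary) + code_width * len(codes)


def should_use_dictionary(sample: List[str], code_width: int = 4) -> bool:
    """Decide from a sample whether dictionary storage is smaller."""
    return dictionary_size(sample, code_width) < plain_size(sample)
```

With this estimate, a column of a few repeated host names would pass the check, while a column of unique IDs would not, because every row pays for both the string and its code.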
Additional Context
No response