horaedb

Use dictionary to store string in memtable

Open Rachelint opened this issue 1 year ago • 0 comments

Describe This Problem

When I ran TSBS with the default memtable size (30M), I found that the flushed SST was too small (1.9M). After inspecting the Parquet file, I found that the high compression ratio is mainly due to strings being stored with dictionary encoding. I think we can also use a dictionary to store strings in the memtable, so that more data is kept in memory and the flushed SSTs become larger.

Proposal

  • Use a dictionary to store strings in the memtable (I plan to do the PoC work first).
  • Maybe we should collect some statistics with the existing sampling memtable to decide which columns should be stored with a dictionary (storing high-cardinality string columns this way may waste space instead of saving it).
  • As suggested by @tanruixiang, we can keep the mutable memtables in their original form and dictionary-compress the strings when switching them to immutable.
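
To make the proposal concrete, here is a minimal sketch of a dictionary-encoded string column, a cardinality check that could be driven by a sample, and a conversion step for freezing a plain column. All names (`DictColumn`, `should_use_dictionary`, `from_plain`, the `max_ratio` threshold) are hypothetical and are not taken from the HoraeDB code base; the sketch only illustrates the idea.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded string column (not HoraeDB's actual type).
#[derive(Default)]
struct DictColumn {
    /// Distinct values, in first-seen order.
    values: Vec<String>,
    /// Lookup from value to its code (index into `values`).
    /// Note: this sketch stores each distinct string twice for simplicity.
    index: HashMap<String, u32>,
    /// One code per row.
    codes: Vec<u32>,
}

impl DictColumn {
    /// Append one row, reusing the existing code if the value was seen before.
    fn push(&mut self, s: &str) {
        let code = match self.index.get(s) {
            Some(&c) => c,
            None => {
                let c = self.values.len() as u32;
                self.values.push(s.to_string());
                self.index.insert(s.to_string(), c);
                c
            }
        };
        self.codes.push(code);
    }

    /// Read back the string stored at `row`, if the row exists.
    fn get(&self, row: usize) -> Option<&str> {
        self.codes.get(row).map(|&c| self.values[c as usize].as_str())
    }

    /// Number of rows.
    fn len(&self) -> usize {
        self.codes.len()
    }

    /// Distinct values divided by rows; 0.0 for an empty column.
    fn cardinality_ratio(&self) -> f64 {
        if self.codes.is_empty() {
            0.0
        } else {
            self.values.len() as f64 / self.codes.len() as f64
        }
    }

    /// Build a dictionary column from plain rows, e.g. when a mutable
    /// memtable is switched to immutable (assumed workflow).
    fn from_plain(rows: &[String]) -> Self {
        let mut col = DictColumn::default();
        for s in rows {
            col.push(s);
        }
        col
    }
}

/// Hypothetical heuristic: use a dictionary only when the sampled column
/// has a distinct/row ratio at or below `max_ratio`.
fn should_use_dictionary(sample: &[&str], max_ratio: f64) -> bool {
    let mut col = DictColumn::default();
    for &s in sample {
        col.push(s);
    }
    col.cardinality_ratio() <= max_ratio
}
```

With a low-cardinality column such as a host tag, each repeated row costs only a 4-byte code, while a column whose values are almost all distinct pays for both the codes and the dictionary, which is why the sampling step matters.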

Additional Context

No response

Rachelint, Jun 26 '23 03:06