
FSST compression method

Open AbstractiveNord opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe. The main data type in ManticoreSearch is the string. Compression techniques allow better hardware utilization and more efficient use of memory and disk resources. It would be a good idea to check whether the FSST technique can be used to improve ManticoreSearch.

Describe the solution you'd like Test, measure, and implement the FSST string compression method.

Additional context Less data to read may provide better performance, so it's worth testing FSST with both row-wise storage and columnar storage.

FSST Repository.

AbstractiveNord avatar Nov 26 '23 12:11 AbstractiveNord

Stored strings use the docstore, which already uses the LZ4 compression library, and you can already use a high compression level for it, as described in the manual under docstore_compression.

tomatolog avatar Nov 26 '23 21:11 tomatolog

Stored strings use the docstore, which already uses the LZ4 compression library, and you can already use a high compression level for it, as described in the manual under docstore_compression.

The authors of FSST say that their method provides a better compression ratio and better compression speed.

AbstractiveNord avatar Nov 27 '23 08:11 AbstractiveNord

Some notes after today's dev call:

  • Docstore: may help, but given that we use lazy fetching, the effect may not be noticeable

  • Columnar: can help with compression ratio and encoding (i.e. data write) performance [image], but can barely help with search performance.

sanikolaev avatar Nov 27 '23 09:11 sanikolaev

The authors of FSST say that their method provides a better compression ratio and better compression speed.

From their README.md, I see the speed is about the same:

When compared to e.g. LZ4 (which is block-based), FSST further achieves similar decompression speed and compression speed, and better compression ratio.

The advantages are that it can decompress individual strings without touching the whole block, and that equality comparisons can be performed without decompressing.

tomatolog avatar Nov 27 '23 09:11 tomatolog
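
To make the last point concrete, here is a minimal Python sketch of the general idea behind a shared symbol table. It is a toy, not the FSST algorithm or its API: the symbol table is hand-picked rather than learned from the data, and the names `SYMBOLS`, `ESCAPE`, `compress`, and `decompress` are assumptions made for illustration. Each string is encoded on its own against the same table, so any single string can be decoded without touching its neighbours, and because the encoding is deterministic, two compressed values are equal exactly when the original strings are equal.

```python
# Toy static-symbol-table codec (illustrative only, not the real FSST).
# Each code is one byte; code 255 escapes a literal byte that has no symbol.
ESCAPE = 255

# Hypothetical, hand-picked table; a real implementation would learn it
# from a sample of the data. The list index is the code.
SYMBOLS = [b"http://", b"https://", b"www.", b".com", b"search", b"manticore"]

# Try longer symbols first so the greedy match is deterministic.
_BY_LENGTH = sorted(enumerate(SYMBOLS), key=lambda cs: -len(cs[1]))


def compress(data: bytes) -> bytes:
    """Encode one string independently of any other string."""
    out = bytearray()
    i = 0
    while i < len(data):
        for code, sym in _BY_LENGTH:
            if data.startswith(sym, i):
                out.append(code)
                i += len(sym)
                break
        else:
            # No symbol matches here: emit escape + literal byte.
            out += bytes((ESCAPE, data[i]))
            i += 1
    return bytes(out)


def decompress(blob: bytes) -> bytes:
    """Decode one string using only the shared table."""
    out = bytearray()
    i = 0
    while i < len(blob):
        code = blob[i]
        if code == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += SYMBOLS[code]
            i += 1
    return bytes(out)


strings = [
    b"https://www.manticoresearch.com",
    b"http://example.com",
    b"https://www.manticoresearch.com",
]
compressed = [compress(s) for s in strings]

# Random access: decode only the second string.
second = decompress(compressed[1])

# Equality check on the compressed form, no decompression needed.
same = compressed[0] == compressed[2]
```

A block-based compressor such as LZ4 cannot offer either property directly, because the bytes of one string depend on the contents of the block that precedes it.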