Zstandard support
Zstandard, or zstd for short, is a fast lossless compression algorithm targeting real-time compression scenarios at zlib-level and better compression ratios.
This PR adds support for searching through zst files using the zstandard Python bindings.
It works and gets the job done for now. Additional things that can be done:
- Merge localzstdsearch.py and localgzipsearch.py to reduce code duplication
- Add test case(s)
- [Stretch goal] Handle encodings other than utf-8
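For readers unfamiliar with the bindings, streaming search over a .zst file can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `search_lines` and `search_zst_file` are hypothetical, and the `encoding` parameter is an assumption that hints at how the stretch goal might be approached. The zstandard import is kept inside the function so the module still loads when the library is absent.

```python
import io
from typing import Iterator, TextIO, Tuple


def search_lines(stream: TextIO, target: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that contains target."""
    for lineno, line in enumerate(stream, start=1):
        if target in line:
            yield lineno, line.rstrip("\n")


def search_zst_file(path: str, target: str,
                    encoding: str = "utf-8") -> Iterator[Tuple[int, str]]:
    """Search a zstd-compressed text file without decompressing it to disk.

    Hypothetical helper; 'encoding' is an assumed parameter.
    """
    # Lazy import: zstandard is an optional dependency in this sketch.
    import zstandard

    dctx = zstandard.ZstdDecompressor()
    with open(path, "rb") as fh:
        # stream_reader decompresses incrementally as the wrapper reads from it.
        reader = dctx.stream_reader(fh)
        # errors="replace" keeps the search going past undecodable bytes.
        text = io.TextIOWrapper(reader, encoding=encoding, errors="replace")
        yield from search_lines(text, target)
```

Because `search_lines` only needs a text stream, the same function could in principle serve gzip input as well (for example a stream from `gzip.open(path, "rt")`), which is one possible route to merging the two search modules.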
Hello,
Thank you for the PR. I don't use this type of compression, but it seems very promising.
As this is the first PR that adds a feature, I am not quite sure how I will integrate it, so I need to figure some things out. h8mail is designed to require only the requests library, and I would like to keep it that way.
I think I will have the user manually install the zstandard lib with pip3 (will be documented in the wiki), and check if the lib is installed when using this option.
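One way such a check could look, as a rough sketch only: the helper names `module_available` and `require_zstandard` are hypothetical and not part of h8mail, and the error message wording is an assumption.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_zstandard() -> None:
    """Exit with an install hint if the optional zstandard library is missing."""
    if not module_available("zstandard"):
        raise SystemExit(
            "zstd search requires the zstandard library: pip3 install zstandard"
        )
```

Calling `require_zstandard()` only when the zstd option is selected would leave the default install depending on requests alone.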
Before I get to it, I have a few things I need to integrate first, but I have read your PR and am thinking it through.
Do you convert your local data breaches to zstandard before archiving them? What is your workflow?
Thanks again, much appreciated :+1:
> I think I will have the user manually install the zstandard lib with pip3 (will be documented in the wiki), and check if the lib is installed when using this option.
Sounds fair.
> Do you convert your local data breaches to zstandard before archiving them? What is your workflow?
Yep, I compress all my local data with zstd. zstd is basically a better alternative to gzip/zlib. It creates smaller archives that decompress much faster. In fact, I get better performance searching over a zstd-compressed file than searching over a decompressed file, because disk I/O is the bottleneck in both cases and the compressed file requires fewer bytes to be read.
Thanks for creating h8mail btw! It makes life much easier. I especially love the fact that it supports multi-threading when searching over local data.