
Optimal huff depth speed improvements

daniellerozenblit opened this pull request on Oct 27, 2022

TL;DR

This PR is a follow-up to a previous PR that, for high compression levels, brute-force tests all valid Huffman table depths and chooses the optimal depth by minimizing encoded size + header size. This PR introduces some speed optimizations that could (potentially) allow this feature to be used at lower compression levels.
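
For illustration, here is a minimal sketch of the brute-force search described above. This is not the actual zstd implementation; `huf_cost_fn` and every other name here is hypothetical, standing in for "encoded size plus header size at a given table depth":

```c
#include <stddef.h>

/* Hypothetical cost callback: returns encoded size + header size for a
 * given symbol histogram when the Huffman table depth is limited to
 * tableLog bits. Not a real zstd API. */
typedef size_t (*huf_cost_fn)(const unsigned* count, unsigned maxSymbolValue,
                              unsigned tableLog, void* ctx);

/* Evaluate every valid depth in [minTableLog, maxTableLog] and return
 * the depth with the smallest total cost. */
static unsigned bruteForceOptimalTableLog(huf_cost_fn cost, void* ctx,
                                          const unsigned* count,
                                          unsigned maxSymbolValue,
                                          unsigned minTableLog,
                                          unsigned maxTableLog)
{
    unsigned bestLog = maxTableLog;
    size_t bestCost = (size_t)-1;
    unsigned log;
    for (log = minTableLog; log <= maxTableLog; log++) {
        size_t const c = cost(count, maxSymbolValue, log, ctx);
        if (c < bestCost) { bestCost = c; bestLog = log; }
    }
    return bestLog;
}
```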

Note: this does cost us a bit of compression ratio, but it offers a significant speed improvement. This trade-off could mean that we want more fine-grained depth modes, though that might be overkill.
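
One way such a search can be cut short (a sketch under the assumption that cost as a function of depth is roughly unimodal; the PR's actual optimizations may differ) is to start from a heuristic depth and walk outward, stopping as soon as the cost stops improving. Missing the true optimum when that assumption fails is consistent with the small ratio loss noted above. Reusing the hypothetical `huf_cost_fn` from the sketch above:

```c
/* Start at a heuristic depth (e.g. a fast formula's estimate) and probe
 * neighboring depths, stopping early once the cost worsens. */
static unsigned earlyExitOptimalTableLog(huf_cost_fn cost, void* ctx,
                                         const unsigned* count,
                                         unsigned maxSymbolValue,
                                         unsigned minTableLog,
                                         unsigned maxTableLog,
                                         unsigned startTableLog)
{
    unsigned bestLog = startTableLog;
    size_t bestCost = cost(count, maxSymbolValue, startTableLog, ctx);
    unsigned log;
    /* Probe shallower tables until the cost stops improving. */
    for (log = startTableLog; log > minTableLog; log--) {
        size_t const c = cost(count, maxSymbolValue, log - 1, ctx);
        if (c >= bestCost) break;
        bestCost = c; bestLog = log - 1;
    }
    /* Probe deeper tables until the cost stops improving. */
    for (log = startTableLog; log < maxTableLog; log++) {
        size_t const c = cost(count, maxSymbolValue, log + 1, ctx);
        if (c >= bestCost) break;
        bestCost = c; bestLog = log + 1;
    }
    return bestLog;
}
```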

Benchmarking

I benchmarked on an Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz machine with core isolation and turbo disabled, compiling with clang-15. I measured compression time and compression ratio for silesia.tar (212M), experimenting with various combinations of compression level and chunk size. I ran each scenario 5 times and took the maximum speed value.

Speed-Optimized Optimal Log vs. No Optimal Log

In the following tables, ctrl uses the original Huffman log method with no speed or compression optimizations, and test uses the speed-optimized Huffman log method.

Default block size

[benchmark results table (screenshot)]

-B1KB

[benchmark results table (screenshot)]

-B16KB

[benchmark results table (screenshot)]

Speed-Optimized Optimal Log vs. Brute Force Optimal Log

In the following tables, ctrl uses the brute-force optimal log method, and test uses the speed-optimized optimal log method.

Default block size

[benchmark results table (screenshot)]

-B1KB

[benchmark results table (screenshot)]

-B16KB

[benchmark results table (screenshot)]
