
Optimal huff depth speed improvements

daniellerozenblit opened this pull request on Oct 27, 2022

TL;DR

This PR is a follow-up to a previous PR that, for high compression levels, brute-force tests all valid Huffman table depths and chooses the optimal depth by minimizing encoded size + header size. This PR introduces some speed optimizations that could (potentially) allow this feature to be used at lower compression levels.
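
For illustration, here is a minimal sketch of the brute-force search described above. This is not the actual zstd implementation; `huf_cost_fn` and every other name here is hypothetical, standing in for "encoded size plus header size at a given table depth":

```c
#include <stddef.h>

/* Hypothetical cost callback: returns encoded size + header size for a
 * given symbol histogram when the Huffman table depth is limited to
 * tableLog bits. Not a real zstd API. */
typedef size_t (*huf_cost_fn)(const unsigned* count, unsigned maxSymbolValue,
                              unsigned tableLog, void* ctx);

/* Evaluate every valid depth in [minTableLog, maxTableLog] and return
 * the depth with the smallest total cost. */
static unsigned bruteForceOptimalTableLog(huf_cost_fn cost, void* ctx,
                                          const unsigned* count,
                                          unsigned maxSymbolValue,
                                          unsigned minTableLog,
                                          unsigned maxTableLog)
{
    unsigned bestLog = maxTableLog;
    size_t bestCost = (size_t)-1;
    unsigned log;
    for (log = minTableLog; log <= maxTableLog; log++) {
        size_t const c = cost(count, maxSymbolValue, log, ctx);
        if (c < bestCost) { bestCost = c; bestLog = log; }
    }
    return bestLog;
}
```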

Note: this does cost us a bit of compression ratio, but it offers a significant speed improvement. This trade-off could mean that we want more fine-grained depth modes, though that might be overkill.
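
One way such a search can be cut short (a sketch under the assumption that cost as a function of depth is roughly unimodal; the PR's actual optimizations may differ) is to start from a heuristic depth and walk outward, stopping as soon as the cost stops improving. Missing the true optimum when that assumption fails is consistent with the small ratio loss noted above. Reusing the hypothetical `huf_cost_fn` from the sketch above:

```c
/* Start at a heuristic depth (e.g. a fast formula's estimate) and probe
 * neighboring depths, stopping early once the cost worsens. */
static unsigned earlyExitOptimalTableLog(huf_cost_fn cost, void* ctx,
                                         const unsigned* count,
                                         unsigned maxSymbolValue,
                                         unsigned minTableLog,
                                         unsigned maxTableLog,
                                         unsigned startTableLog)
{
    unsigned bestLog = startTableLog;
    size_t bestCost = cost(count, maxSymbolValue, startTableLog, ctx);
    unsigned log;
    /* Probe shallower tables until the cost stops improving. */
    for (log = startTableLog; log > minTableLog; log--) {
        size_t const c = cost(count, maxSymbolValue, log - 1, ctx);
        if (c >= bestCost) break;
        bestCost = c; bestLog = log - 1;
    }
    /* Probe deeper tables until the cost stops improving. */
    for (log = startTableLog; log < maxTableLog; log++) {
        size_t const c = cost(count, maxSymbolValue, log + 1, ctx);
        if (c >= bestCost) break;
        bestCost = c; bestLog = log + 1;
    }
    return bestLog;
}
```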

Benchmarking

I benchmarked on an Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz machine with core isolation and turbo disabled, compiling with clang-15. I measured compression time and compression ratio for silesia.tar (212M), experimenting with various combinations of compression level and chunk size. I ran each scenario 5 times and took the maximum speed value.

Speed-Optimized Optimal Log vs. No Optimal Log

In the following tables, ctrl uses the original Huffman log method with no speed or compression optimizations, and test uses the speed-optimized Huffman log method.

Default block size

[benchmark results table (screenshot)]

-B1KB

[benchmark results table (screenshot)]

-B16KB

[benchmark results table (screenshot)]

Speed-Optimized Optimal Log vs. Brute Force Optimal Log

In the following tables, ctrl uses the brute-force optimal log method, and test uses the speed-optimized optimal log method.

Default block size

[benchmark results table (screenshot)]

-B1KB

[benchmark results table (screenshot)]

-B16KB

[benchmark results table (screenshot)]
