Faster decompression.
I made the following changes to boost decompression throughput (about 40% on the Silesia corpus):
- faster data copy for LZ match, as given in FAST_LZ_COPY()
- faster Huffman decoding, as defined by HUFF_DECODE_LEN(), HUFF_DECODE_DIST(), using the redefined Huffman decoding tables.
- Break up the long sequence of if...else if...else if... inside the for loop in inflate_fast().
- Read the input buffer in size_t units, as opposed to byte-wise
- Expanded boundary, left>=258 -> left>=8, for applying inflate_fast()
Need to do:
- Validate 32-bit CPU
- Validate infback()
- Add abundant comments in the final stage
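To illustrate two of the ideas listed above (wide copies for LZ matches and word-sized input reads), here is a minimal sketch. It is not the code in this PR: `lz_copy()`, `bitreader`, `load_le64()`, `refill()` and `getbits()` are hypothetical names made up for this example, the 8-byte chunk size is an assumption, and a 64-bit bit buffer stands in for the `size_t` one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a wide match copy (not the PR's FAST_LZ_COPY()).
 * Copies len bytes from dist bytes behind out and returns the new out. */
static unsigned char *lz_copy(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from;

    assert(dist != 0);
    from = out - dist;
    if (dist >= 8) {
        /* Source and destination are at least 8 bytes apart, so each
         * 8-byte chunk can be moved with one non-overlapping memcpy(). */
        while (len >= 8) {
            memcpy(out, from, 8);
            out += 8; from += 8; len -= 8;
        }
    }
    /* Short distances (overlapping runs) and the tail go byte by byte. */
    while (len != 0) {
        *out++ = *from++;
        len--;
    }
    return out;
}

/* Hypothetical bit reader state; bits is always kept in the range 0..63. */
typedef struct {
    const unsigned char *next;  /* next unread input byte */
    size_t avail;               /* unread input bytes */
    uint64_t hold;              /* bit buffer, least significant bit first */
    unsigned bits;              /* number of valid bits in hold */
} bitreader;

/* Endian-independent 8-byte little-endian load, written with shifts. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t w = 0;
    int i;

    for (i = 7; i >= 0; i--)
        w = (w << 8) | p[i];
    return w;
}

/* Top up the bit buffer: one wide load when 8 input bytes are available,
 * otherwise fall back to the classic byte-wise loop. */
static void refill(bitreader *br)
{
    if (br->avail >= 8) {
        unsigned take = (63 - br->bits) >> 3;   /* whole bytes that fit */
        uint64_t mask = ((uint64_t)1 << (take * 8)) - 1;

        br->hold |= (load_le64(br->next) & mask) << br->bits;
        br->next += take; br->avail -= take; br->bits += take * 8;
    } else {
        while (br->avail != 0 && br->bits <= 55) {
            br->hold |= (uint64_t)*br->next++ << br->bits;
            br->avail--; br->bits += 8;
        }
    }
}

/* Return the next n bits (1 <= n <= 32); asserts if the input runs out. */
static uint32_t getbits(bitreader *br, unsigned n)
{
    uint32_t v;

    assert(n >= 1 && n <= 32);
    if (br->bits < n)
        refill(br);
    assert(br->bits >= n);
    v = (uint32_t)(br->hold & (((uint64_t)1 << n) - 1));
    br->hold >>= n;
    br->bits -= n;
    return v;
}
```

In this sketch the wide path is taken only when it is trivially safe (distance of at least 8 for the copy, at least 8 input bytes for the refill). A production decoder would typically also handle short distances with wide stores and rely on guaranteed slack in the buffers, which is where most of the validation effort goes.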
Dear Maxpaj,
Would you mind explaining what in your view is bogus? Did you test the code performance?
It might be a good idea to split your PR into multiple functional commits explaining each change in detail. This allows each change to be performance tested individually.
I appreciate your constructive comments. The changes are intermingled, so it is difficult to split them into multiple commits in a clear-cut way. How about I add a document that contrasts each of the critical changes and explains the underlying reasons?
I am waiting for the proposer's to-do list to be completed, and I will then test the commit for speed and inspect for correctness and portability. I will then consider whether I should merge it or some rewrite of it. I do not require that this be split into multiple commits.
I have created inflateBackWrap() to validate inflateBack(). Moreover, inflateBackWrap() provides the same functionality as inflate(). This allows for an apples-to-apples comparison of performance between inflate() and inflateBack() under different setups. I have also added detailed comments inside the code to explain the underlying rationale.
@madler, could you take over and review the code? Feel free to make changes in your manner.
Hi @madler, may I ask what the current status of this work is?
We'd like to enable Intel DEFLATE IAA/QAT in-kernel support for EROFS, but we'd also like to have a faster DEFLATE software decompression fallback anyway.
I'm not sure whether the official zlib codebase could address that (otherwise the zlib-ng codebase has to be considered instead). The DEFLATE family is currently a de facto standard for various common use cases and formats, and better-optimized (de)compression for modern processors could benefit all of us.
And I guess old 16-bit platforms are unimportant now since they could still be supported by old stable zlib versions.
This is on my to-do list.