
Faster decompression.

Open icodywu opened this issue 2 years ago • 10 comments

I made the following changes, which boost decompression throughput by about 40% on the Silesia Corpus:

  1. faster data copy for LZ match, as given in FAST_LZ_COPY()
  2. faster Huffman decoding, as defined by HUFF_DECODE_LEN(), HUFF_DECODE_DIST(), using the redefined Huffman decoding tables.
  3. Break up the long sequence of if...else if...else if... inside the for loop in inflate_fast().
  4. Read the input buffer in size_t-sized words, as opposed to byte-wise.
  5. Relaxed the boundary condition for applying inflate_fast() from left>=258 to left>=8.
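To make items 1 and 4 concrete for readers who have not opened the diff, here is a minimal sketch of the two general techniques. It is not the code from this PR: `lz_match_copy()`, `bit_reader`, `load64_le()`, `refill()` and `get_bits()` are hypothetical names, the bit buffer is fixed at 64 bits rather than `size_t`, and the caller is assumed to guarantee 8 bytes of slack after the output and 8 readable bytes of input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the PR's FAST_LZ_COPY()): copy an LZ77 match of
 * len bytes that starts dist bytes behind out. Assumes out - dist lies inside
 * the output buffer and that 8 writable bytes of slack follow out + len. */
static unsigned char *lz_match_copy(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from = out - dist;
    unsigned char *end = out + len;

    assert(dist >= 1);
    if (dist >= 8) {
        /* An 8-byte source chunk never overlaps its destination chunk, so
         * copy whole chunks; this may write up to 7 bytes past end. */
        while (out < end) {
            memcpy(out, from, 8);
            out += 8;
            from += 8;
        }
    } else {
        /* Short distances overlap: copy byte by byte so the pattern repeats. */
        while (out < end)
            *out++ = *from++;
    }
    return end;
}

/* Hypothetical bit reader: DEFLATE consumes bits least-significant first. */
typedef struct {
    const unsigned char *next; /* next input byte not yet counted in bits */
    uint64_t hold;             /* bit buffer */
    unsigned bits;             /* number of valid bits in hold, at most 63 */
} bit_reader;

/* Portable little-endian 8-byte load; optimizing compilers typically turn
 * this loop into a single load on little-endian targets. */
static uint64_t load64_le(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)p[i] << (8 * i);
    return w;
}

/* Refill with one wide load instead of one byte at a time. Assumes at least
 * 8 readable bytes at br->next. Leaves 56..63 valid bits in hold. */
static void refill(bit_reader *br)
{
    unsigned consumed;

    assert(br->bits <= 63);
    br->hold |= load64_le(br->next) << br->bits;
    consumed = (63 - br->bits) >> 3;   /* whole bytes that fit */
    br->next += consumed;
    br->bits += consumed * 8;
}

/* Remove and return the low n bits (n must not exceed br->bits). */
static unsigned get_bits(bit_reader *br, unsigned n)
{
    unsigned v;

    assert(n <= br->bits && n <= 32);
    v = (unsigned)(br->hold & ((UINT64_C(1) << n) - 1));
    br->hold >>= n;
    br->bits -= n;
    return v;
}
```

The slack requirement is the trade-off in this sketch: the wide copy and the wide load are only safe when the caller can guarantee the extra bytes, which is why a byte-wise path is still needed near the ends of the buffers.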

Need to do:

  1. Validate 32-bit CPU
  2. Validate infback()
  3. Add abundant comments in the final stage

icodywu avatar Mar 28 '23 17:03 icodywu

Dear Maxpaj,

Would you mind explaining what in your view is bogus? Did you test the code performance?

icodywu avatar Apr 11 '23 22:04 icodywu

It might be a good idea to split your PR into multiple functional commits explaining each change in detail. This allows each change to be performance tested individually.

nmoinvaz avatar Apr 11 '23 22:04 nmoinvaz

I appreciate your constructive comments. The changes are intermingled, so it is difficult to split them into multiple commits in a clear-cut way. How about I add a document that contrasts each of the critical changes and explains the underlying reasons?

icodywu avatar Apr 12 '23 07:04 icodywu

I am waiting for the proposer's to-do list to be completed, and I will then test the commit for speed and inspect for correctness and portability. I will then consider whether I should merge it or some rewrite of it. I do not require that this be split into multiple commits.

madler avatar Apr 12 '23 08:04 madler

I have created inflateBackWrap() to validate inflateBack(). Moreover, inflateBackWrap() provides functionality identical to inflate(). This allows for an apples-to-apples performance comparison between inflate() and inflateBack() under different setups. I have also added detailed comments inside the code to explain the underlying rationale.

@madler, could you take over and review the code? Feel free to make changes as you see fit.

icodywu avatar May 03 '23 17:05 icodywu

Hi @madler, may I ask what the current status of this work is?

We'd like to enable Intel DEFLATE IAA/QAT in-kernel support for EROFS, but we'd also like to have a faster DEFLATE software decompression fallback anyway.

I'm not sure whether the official zlib codebase can address that (otherwise the zlib-ng codebase would have to be considered instead). The DEFLATE family is currently a de facto standard for various common use cases and formats, and better-optimized (de)compression for modern processors could benefit all of us.

And I guess old 16-bit platforms are unimportant now, since they can still be supported by old stable zlib versions.

hsiangkao avatar Sep 04 '23 03:09 hsiangkao

This is on my to-do list.

madler avatar Sep 04 '23 03:09 madler