zlib icon indicating copy to clipboard operation
zlib copied to clipboard

A trick to compress faster

Open photopea opened this issue 7 years ago • 3 comments

Hi. There is a trick, which requires additional 64kB of RAM, but can make the compression (search for repeating strings in LZ77) much faster.

You hash 3-bytes into chains, which help you find previous occurences of these 3 bytes. But you hash 3 bytes into 8 bits, and many collisions occur (you process very long chains while looking for the longest string match). I suggest to use 16 bits for hashing. There are much less collisions, shorter chains, so you find occurences faster. I suggest a following hash (since you don't use full Rabin-Karp, I guess hashes do not have to be "cyclic"):

hash = ((byte1 |  (byte2<<8)) + (byte3<<4)) & 0xffff;

I made my own implementation of Deflate (not in C) and 16 bit hashes work much better, than 8 bit hashes. I think it could help ZLIB, too, but I am not that comfortable with C to implement it myself :(

photopea avatar May 24 '18 21:05 photopea

Hi,If I change the hash function like you, what effect does it have on the compression ratio?

sunny-shu avatar Mar 11 '20 07:03 sunny-shu

@361442342 This has absolutely no effect on the compression ratio. It only makes the compression much faster (which can be useful, when compressing huge files with Level 9).

photopea avatar Mar 11 '20 08:03 photopea

The default hash bits of zlib is 15 , but can be increased to 16 bits by setting memlevel to 9.

deflateInit(strm,level); // set memlevel = 8 

deflateInit2(strm, level, method, windowBits, 9, strategy); // set memlevel = 9

minchai23 avatar Sep 09 '21 02:09 minchai23