
ZSTDm (special mode of zstd optimized for memory compression, see paper inside)

ahydronous opened this issue on Sep 5, 2024 · 3 comments

**Is your feature request related to a problem? Please describe.**
Zstd is currently used as the default for zram/zswap compression in a lot of places. However, it has quite a large latency penalty compared to lz4, and many applications care about memory latency.

**Describe the solution you'd like**
In 2017, at the Korean College of Information and Communication Engineering, a few researchers created LZ4m, a specialized version of lz4 optimized for in-memory data. Perhaps a zstdm could be created, based on their initial research?

Here is the paper: http://csl.snu.ac.kr/papers/icce17.pdf (LZ4m: A Fast Compression Algorithm for In-Memory Data)

I tried contacting the researchers for the source, but my e-mail bounced ('e-mail not delivered').

Here is the difference between LZ4 and LZ4m: [figure from the paper comparing LZ4 and LZ4m]

And a few excerpts from the paper:

In-memory data consists of virtual memory pages that contain the data from the stack and the heap regions of applications. The stack and the heap regions contain constants, variables, pointers, and arrays of basic types which are usually structured and aligned to the word boundary of the system [6]. Based on the observation, we can expect that data compression can be accelerated without a significant loss of compressing opportunity by scanning and matching data in the word granularity. In addition, as the larger granularity requires less bits to represent the length and the offset, the substrings (i.e., literals and matches) can be encoded in a more concise form.


LZ4m uses the same scanning window and hash table of the original LZ4. In contrast to the original LZ4 algorithm, LZ4m scans an input stream and finds the match in a 4-byte granularity. If the hash table indicates no prefix match exists, LZ4m advances the window by 4 bytes and repeats identifying the prefix match. As shown in Figure 2(a), after examining the prefix at position 8, the next position of the window is 12 instead of 9. If a prefix match is found, LZ4m compares subsequent data and finds the longest match in the 4-byte granularity as well. In the example, although there is a 6-byte match starting from position 12, LZ4m only considers the 4-byte match from 12 to 15, and slides the scanning window forward by four bytes, to position 16.
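
Since the source never surfaced, here is a rough C sketch of what the 4-byte-granularity scan described above could look like. The hash function, table size, and names are my own placeholders, not LZ4m's actual implementation:

```c
/* Hedged sketch of word-granularity LZ4-style scanning, as described in the
 * LZ4m paper excerpt above. Placeholder hash/table parameters, not real code
 * from LZ4m or zstd. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HASH_LOG  12
#define HASH_SIZE (1 << HASH_LOG)

static uint32_t read32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static uint32_t hash32(uint32_t v)       { return (v * 2654435761u) >> (32 - HASH_LOG); }

/* Scan a page at 4-byte granularity; returns how many bytes were covered by matches. */
static size_t scan_word_granularity(const uint8_t *src, size_t len)
{
    /* last position seen for each hash; 0 doubles as "empty", but the byte
       compare below guards against stale or colliding entries anyway */
    uint16_t table[HASH_SIZE] = {0};
    size_t pos = 0, matched = 0;

    while (pos + 4 <= len) {
        uint32_t word = read32(src + pos);
        uint32_t h    = hash32(word);
        size_t   cand = table[h];
        table[h] = (uint16_t)pos;

        if (cand < pos && read32(src + cand) == word) {
            /* Extend the match in 4-byte steps only (no byte-level tail),
               mirroring the "4-byte match from 12 to 15" example above. */
            size_t mlen = 4;
            while (pos + mlen + 4 <= len &&
                   read32(src + cand + mlen) == read32(src + pos + mlen))
                mlen += 4;
            /* a real compressor would emit (offset = pos - cand, length = mlen) here */
            matched += mlen;
            pos     += mlen;
        } else {
            /* no prefix match: emit 4 literal bytes and advance the window by 4, not 1 */
            pos += 4;
        }
    }
    return matched;
}

int main(void)
{
    uint8_t page[4096];
    for (size_t i = 0; i < sizeof page; i++)
        page[i] = (uint8_t)(i & 0x3F);   /* repetitive, word-aligned test data */
    printf("bytes covered by matches: %zu\n", scan_word_granularity(page, sizeof page));
    return 0;
}
```

The key point is that both the window advance and the match extension move in whole words, which is where the speedup over LZ4's byte-granularity scan comes from.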

We also optimize the encoding scheme to utilize the 4-byte granularity. As the offset is aligned to the 4-byte boundary and the length is a multiple of 4 bytes, two least significant bits of these fields are always 00 in binary. Thus, every field representing the length and the offset (the literal length and the match length in the token, and the literal length, the match length, and the match offset in the body) can be shortened by 2 bits. Moreover, as LZ4m is targeting to compress 4 KB pages in a 4-byte granularity, the field for the match offset requires at most 10 bits. Consequently, we place the two most significant bits of the match offset in the token and put the remaining 8 bits in the body.
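
To make the bit budget concrete, here is a small hedged sketch of the token packing that paragraph describes. The 10-bit word offset split into 2 token bits plus 8 body bits is stated explicitly; the 3+3+2 token layout below is my assumption based on the "shortened by 2 bits" wording, not something taken from released source:

```c
/* Hedged illustration of LZ4m-style token packing for 4-byte-aligned data.
 * Field widths are assumptions consistent with the paper excerpt above. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack a literal run, a match length and a match offset (all in bytes, all
 * multiples of 4, offset within a 4 KB page) into a 1-byte token plus a
 * 1-byte low-offset field. Longer runs would spill into extra body bytes,
 * as LZ4 does with its all-ones sentinel; that part is omitted here. */
static void pack_token(unsigned lit_len, unsigned match_len, unsigned offset,
                       uint8_t *token, uint8_t *offset_low)
{
    unsigned lit_words    = lit_len   >> 2;   /* low 2 bits are always 00, so drop them */
    unsigned match_words  = match_len >> 2;
    unsigned offset_words = offset    >> 2;   /* at most 10 bits within a 4 KB page */

    assert(lit_len % 4 == 0 && match_len % 4 == 0 && offset % 4 == 0);
    assert(lit_words <= 7 && match_words <= 7 && offset_words < (1u << 10));

    /* token layout (assumed): [literal words:3 | match words:3 | offset words bits 9..8:2] */
    *token      = (uint8_t)((lit_words << 5) | (match_words << 2) | (offset_words >> 8));
    *offset_low = (uint8_t)(offset_words & 0xFF);   /* remaining 8 offset bits go in the body */
}

int main(void)
{
    uint8_t token, off_low;
    /* e.g. 8 literal bytes, then a 12-byte match referring 40 bytes back */
    pack_token(8, 12, 40, &token, &off_low);
    printf("token=0x%02x offset_low=0x%02x\n", token, off_low);   /* prints 0x4c, 0x0a */
    return 0;
}
```

Because lengths and offsets are counted in words rather than bytes, every field gets roughly two bits back, which is exactly the saving the paper is describing.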

And here are the performance gains; LZ4m also achieves a slightly higher compression ratio: [figure from the paper showing compression/decompression speed and compression ratio]

Perhaps something similar could be achieved for Zstd, with gains in compression/decompression speed and latency.

ahydronous · Sep 05 '24 16:09