
pzstd decompression needs 10x more memory than equivalent zstd -d

Open: nh2 opened this issue 1 year ago • 1 comment

I would expect pzstd -d on N threads to need about as much maxresident memory as N separate zstd -d instances. But it needs up to 10x more than that.

To Reproduce

# Create compressed file (this is a NixOS package build of `chromium`)
curl https://cache.nixos.org/nar/0h3djg2z32ihzwahsn386lir47p260ns706f29y05m8z5ax3a00v.nar.xz | xz -d > myfile
pzstd -19 --keep myfile -o myfile.pzstd

# Check decompression memory usage
command time pzstd -d -p 6 myfile.pzstd -o /dev/null  # prints 400 MB maxresident
command time zstd  -d      myfile.pzstd -o /dev/null  # prints  10 MB maxresident
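If you want to collect the same peak-memory numbers from a script instead of reading `command time` output, a minimal Python sketch could look like the following. It assumes Linux, where `ru_maxrss` is reported in kilobytes and, for `RUSAGE_CHILDREN`, reflects the largest child that has been waited for so far. The helper name `peak_child_rss_kib` is hypothetical.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd: list[str]) -> int:
    """Run cmd to completion and return RUSAGE_CHILDREN ru_maxrss.

    On Linux the value is in KiB and is the peak RSS of the largest child
    waited for so far, so measure each command from a fresh process to keep
    earlier children from masking the result.
    """
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python measure.py pzstd -d -p 6 myfile.pzstd -o /dev/null
    print(peak_child_rss_kib(sys.argv[1:]), "KiB")
```

Run it once per command (for example once for `pzstd -d -p 6 ...` and once for `zstd -d ...`) so that each result comes from its own parent process.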

Why is the memory usage 40x higher even though I use only 6 threads?

This seems wrong. It happens even when a regular file is used (-o myout instead of -o /dev/null).

Expected behavior

Based on how pzstd's operation is described (here), pzstd -d -p 6 should require at most 6 decompression contexts and buffers.

I could imagine that if pzstd -d --stdout ... > /dev/null was used, output memory from the different threads might need to be buffered for serialisation into the pipe. But that should not be necessary when -o myout is used, in which case the file can be pre-allocated and different threads can write different outputs independently.
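The pre-allocated-file idea can be sketched in Python with positional writes. This is an illustration of the proposal, not how pzstd works: it assumes each job's decompressed size is already known before any thread writes (so offsets are simple prefix sums), and it relies on `os.pwrite`, which is only available on Unix. The helper `write_chunks_in_parallel` is hypothetical.

```python
import os
import tempfile
import threading


def write_chunks_in_parallel(path: str, chunks: list[bytes]) -> None:
    """Write each chunk at its own offset from a separate thread."""
    # Assumption: every job's output size is known up front, so each
    # offset is the sum of the sizes of the jobs before it.
    offsets = [0]
    for chunk in chunks[:-1]:
        offsets.append(offsets[-1] + len(chunk))
    total = sum(len(c) for c in chunks)

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, total)  # pre-size the output file

        def worker(data: bytes, offset: int) -> None:
            # os.pwrite does not move the shared file offset, so the
            # threads need no lock to keep their writes apart.
            written = 0
            while written < len(data):
                written += os.pwrite(fd, data[written:], offset + written)

        threads = [threading.Thread(target=worker, args=(c, o))
                   for c, o in zip(chunks, offsets)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)


if __name__ == "__main__":
    jobs = [b"first job ", b"second job ", b"third job\n"]
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "myout")
        write_chunks_in_parallel(out, jobs)
        with open(out, "rb") as f:
            assert f.read() == b"".join(jobs)
```

With `--stdout` this approach is not available, because a pipe has no offsets to write into; output there has to be emitted in order.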

Environment

  • versions: NixOS Linux 22.11 with zstd 1.5.2.
  • machine: A 6-core/12-thread Intel Xeon E5-1650 v3

I found this as part of the discussion on whether the NixOS binary package cache should be switched to zstd compression.

nh2, Dec 19 '22 17:12

@nh2 If the memory usage of pzstd is too high, you can always use zstd to decompress the file instead; the formats are fully compatible. Zstd decompression is already very fast, and the extra speed provided by pzstd often isn't necessary.

If you don't need decompression parallelism, I'd recommend compressing with zstd -T instead of pzstd.

At level 19 zstd uses a window size of 8MB. Pzstd then chooses a job size of 4 * 8MB = 32MB. Each job needs an input buffer and an output buffer. If the data isn't very compressible the input buffers will also be ~32MB.
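The per-job buffer sizes described above can be worked out with a short Python sketch. It assumes level 19 on a large input uses windowLog 23 (an 8 MiB window, matching the figure above); the helper `job_buffers` is hypothetical and only encodes the "job size = 4 * window size" rule and the input/output split described here.

```python
MIB = 1 << 20


def job_buffers(window_log: int, compression_ratio: float) -> tuple[int, int]:
    """Return (input_bytes, output_bytes) for one pzstd job.

    The output buffer holds one decompressed job (4 * window size); the
    input buffer holds that job compressed, roughly job_size / ratio.
    """
    window = 1 << window_log
    job_size = 4 * window
    return round(job_size / compression_ratio), job_size


# Assumption: level 19 on a large input -> windowLog 23 (8 MiB window).
inp, out = job_buffers(23, 1.2)
print(f"input ~{inp / MIB:.1f} MiB, output {out / MIB:.0f} MiB")
```

As the compression ratio approaches 1 (poorly compressible data), the input buffer approaches the full 32 MiB job size.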

Pzstd allocates num-threads + 1 job buffers, so that it can fill the next job buffer while the threads are working. The memory needed is therefore approximately: (num-threads + 1) * (window-size * 4) * (1 + 1 / compression-ratio).

E.g. if num-threads=6, window-size=8MB and compression-ratio=1.2, then we need 7 * 32MB * 1.83 ≈ 410 MB.
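The estimate above can be written as a small Python function to see how it scales with the thread count. This is a sketch of the formula as stated in this comment, not code taken from pzstd; the function name `pzstd_decompress_estimate_mb` is hypothetical, and sizes are in MB to match the numbers above.

```python
def pzstd_decompress_estimate_mb(num_threads: int, window_mb: float,
                                 compression_ratio: float) -> float:
    """(num_threads + 1) * (window_size * 4) * (1 + 1 / compression_ratio)."""
    job_buffers = num_threads + 1   # one extra job is filled while threads work
    job_size_mb = 4 * window_mb     # job size = 4 * window size
    # Each job holds a decompressed output buffer plus its compressed input.
    per_job = job_size_mb * (1 + 1 / compression_ratio)
    return job_buffers * per_job


for threads in (1, 6, 12):
    est = pzstd_decompress_estimate_mb(threads, 8, 1.2)
    print(f"-p {threads:2d}: ~{est:.0f} MB")
```

The cost grows with num-threads + 1 job buffers, each holding both a compressed and a decompressed 32 MB job, whereas single-threaded zstd -d streams through roughly one 8 MB window, which is in the same range as the ~10 MB reported above.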

terrelln, Dec 19 '22 22:12

Please re-open if you have further questions.

terrelln, Dec 22 '22 01:12