
Decompression speed drops when using a shared dictionary on short documents

Open · pkolaczk opened this issue 1 year ago · 4 comments

Describe the bug
When I use a shared dictionary while compressing/decompressing small chunks of a few kB each, decompression speed drops significantly, to well below the speed achieved without a dictionary. The gap widens at stronger compression levels. Compression speed, however, looks unaffected. I'm not sure whether this is a bug, but it is quite disappointing considering that your website states dictionaries should improve performance, especially for small messages.

I triple-checked that I initialize the DecoderDictionary only once and reuse it for all blocks. (See the attached profile at the end, which confirms that 95%+ of the time is spent inside ZSTD decompression.)

To Reproduce
Steps to reproduce the behavior:

  1. Download a large enough CSV file, e.g. one of the files at https://www.ncei.noaa.gov/data/global-hourly/archive/csv/ is fine. I downloaded the pack for 1940.
  2. Unpack it with your favorite tar.gz unpacker.
  3. Install compresto by running: cargo install compresto
  4. Run a series of benchmarks without the dictionary:
% compresto benchmark-many /Users/piotr/Downloads/1940/72511514711.csv -a zstd -b 16384                           
zstd -c -7: 2296693 => 450566 (19.6 %), compression: 443.1 MB/s, decompression: 1196.4 MB/s
zstd -c -6: 2296693 => 421861 (18.4 %), compression: 649.8 MB/s, decompression: 2459.0 MB/s
zstd -c -5: 2296693 => 404779 (17.6 %), compression: 959.6 MB/s, decompression: 2430.7 MB/s
zstd -c -4: 2296693 => 373980 (16.3 %), compression: 931.8 MB/s, decompression: 1692.2 MB/s
zstd -c -3: 2296693 => 327156 (14.2 %), compression: 650.8 MB/s, decompression: 1736.6 MB/s
zstd -c -2: 2296693 => 282709 (12.3 %), compression: 683.7 MB/s, decompression: 1765.6 MB/s
zstd -c -1: 2296693 => 253982 (11.1 %), compression: 688.5 MB/s, decompression: 1851.4 MB/s
zstd -c 1: 2296693 => 194106 (8.5 %), compression: 602.2 MB/s, decompression: 1434.2 MB/s
zstd -c 2: 2296693 => 193427 (8.4 %), compression: 367.8 MB/s, decompression: 1636.6 MB/s
zstd -c 3: 2296693 => 206364 (9.0 %), compression: 231.9 MB/s, decompression: 1636.6 MB/s
zstd -c 4: 2296693 => 206359 (9.0 %), compression: 132.7 MB/s, decompression: 1762.2 MB/s
zstd -c 5: 2296693 => 195890 (8.5 %), compression: 85.5 MB/s, decompression: 1783.8 MB/s
zstd -c 6: 2296693 => 187577 (8.2 %), compression: 83.3 MB/s, decompression: 2185.3 MB/s
zstd -c 7: 2296693 => 186520 (8.1 %), compression: 50.6 MB/s, decompression: 2136.0 MB/s
zstd -c 8: 2296693 => 175292 (7.6 %), compression: 49.5 MB/s, decompression: 2140.4 MB/s
zstd -c 9: 2296693 => 175309 (7.6 %), compression: 90.3 MB/s, decompression: 2240.1 MB/s
zstd -c 10: 2296693 => 175521 (7.6 %), compression: 16.2 MB/s, decompression: 2312.4 MB/s
zstd -c 11: 2296693 => 175826 (7.7 %), compression: 15.5 MB/s, decompression: 2130.3 MB/s
zstd -c 12: 2296693 => 175836 (7.7 %), compression: 7.8 MB/s, decompression: 1928.6 MB/s
  5. Generate the dictionary:
% zstd --train-fastcover /Users/piotr/Downloads/1940/72511514711.csv -o dictionary-csv -B16384
Trying 82 different sets of parameters                                         
k=50                                                                           
d=6
f=20
steps=40
split=75
accel=1
Save dictionary of size 79101 into file dictionary-csv 
  6. Repeat the benchmarks with the dictionary:
% compresto benchmark-many /Users/piotr/Downloads/1940/72511514711.csv -a zstd -b 16384 --dict dictionary-csv      
zstd -c -7: 2296693 => 312131 (13.6 %), compression: 371.4 MB/s, decompression: 861.0 MB/s
zstd -c -6: 2296693 => 299806 (13.1 %), compression: 475.9 MB/s, decompression: 993.2 MB/s
zstd -c -5: 2296693 => 290887 (12.7 %), compression: 505.4 MB/s, decompression: 1012.1 MB/s
zstd -c -4: 2296693 => 280730 (12.2 %), compression: 558.1 MB/s, decompression: 958.6 MB/s
zstd -c -3: 2296693 => 265723 (11.6 %), compression: 542.2 MB/s, decompression: 1158.2 MB/s
zstd -c -2: 2296693 => 256276 (11.2 %), compression: 632.8 MB/s, decompression: 1201.9 MB/s
zstd -c -1: 2296693 => 236373 (10.3 %), compression: 647.9 MB/s, decompression: 1309.7 MB/s
zstd -c 1: 2296693 => 214283 (9.3 %), compression: 639.2 MB/s, decompression: 1297.3 MB/s
zstd -c 2: 2296693 => 215959 (9.4 %), compression: 409.2 MB/s, decompression: 1290.2 MB/s
zstd -c 3: 2296693 => 208883 (9.1 %), compression: 303.0 MB/s, decompression: 820.6 MB/s
zstd -c 4: 2296693 => 208760 (9.1 %), compression: 318.3 MB/s, decompression: 759.1 MB/s
zstd -c 5: 2296693 => 192558 (8.4 %), compression: 195.5 MB/s, decompression: 601.3 MB/s
zstd -c 6: 2296693 => 180981 (7.9 %), compression: 137.6 MB/s, decompression: 555.3 MB/s
zstd -c 7: 2296693 => 165328 (7.2 %), compression: 117.2 MB/s, decompression: 1054.9 MB/s
zstd -c 8: 2296693 => 165212 (7.2 %), compression: 111.7 MB/s, decompression: 961.2 MB/s
zstd -c 9: 2296693 => 165395 (7.2 %), compression: 87.8 MB/s, decompression: 491.0 MB/s
zstd -c 10: 2296693 => 165782 (7.2 %), compression: 69.9 MB/s, decompression: 675.6 MB/s
zstd -c 11: 2296693 => 166528 (7.3 %), compression: 45.8 MB/s, decompression: 711.0 MB/s
zstd -c 12: 2296693 => 166522 (7.3 %), compression: 43.6 MB/s, decompression: 714.8 MB/s

Expected behavior
The zstd website states that using a dictionary improves both compression and decompression performance. I expect decompression speed to be at least the same as without the dictionary.

Desktop

  • OS: macOS
  • Version: Sonoma 14.6.1
  • Compiler: rust 1.82.0, toolchain: stable-aarch64-apple-darwin
  • Flags: --release
  • Other relevant hardware specs: M2 Pro, 32 GB RAM
  • Build system: cargo 1.82.0

Additional context

  • Zstd v1.5.5
  • rust zstd bindings v0.13.2

This is visible only when compressing small blocks independently, e.g. 1 kB – 16 kB (use the -b option to change the block size). The phenomenon smoothly disappears as the block size gets larger.

Profile
Here is the profile I captured with Xcode Instruments:

[screenshot: Xcode Instruments profile]

pkolaczk · Oct 22 '24 15:10