qoi2-bikeshed

Zstandard Compression

Open sudoBash418 opened this issue 2 years ago • 17 comments

I imagine I'm not the first one to look into this, but I haven't seen another post about it yet, so I may as well throw the idea onto the playing field. Apologies if this is the wrong place for this :sweat_smile:

On screenshots, wrapping the QOI output in zstd at levels 1-3 shrank the files by anywhere from 15% to sometimes 50%, generally outperforming even fairly well-optimized PNGs in compression ratio. For other image types the reduction seems to be around 10-15%.

Here's a more thorough comparison of the en.wikipedia.org.png screenshot from the original "test bench":

Compression      Size (bytes)  Ratio
original file       1,316,655  1.000
oxipng -D -o3       1,046,134  0.794
qoi                 1,498,761  1.138
qoi + zstd -1         955,221  0.725
qoi + zstd -3         809,694  0.615

Tested with the master branch of QOI (2ee2169), zstd v1.5.0, and oxipng 5.0.1.

Haven't done a complete comparison across the entire test bench yet; if there's interest I might get around to it though.
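
For reference, the table's ratios are relative to the original PNG. The savings relative to the plain .qoi file can be worked out from the same byte counts. A minimal sketch in Python; the helper name `saving_vs` is only for illustration:

```python
# Byte counts copied from the table above.
sizes = {
    "original file": 1_316_655,
    "qoi": 1_498_761,
    "qoi + zstd -1": 955_221,
    "qoi + zstd -3": 809_694,
}


def saving_vs(baseline: str, candidate: str) -> float:
    """Percentage by which `candidate` is smaller than `baseline`."""
    return 100.0 * (1.0 - sizes[candidate] / sizes[baseline])


for name in ("qoi + zstd -1", "qoi + zstd -3"):
    print(f"{name}: {saving_vs('qoi', name):.1f}% smaller than plain qoi")
```

For this one screenshot that works out to roughly 36% and 46%, which sits inside the 15-50% range mentioned above.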

sudoBash418, Dec 11 '21 19:12

It's interesting that qoi is a pure pixel engine especially as it only touches input once so can be easily streamed. Being modular allows for all the niceties of the existing ecosystem like being wrapped in a tarball for archival or compressed with whatever generic streaming compressor you like (compare that to PNG which chose a fixed compressor, a mediocre one even at the time).

When all is said and done I think it would be ideal if .qoi rivalled libpng on filesize with much faster encode/decode, .qoi.zstd (with some fast setting) rivalled optimised PNGs for filesize, and .qoi.xz (with some slow setting) rivalled JpegXL for lossless filesize. Zstd fast is important to bench when optimising for access times (amusingly, loading a file compressed with zstd fast may be quicker than loading a raw qoi file, depending on the storage medium); xz slow is important when optimising for archival.

Which is a rambling way of saying yes I am very interested in compressed benchmarks. You might want to hold off on doing so until the format has been finalised (or create the benchmark in such a way that it's easy to run again if the spec changes).
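
To illustrate the "wrap it in any generic streaming compressor" point, here is a rough sketch using Python's standard `lzma` module as a stand-in for the `xz` command line. The chunks are placeholder bytes, not real QOI encoder output, and `compress_stream` is a name made up for this example:

```python
import lzma


def compress_stream(chunks, preset=6):
    """Yield .xz-format bytes for an iterable of byte chunks, in one pass."""
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # the compressor may buffer and return nothing yet
            yield out
    yield comp.flush()


# Placeholder for encoder output; a real pipeline would feed QOI chunks here.
fake_qoi_chunks = [b"qoif" + bytes(10), bytes(range(256)) * 64, bytes(4096)]
wrapped = b"".join(compress_stream(fake_qoi_chunks))

# Round trip: the wrapper is transparent to whatever consumes the QOI bytes.
assert lzma.decompress(wrapped) == b"".join(fake_qoi_chunks)
```

Because the compressor only ever sees a byte stream, swapping xz for another codec should not require touching the image encoder at all.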

chocolate42, Dec 12 '21 10:12

https://github.com/phoboslab/qoi/issues/71#issuecomment-991884414 has some numbers on wrapping with LZ4 instead of Zstd, on the entire test suite. LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).

nigeltao, Dec 12 '21 23:12

> Apologies if this is the wrong place for this

This is the right place!

nigeltao, Dec 12 '21 23:12

> phoboslab/qoi#71 (comment) has some numbers on wrapping with LZ4 instead of Zstd, on the entire test suite. LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).

I tested LZ4 once or twice on a couple images but gave up because I thought that zstd had it beat no matter what speed/size you aim for, but I'll add it to my comparison. Looking at the numbers from zstd's own author, it seems that lz4 wins in at least decompression time, so its benefits likely will not be revealed by my script.

> It's interesting that qoi is a pure pixel engine especially as it only touches input once so can be easily streamed. Being modular allows for all the niceties of the existing ecosystem like being wrapped in a tarball for archival or compressed with whatever generic streaming compressor you like (compare that to PNG which chose a fixed compressor, a mediocre one even at the time).

> When all is said and done I think it would be ideal if .qoi rivalled libpng on filesize with much faster encode/decode, .qoi.zstd (with some fast setting) rivalled optimised PNGs for filesize, and .qoi.xz (with some slow setting) rivalled JpegXL for lossless filesize. Zstd fast is important to bench when optimising for access times (amusingly, loading a file compressed with zstd fast may be quicker than loading a raw qoi file, depending on the storage medium); xz slow is important when optimising for archival.

Yeah I agree, and that reminds me that I should add LZMA to the test as well, in one form or another.

> Which is a rambling way of saying yes I am very interested in compressed benchmarks. You might want to hold off on doing so until the format has been finalised (or create the benchmark in such a way that it's easy to run again if the spec changes).

I'm going to try writing a shell script for testing all this; it'll be filesizes-only however because getting accurate numbers for speed is a much more difficult task.
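
A filesize-only comparison of this kind could look roughly like the sketch below. It is written in Python rather than shell and uses only standard-library codecs (`zlib`, `lzma`) as stand-ins, since zstd and lz4 bindings are third-party. The `COMPRESSORS` table, the function names, and the `*.qoi` glob are assumptions made for the example, not the script actually used:

```python
import lzma
import zlib
from pathlib import Path

# Hypothetical compressor table: name -> function from bytes to compressed bytes.
COMPRESSORS = {
    "plain": lambda data: data,
    "zlib6": lambda data: zlib.compress(data, 6),
    "xz1": lambda data: lzma.compress(data, preset=1),
    "xz6": lambda data: lzma.compress(data, preset=6),
}


def total_sizes(root, pattern="*.qoi"):
    """Return {compressor name: total compressed bytes} over files under root."""
    totals = dict.fromkeys(COMPRESSORS, 0)
    for path in sorted(Path(root).rglob(pattern)):
        data = path.read_bytes()
        for name, compress in COMPRESSORS.items():
            totals[name] += len(compress(data))
    return totals


def report(totals):
    """Print totals sorted by size, largest first, as a percentage of plain."""
    plain = totals["plain"]
    for name, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100.0 * size / plain if plain else 0.0
        print(f"{name:8s} {size:12d} {pct:6.2f}%")
```

Calling `report(total_sizes("images"))` would print one line per compressor; rerunning after a spec change only requires regenerating the .qoi files.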

sudoBash418, Dec 13 '21 02:12

Here are my results, using qoiconv from phoboslab/qoi@2ee2169 (master). Compressors are sorted by total filesize, descending.

Compression Results (kB)
Compressor TOTAL icon_512 icon_64 photo_kodak photo_tecnick photo_wikipedia pngimg screenshot_game screenshot_web textures_photo textures_pk textures_pk01 textures_pk02 textures_plants
plain 1351894 18658 1002 16502 258843 105507 274221 328421 37984 40590 77559 20646 115290 56671
lz4fast12 1312141 14902 990 16495 258744 105508 262631 308866 34795 40590 77163 20066 114830 56561
lz4fast5 1299434 13635 963 16487 258681 105508 260110 303348 33330 40572 76492 19828 114042 56438
lz4fast3 1292847 13043 946 16483 258654 105508 258928 300492 32586 40549 76084 19696 113524 56354
zstdfast5 1284166 12005 949 16500 258628 105496 257074 297238 31723 40591 74793 19458 113357 56354
lz4fast1 1283038 12346 917 16477 258613 105500 257277 296477 31586 40495 74993 19520 112615 56222
zstdfast3 1276492 11326 920 16498 258583 105489 255692 294090 30888 40591 73992 19303 112864 56256
lz4_1 1275580 11887 894 16471 258572 105481 256090 293549 30855 40426 73915 19397 111914 56129
zstdfast1 1266374 10649 880 16491 258521 105476 254095 290440 30077 40571 71979 19134 111975 56086
lz4_3 1205685 10281 873 15611 254795 103816 241933 271855 27532 36699 66498 18461 103959 53372
lz4_5 1202112 10178 872 15581 254577 103745 241258 270580 27399 36506 66157 18396 103599 53264
lz4_9 1201317 10143 871 15580 254552 103743 241145 270312 27373 36479 65936 18375 103560 53248
zstd1 1100532 10059 796 14216 225918 92452 220777 252941 26873 31394 65662 16869 94487 48088
zstd3 1074274 9473 763 14052 225600 92230 215909 243789 24851 31249 60772 16476 91715 47395
zstd9 1052353 8704 756 13848 224181 91546 210475 238618 23610 30531 58062 16134 89201 46687
xz1 1019691 8011 727 13270 220192 89650 202900 228139 22883 30471 56269 15524 86384 45271
zstd19 1017895 8225 740 13298 220685 89924 200601 229915 22427 29472 55717 15745 85869 45277
zstdultra22 1017813 8222 740 13298 220685 89924 200531 229908 22426 29472 55717 15745 85869 45276
xz3 1016294 7956 726 13265 219986 89605 201633 227615 22398 30325 55971 15496 86109 45209
7z9 979018 7523 739 12911 213357 86717 193025 219147 21348 28969 54258 14976 82347 43701
xz6 978992 7502 718 12910 213359 86721 193193 219085 21347 28968 54190 14968 82333 43698
xz9 978811 7502 718 12910 213359 86721 193012 219085 21347 28968 54190 14968 82333 43698
Compression Results (%)
Compressor TOTAL icon_512 icon_64 photo_kodak photo_tecnick photo_wikipedia pngimg screenshot_game screenshot_web textures_photo textures_pk textures_pk01 textures_pk02 textures_plants
plain 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
lz4fast12 97.06 79.87 98.84 99.96 99.96 100.0 95.77 94.05 91.6 100.0 99.49 97.19 99.6 99.81
lz4fast5 96.12 73.08 96.15 99.91 99.94 100.0 94.85 92.37 87.75 99.96 98.62 96.03 98.92 99.59
lz4fast3 95.63 69.9 94.42 99.89 99.93 100.0 94.42 91.5 85.79 99.9 98.1 95.4 98.47 99.44
zstdfast5 94.99 64.34 94.77 99.99 99.92 99.99 93.75 90.51 83.52 100.0 96.43 94.25 98.32 99.44
lz4fast1 94.91 66.17 91.59 99.85 99.91 99.99 93.82 90.27 83.15 99.77 96.69 94.55 97.68 99.21
zstdfast3 94.42 60.7 91.89 99.98 99.9 99.98 93.24 89.55 81.32 100.0 95.4 93.5 97.9 99.27
lz4_1 94.36 63.71 89.24 99.82 99.9 99.98 93.39 89.38 81.23 99.6 95.3 93.95 97.07 99.04
zstdfast1 93.67 57.08 87.88 99.94 99.88 99.97 92.66 88.44 79.18 99.95 92.81 92.67 97.12 98.97
lz4_3 89.18 55.1 87.14 94.6 98.44 98.4 88.23 82.78 72.48 90.42 85.74 89.42 90.17 94.18
lz4_5 88.92 54.55 87.05 94.42 98.35 98.33 87.98 82.39 72.13 89.94 85.3 89.1 89.86 93.99
lz4_9 88.86 54.36 86.95 94.42 98.34 98.33 87.94 82.31 72.06 89.87 85.01 89.0 89.83 93.96
zstd1 81.41 53.91 79.48 86.15 87.28 87.63 80.51 77.02 70.75 77.34 84.66 81.7 81.96 84.86
zstd3 79.46 50.77 76.17 85.16 87.16 87.42 78.74 74.23 65.42 76.99 78.36 79.8 79.55 83.63
zstd9 77.84 46.65 75.46 83.92 86.61 86.77 76.75 72.66 62.16 75.22 74.86 78.15 77.37 82.38
xz1 75.43 42.94 72.6 80.42 85.07 84.97 73.99 69.47 60.24 75.07 72.55 75.19 74.93 79.88
zstd19 75.29 44.08 73.91 80.59 85.26 85.23 73.15 70.01 59.04 72.61 71.84 76.26 74.48 79.89
zstdultra22 75.29 44.07 73.91 80.59 85.26 85.23 73.13 70.0 59.04 72.61 71.84 76.26 74.48 79.89
xz3 75.18 42.64 72.52 80.39 84.99 84.93 73.53 69.31 58.97 74.71 72.17 75.05 74.69 79.77
7z9 72.42 40.32 73.77 78.24 82.43 82.19 70.39 66.73 56.2 71.37 69.96 72.53 71.43 77.11
xz6 72.42 40.21 71.7 78.24 82.43 82.19 70.45 66.71 56.2 71.37 69.87 72.5 71.41 77.11
xz9 72.4 40.21 71.7 78.24 82.43 82.19 70.39 66.71 56.2 71.37 69.87 72.5 71.41 77.11
Compressor Details
Name         Command Line
zstdfast#    zstd --fast=#
zstd#        zstd -#
zstd19       zstd -19 -T0
zstdultra22  zstd -22 --ultra -T0
xz#          xz -# -c -T0
7z9          7z a -mx9

Versions

zstd: v1.5.0
xz: (XZ Utils) 5.2.5
7z: p7zip Version 17.04

Some quick notes:

  • For photos, zstd --fast seems to be useless; zstd -1 is where compression starts to take effect
  • 7z -mx9 gained nothing over xz -6 (marginally larger in total and noticeably larger on icon_64), probably due to overhead of the 7z format on small files
  • zstd --ultra -22 is nearly equivalent to zstd -19
  • Similarly, xz -9 is nearly equivalent to xz -6
  • zstd -19 beats out xz -1 in filesize
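
As a quick cross-check of these notes, the percentage table can be rederived from the TOTAL column of the kB table. The snippet below is only a sketch of that arithmetic; the dictionary holds a handful of rows copied from the table, and `percent_of_plain` is an illustrative name:

```python
# TOTAL column (kB) copied from the "Compression Results (kB)" table.
total_kb = {
    "plain": 1351894,
    "zstd1": 1100532,
    "zstd19": 1017895,
    "zstdultra22": 1017813,
    "xz1": 1019691,
    "xz6": 978992,
    "xz9": 978811,
}


def percent_of_plain(name):
    """Size relative to uncompressed QOI, as in the (%) table."""
    return round(100.0 * total_kb[name] / total_kb["plain"], 2)


for name in total_kb:
    print(f"{name:12s} {percent_of_plain(name):6.2f}")

# Gaps behind the "nearly equivalent" notes, in kB.
print("zstd19 - zstdultra22:", total_kb["zstd19"] - total_kb["zstdultra22"])
print("xz6 - xz9:", total_kb["xz6"] - total_kb["xz9"])
print("xz1 - zstd19:", total_kb["xz1"] - total_kb["zstd19"])
```

The gaps are small next to totals of around a million kB, which is what "nearly equivalent" means here.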

sudoBash418, Dec 13 '21 05:12

I ran the test suite through zopflipng, which appears to compress better than oxipng (even oxipng -o 6 -Z, where -Z selects zopfli). It took ~30 core hours with option -m; there are more exhaustive options, but they would have taken an age for minimal gains. Sorry if you're in the process of crunching PNGs, I set it running overnight on a whim. Sizes are in kB (1 kB = 1000 bytes).

zopflipng -m
images/icon_512:           8598
images/icon_64:             723
images/photo_kodak:       14715
images/photo_tecnick:    209470
images/photo_wikipedia:   87123
images/pngimg:           208908
images/screenshot_game:  227834
images/screenshot_web:    26516
images/textures_photo:    31315
images/textures_pk:       41078
images/textures_pk01:     14504
images/textures_pk02:     80261
images/textures_plants:   47995
totals:                  999046

So qoi + the top compressors already beat the top optimised PNGs. Arguably the PNGs could be crunched slightly further, but then so can qoi if we go off the deep end and start using experimental state of the art compressors like cmix.

@sudoBash418 Let me know if you're going to generate lossless JpegXL data. If not I can crunch those numbers.

> LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).

LZ4 is too quick IMO unless only dealing with in-memory operations (which is a category I hadn't considered). zstd -1 nearly saturates the bandwidth of a typical consumer SSD with a single (modern) CPU core, so it's a better fit for quickly loading assets for example. So rather than replacing zstd -1 with LZ4 it might be best to have three main data points: LZ4, zstd -1, xz -6.
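
The reasoning here can be put as a back-of-the-envelope model: if reading and decompressing overlap, the rate at which uncompressed data arrives is capped by whichever is slower, the disk (scaled up by the compression ratio) or the decoder. The sketch below assumes that simple pipelined model, and every number in it is a made-up placeholder, not a measurement of any real SSD or codec:

```python
def effective_load_speed(disk_mb_s, ratio, decompress_mb_s):
    """Uncompressed MB/s delivered when reading and decompressing are pipelined.

    ratio is compressed size / uncompressed size (1.0 means no compression).
    decompress_mb_s counts uncompressed output; use float("inf") for no codec.
    """
    return min(disk_mb_s / ratio, decompress_mb_s)


# All figures below are hypothetical placeholders.
disk = 500.0  # storage read speed, MB/s
raw = effective_load_speed(disk, 1.0, float("inf"))  # uncompressed file
fast = effective_load_speed(disk, 0.8, 1500.0)       # fast decoder, modest ratio
slow = effective_load_speed(disk, 0.7, 100.0)        # better ratio, slow decoder

for label, speed in (("raw", raw), ("fast codec", fast), ("slow codec", slow)):
    print(f"{label:10s} {speed:7.1f} MB/s")
```

Under this model a codec helps loading only while its decoder outruns the disk; once the decoder becomes the bottleneck, a better ratio no longer pays off.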

chocolate42, Dec 13 '21 09:12

> So qoi + the top compressors already beat the top optimised PNGs. Arguably the PNGs could be crunched slightly further, but then so can qoi if we go off the deep end and start using experimental state of the art compressors like cmix.

> @sudoBash418 Let me know if you're going to generate lossless JpegXL data. If not I can crunch those numbers.

I wasn't planning on it; go for it. I might do a more quick-and-dirty test of "uncompressed PNG" compressed with zstd/xz just to see what happens.

> > LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).

> LZ4 is too quick IMO unless only dealing with in-memory operations (which is a category I hadn't considered). zstd -1 nearly saturates the bandwidth of a typical consumer SSD with a single (modern) CPU core, so it's a better fit for quickly loading assets for example. So rather than replacing zstd -1 with LZ4 it might be best to have three main data points: LZ4, zstd -1, xz -6.

I would be inclined to agree, but I haven't run the numbers yet so I'm not certain about what the performance would look like. One thing to note about zstd (and probably lz4, but I'm not certain) is that decompression speed and memory requirements generally stay the same even up to -19, which can be very useful in "compress once, decompress many" cases (such as game assets or static web files).

sudoBash418, Dec 13 '21 11:12

Great point about using -19 instead of -1; that makes a lot more sense for that use case.

chocolate42, Dec 13 '21 11:12

JPEG XL encoder v0.7.0 335f8a8 [AVX2,SSE4,SSSE3,Scalar]
cjxl -e 9 -q 100

Sizes are in kB (1 kB = 1000 bytes).
images/icon_512:           4934
images/icon_64:             536
images/photo_kodak:       10169
images/photo_tecnick:    151300
images/photo_wikipedia:   65934
images/pngimg:           135264
images/screenshot_game:  163189
images/screenshot_web:    15254
images/textures_photo:    19797
images/textures_pk:       38466
images/textures_pk01:     12085
images/textures_pk02:     63698
images/textures_plants:   29439
totals:                  710071

Lossless JpegXL lives up to the hype.

chocolate42, Dec 13 '21 21:12

While this is indeed very cool, would the specification actually require anything different?

EDIT: unless by writing a parser that does both at once, you can get more efficient code or something? I can definitely see that happening actually

magnus-ISU, Dec 31 '21 19:12

Putting compression into the qoi2 format (and specifically leaving the magic identifier and the rest of the header uncompressed), instead of flinging foo.qoi2.xz files around, makes it easier to tell (e.g. as part of the /usr/bin/file command) that a foo.dat file is an image (and specifically a qoi2 image of a certain width and height), instead of only knowing that it's "xz compressed data".
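
A rough sketch of what that could look like, assuming a layout that mirrors the existing 14-byte QOI header (magic, big-endian width and height, channels, colorspace) followed by a compressed payload. The `qoi2` magic, the use of xz for the payload, and both function names are purely hypothetical; nothing like this is specified anywhere:

```python
import lzma
import struct

MAGIC = b"qoi2"  # hypothetical magic; the existing QOI magic is b"qoif"
HEADER = struct.Struct(">4sIIBB")  # magic, width, height, channels, colorspace


def write_image(width, height, channels, colorspace, payload):
    """Uncompressed header followed by an xz-compressed payload (illustrative)."""
    header = HEADER.pack(MAGIC, width, height, channels, colorspace)
    return header + lzma.compress(payload)


def sniff(data):
    """Identify the file from its first 14 bytes, without decompressing."""
    if len(data) < HEADER.size:
        return None
    magic, width, height, channels, _colorspace = HEADER.unpack_from(data)
    if magic != MAGIC:
        return None
    return f"qoi2 image, {width}x{height}, {channels} channels"
```

A tool like `file` would only need the `sniff` half, whereas a whole-file `.xz` wrapper hides even the magic bytes until decompression starts.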

nigeltao, Jan 01 '22 02:01

There is some benefit to integrating entropy coding, but it's a fair bit of complexity to do it. Refactoring the encode/decode functions to allow streaming would let either method be just as efficient. As long as it's optional it's fine; the worst thing we could do is enforce a particular entropy coder that doesn't suit all use cases and quickly dates the format. If integrated entropy coding exists, I think it should accept these codecs, which should cover most use cases, with a "none" escape hatch: None, lz4, zstd, xz/lzma.

The zstd reference implementation appears to be able to output all three; I haven't had a chance to try it out yet.

edit: It's the benchmark tool in the zstd repo that can handle all three, not the implementation, which makes more sense. zstd looks about as easy as lz4 to integrate; zstd dev files are even in major Linux repos, so it should be as easy as lz4, at least on Linux.
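
The optional-codec idea with a "none" escape hatch could be sketched as a small dispatch table. The codec ids and the `unwrap` function below are invented for illustration, and only the codecs shipped with the Python standard library are wired up; lz4 and zstd would need third-party bindings:

```python
import lzma
from enum import IntEnum


class Codec(IntEnum):
    # Hypothetical codec ids; no such field exists in any QOI specification.
    NONE = 0
    LZ4 = 1
    ZSTD = 2
    XZ = 3


# Only codecs available in the standard library are registered here.
DECODERS = {
    Codec.NONE: lambda payload: payload,
    Codec.XZ: lzma.decompress,
}


def unwrap(codec_id, payload):
    """Return the raw QOI byte stream for a payload tagged with codec_id."""
    codec = Codec(codec_id)  # raises ValueError for unknown ids
    try:
        decoder = DECODERS[codec]
    except KeyError:
        raise NotImplementedError(f"{codec.name} needs a third-party library") from None
    return decoder(payload)
```

Keeping the table open-ended is what stops a single entropy coder from being baked into the format.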

chocolate42, Jan 02 '22 21:01

I haven't fully digested it yet, but @richgel999 recently blogged about LZ_ADD / LZ_XOR compression which might inspire some ideas for interesting QOI2 experiments.

nigeltao, Jan 11 '22 23:01

What I don't like about compression is that something like an lz4 encoder might not be available for all programming languages.

wbd73, Jan 12 '22 15:01

It exists for C, Python, C#, Java, JavaScript, Rust, and Go (and a ton more). Is there a specific language you're worried about?

oscardssmith, Jan 12 '22 15:01

> ... might not be available for all programming languages.

If something exists for C it can exist for pretty much anything through C bindings. LZ4 is such a fundamental and long-lived algorithm that it's definitely everywhere and has been for roughly a decade. Same goes for LZMA (which is older still) and even zstd (which is newer but already used in a lot of fundamental things like package managers and the Linux kernel).

chocolate42, Jan 12 '22 19:01

I should have read a bit more about it first. Now that I have, I see it's not really an issue.

wbd73, Jan 12 '22 20:01