Investigate additional compression: Zstd, LERC, webp, Rice, ZFP

Open lgritz opened this issue 3 years ago • 13 comments

libtiff 4.0.10 added support for zstd compression (lossless) and webp (lossy), and the newly-released 4.3 adds LERC compression (lossy, error-bounded). I've also heard that the astronomical imaging community has found "Rice" compression to be very good. Their data (FITS files) is floating point and lossless storage matters to them, which suggests that their experience may also apply to our typical data. ZFP is another compressor used for floating-point scientific data.

We should investigate whether any of these would be beneficial to add to OpenEXR, either for compression ratio or compress/decompress performance reasons. Most of the comparisons available (such as the libtiff 4.0.10 release notes with zstd and webp benchmarks and also LERC benchmarks) are for integer images; we should not assume the same performance on typical OpenEXR half images without thorough testing.

References:

  • https://github.com/facebook/zstd
  • https://github.com/Esri/lerc
  • https://awesomeopensource.com/project/Esri/lerc?categoryPage=41
  • https://developers.google.com/speed/webp/
  • https://www.webmproject.org/code/#libwebp-webp-image-library
  • https://github.com/LLNL/zfp

This is a placeholder so we remember this topic and so that newcomers see that it's a potentially valuable project to take on. I'm not volunteering to implement it myself. Also note that each of these compression methods can be investigated or implemented separately and in isolation; it is not necessary for one person to do it all.

lgritz, Apr 24 '21 21:04

Have you got any links for the Rice discussions you mentioned?

meshula, Apr 25 '21 21:04

No, I just googled it. I don't know the canonical reference.

lgritz, Apr 26 '21 04:04

Gotcha. This looks like a good reference for our purposes: https://arxiv.org/abs/0903.2140

It seems like they use coarse quantization to represent floats; basically scaling to an integer range and truncating.
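To make the quantization idea concrete, here is a minimal sketch of "scale to an integer range, then drop the fraction". The function names, the `scale` and `zero` parameters, and the sample values are made up for illustration, and the sketch rounds to nearest rather than truncating; it is not taken from the paper or from cfitsio.

```python
import math

def quantize(values, scale, zero=0.0):
    """Map floats to integers: q = floor((v - zero) / scale + 0.5).

    `scale` is the quantization step; a larger step discards more precision.
    Rounding to nearest is an assumption made for this sketch.
    """
    return [math.floor((v - zero) / scale + 0.5) for v in values]

def dequantize(quantized, scale, zero=0.0):
    """Approximate inverse: each value comes back to within scale / 2."""
    return [q * scale + zero for q in quantized]

# Hypothetical pixel values, quantized to steps of 1/64.
pixels = [0.103, 0.251, 0.249, 0.998, 0.5]
scale = 1.0 / 64
q = quantize(pixels, scale)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(pixels, restored))
print(q, max_error)
```

The resulting integers are what a lossless integer coder would then compress, so the step size is the knob that trades precision for file size.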

The Rice algorithm appears to be an RLE scheme, with some thresholded "noise" gates that are used to get longer runs. Apparently it's used in space probes due to its ease of implementation with simple integer ops.
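For comparison, Rice coding as it is usually described is a Golomb code with a power-of-two divisor, typically applied to differences between neighbouring samples rather than to runs. The sketch below shows that textbook form using only shifts and masks. The fixed parameter `k`, the bit-string representation, and the sample data are assumptions for illustration; real implementations such as the one in cfitsio pick the parameter per block and pack actual bits.

```python
def zigzag(d):
    """Map signed differences to non-negative ints: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * d if d >= 0 else -2 * d - 1

def rice_encode(values, k):
    """Encode non-negative integers as a string of '0'/'1' characters."""
    bits = []
    for n in values:
        q = n >> k                  # quotient, written in unary
        r = n & ((1 << k) - 1)      # remainder, written in k binary digits
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, "0{}b".format(k)))
    return "".join(bits)

def rice_decode(bits, k, count):
    """Decode `count` integers produced by rice_encode with the same k."""
    values = []
    pos = 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":     # read the unary quotient
            q += 1
            pos += 1
        pos += 1                    # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((q << k) | r)
    return values

# Hypothetical integer samples; only the differences are coded here,
# the first sample would be stored separately.
samples = [100, 102, 101, 101, 105, 104]
diffs = [b - a for a, b in zip(samples, samples[1:])]
mapped = [zigzag(d) for d in diffs]
encoded = rice_encode(mapped, k=2)
decoded = rice_decode(encoded, k=2, count=len(mapped))
print(len(encoded), "bits for", len(mapped), "differences")
```

Small differences get short codes, which is why the scheme works well on smooth or lightly noisy data and poorly when neighbouring samples differ wildly.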

The article is a good read to learn what data characteristics are interesting to the Astro folks.

meshula, Apr 26 '21 06:04

FWIW, the reference Rice implementation for FITS can be found in the cfitsio library. Talking of space images, there also seems to be this "recommended standard" coding variant used...

kmilos, Jun 25 '21 11:06

Another possibly interesting one: ZFP -- has both exact and approximate modes for floats. Looks like tinyexr already has an option to use it.

aras-p, Jul 21 '21 13:07

Thanks, @aras-p , I've added that to the original description. ZFP is one that had also come to my attention before, but somehow slipped my mind when I entered this issue.

lgritz, Jul 21 '21 16:07

> so that newcomers see that it's a potentially valuable project to take on. I'm not volunteering to implement it myself

I might want to play around with zstd & ZFP during vacation soon, by the way! 🤞 that I will actually do it :)

aras-p, Jul 22 '21 08:07

Looked into Zstandard. Summary: looks really good.

  • Compression ratio very similar to Zip or PIZ (all at around 2.4x in my test data set)
  • Writing: Zstd 837, Uncompressed 422, Zip 213 (451 with level 4, see #1125), PIZ 601 MB/s.
  • Reading: Zstd 2012, Uncompressed 1737, Zip 1581 (1895 with level 4), PIZ 1189 MB/s.

More details with interactive graphs at https://aras-p.info/blog/2021/08/06/EXR-Zstandard-compression/
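For anyone wanting to reproduce this kind of comparison in miniature, the sketch below measures compression ratio and compress/decompress throughput for zlib at two levels, using only the Python standard library. It is an illustration of the measurement, not the harness behind the numbers above: the `measure` helper and the synthetic 16-bit data are made up here, and it times the compressor alone, without EXR file I/O or pixel reordering.

```python
import time
import zlib
from array import array

def measure(data, level, repeats=3):
    """Return (ratio, write_mb_s, read_mb_s) for zlib at the given level.

    Takes the best of `repeats` runs to reduce timing noise.
    """
    best_c = best_d = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        packed = zlib.compress(data, level)
        t1 = time.perf_counter()
        unpacked = zlib.decompress(packed)
        t2 = time.perf_counter()
        best_c = min(best_c, t1 - t0)
        best_d = min(best_d, t2 - t1)
    if unpacked != data:
        raise ValueError("round trip failed")
    mb = len(data) / 1e6
    # Guard against a zero interval on very coarse timers.
    return (len(data) / len(packed),
            mb / max(best_c, 1e-9),
            mb / max(best_d, 1e-9))

# Synthetic stand-in for image data: a slow 16-bit ramp, about 1 MB.
# Real EXR pixels will behave differently.
data = array("H", ((i // 7) % 65536 for i in range(500_000))).tobytes()

for level in (4, 6):
    ratio, write_mb_s, read_mb_s = measure(data, level)
    print(f"zlib level {level}: ratio {ratio:.2f}x, "
          f"write {write_mb_s:.0f} MB/s, read {read_mb_s:.0f} MB/s")
```

Swapping in another compressor only requires replacing the two `zlib` calls, which keeps the comparison like for like on the same buffer.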

aras-p, Aug 06 '21 10:08

Btw, libdeflate might provide some throughput benefit over zlib as well and might be on par w/ zstd...

kmilos, Aug 06 '21 15:08

> Btw, libdeflate might provide some throughput benefit over zlib as well

Thanks for the pointer! Tried libdeflate, and indeed it's quite excellent. It does not reach Zstd writing speeds, but it's a good improvement compared to zlib (and it has the huge advantage that the compressed data format stays the same, so existing readers are unaffected).

https://aras-p.info/blog/2021/08/09/EXR-libdeflate-is-great/

aras-p, Aug 09 '21 08:08

Looked into ZFP lossless compression. It's... not great. Both compression ratio & compression/decompression performance are well below other schemes like Zip, PIZ, Zstandard. It might be worth it when doing lossy compression, but I haven't looked into that yet.

More details in this short blog post: https://aras-p.info/blog/2021/08/27/EXR-Filtering-and-ZFP/

aras-p, Aug 27 '21 12:08

@aras-p Where could I find the images referenced at https://aras-p.info/blog/2021/08/04/EXR-Lossless-Compression/ ? Also, can you share the scripts/code you used for generating the performance figures?

palemieux, Feb 16 '24 01:02

@palemieux I haven't put it anywhere "properly" in a clean way, but I think it's this messy repository: https://github.com/aras-p/image-formats-testbed-hack

  • the ~20 files being used are spelled out starting at https://github.com/aras-p/image-formats-testbed-hack/blob/master/src/main.cpp#L421 (they are in the repo itself as Git LFS files)
  • the graph generation code is in the same file, generating html files that use Google Chart APIs.
  • it's all a mess though!

aras-p, Feb 16 '24 08:02