Zstd compressor (#3)
This PR adds a new compression algorithm optimized for deep images. During development, deep images shrank by 20-30% compared to ZIPS with similar compression performance, and were 35-45% smaller with the compression level turned up to the max (about a 3-4x slowdown compared to ZIPS). Decompression is slightly faster than ZIPS.
I did not spend a lot of time trying to optimize for non-Deep images, but certain small non-deep example images with HALF pixels yielded a slightly worse compression rate compared to ZIPS.
The core algorithm is implemented in Blosc, using the ZSTD codec with appropriate data prefiltering. The main advantage is that Blosc offers a unified API over multiple codecs and is under active development. Also, the binary serialization stays compatible if we ever want to change the codecs/config parameters in the future, without introducing breaking changes.
Open questions: I still have not figured out how to integrate Blosc properly into the OpenEXR build pipeline.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: pleprince / name: Philippe Leprince (06f30a848cd2e6dab7a846ed6e4992959fea40fc, a06daecb6a906887283bb3fae0ccf2ab0326fa50, 0e90ff164e7da0fc6e739ba50b1e2c13a8deb471, 2ad98fe683ed7b65de824e0d02e6f811d8d9ec00, 48fa372650c3f713c30c101051682d5e767cc5dc, a887716cf281a0a83d408e4980841c4558fd9587, ccac098efda46e4774d28406d34daf90d96090e6, bb8754ad8ee4fed0280b007f57c01adb0090897b, 387d2ca59f77a1ea37b44e227ffb058e9a52d039, 2d678aefc30a68b007a91c4c0f2d0b4605ecf1ae, 6cfc3cc0afa75fa7354112321b215d81b11b87ea, f52266d1ebd06c8f0fe1019ad1b8624c0bfc5d39, 688d0d66634bfd92003ab3d30cecc85007314d5e, 21c60db48fabf0641d65eee0daab7a37f0d5f9c4, 9799bf9d5c5132fdeea94678fd061ae3ba51025e, 05d9457b3b9a372ab101c581e450e2ebc26d0bea, 40b1937993f1dbb413bb85fdd356cc295a56ad4f, f2bdfdea6003cbdff4539cc68841baa159b4e3fe, 88c034843926271dabdb30641b59e2e8d855d9e2, 1d2cd0ca64c93078ea3fb1fc56901e662a7b049f, a9ebf07132dda49c14b376b7737088ec5b2bb2d6, 15844b5fcb3f9af5f5b01d254e83405dc1a63907, 903c2c4b00d3e3b7a453019beb46a58ad3192fa4, 5b0be1dc4b075d27f53146e3d736dabb47174395, 5047f7f8bbb55e2477be898d7155ec13ec7e04a4, cb6a1e07dfadda8322aae81712e45c4022dd4a4b, 2341fb40da1838e79acb40c74a5249a1a701b58f, 55f37b0017d22462a434c5399c1aeb18fb5c0b5d, 81e17011bcae03cd037329c7876ed467975075b2, 4c714a2f30e2015aa362c6db31ad0911b74e018c, 17a3fdfee9e8c84d2c7d95f1eba00f7883bd2ec6)
- :white_check_mark: login: clusty / name: Vlad Lazar (41afe2e36e7aa4e366c466c8899f08bb0deb0a90, 78ca6c60e325f48033038f74ce3e8229f424cd1e, d0a24d43e90f099aca7347199c6354d13b1a069c, 1e6698ed3be421b2ec3edf8ba9bbf5a652f7fccb, 06896124aa995f7c39d90094beb0cc10c89b6b43, 057dd80ab65fc03a0df375ddaec8be2d9d79b12e, 574988ee7d43ed764d7facc450a91e3300b19431)
A belated thanks for this, it mostly looks pretty straightforward, with some caveats.
Obviously, the blosc configuration needs to be sorted out. If you haven't already, you might look at OpenVDB for a template for building against blosc: https://github.com/AcademySoftwareFoundation/openvdb/blob/master/CMakeLists.txt#L99 and https://github.com/AcademySoftwareFoundation/openvdb/blob/master/cmake/FindBlosc.cmake
We'll obviously need to carefully consider the consequences and timing of extensions to the file format, as well as library dependencies, and possibly want to bundle this with other changes, but we have already been anticipating adding new compression algorithms, so having a working example will be helpful.
A few pedantic requests:
- We require a CLA, can you sign via the red "NOT COVERED" button above?
- We also require copyright notices at the top of each file.
- In addition to the example code, there should be tests in the test suite in src/test.
- We'll need the parallel implementation in the OpenEXRCore library.
- I'll leave a few comments inline about minimizing whitespace changes.
- Also, the associated docs should be updated as well.
I will add the cmake changes shortly. Thanks
@cary-ilm Is there a way to share the implementation of my compressor between OpenEXR and OpenEXRCore? The two are the same minus some memory management bits.
OpenEXR depends on OpenEXRCore (minimally at this point, but that may well increase), so if you are able to put an implementation in OpenEXRCore that can be used in OpenEXR, that's fine. But obviously you're aware that OpenEXRCore is a C-language library, so no C++ classes there.
I see you updated exrheader to dump out the compression type. exrmaketiled, exrenvmap and exrmultiview take a command line string and turn that into a compression type. Should they be extended to support ZSTD compression?
Perhaps it's worth making library functions that turn a string like 'piz' into PIZ_COMPRESSION, and vice versa, to simplify updating these utilities (and end-user code) as new compression types are added.
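Such a helper pair could look roughly like the sketch below. The enum is a local stand-in mirroring a few values of OpenEXR's Compression enum (ImfCompression.h); the function names and signatures are hypothetical, not the library's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Local stand-in for a subset of OpenEXR's Compression enum (ImfCompression.h).
enum Compression
{
    NO_COMPRESSION,
    RLE_COMPRESSION,
    ZIPS_COMPRESSION,
    ZIP_COMPRESSION,
    PIZ_COMPRESSION,
    ZSTD_COMPRESSION
};

// Hypothetical helper: case-insensitive name -> enum lookup.
// Returns false for unknown names instead of throwing.
bool compressionForName (std::string name, Compression& out)
{
    std::transform (name.begin(), name.end(), name.begin(),
                    [] (unsigned char c) { return std::tolower (c); });
    static const std::map<std::string, Compression> table = {
        {"none", NO_COMPRESSION}, {"rle", RLE_COMPRESSION},
        {"zips", ZIPS_COMPRESSION}, {"zip", ZIP_COMPRESSION},
        {"piz", PIZ_COMPRESSION}, {"zstd", ZSTD_COMPRESSION}};
    auto it = table.find (name);
    if (it == table.end()) return false;
    out = it->second;
    return true;
}

// Hypothetical reverse helper: enum -> canonical name.
const char* nameForCompression (Compression c)
{
    switch (c)
    {
        case NO_COMPRESSION: return "none";
        case RLE_COMPRESSION: return "rle";
        case ZIPS_COMPRESSION: return "zips";
        case ZIP_COMPRESSION: return "zip";
        case PIZ_COMPRESSION: return "piz";
        case ZSTD_COMPRESSION: return "zstd";
    }
    return "unknown";
}
```

With this, each utility's command-line parsing collapses to one call, and adding a new codec means touching a single table rather than every tool.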
Hi @cary-ilm & @peterhillman
I have made a number of cmake modifications based on the Imath lib section. That seems to work, but there are a few lines I copied and pasted that I don't understand; any help would be much appreciated.
# the install creates this but if we're using the library locally we
# haven't installed the header files yet, so need to extract those
# and make a variable for header only usage
if(NOT TARGET Imath::ImathConfig)
get_target_property(imathinc Imath INTERFACE_INCLUDE_DIRECTORIES)
get_target_property(imathconfinc ImathConfig INTERFACE_INCLUDE_DIRECTORIES)
list(APPEND imathinc ${imathconfinc})
set(IMATH_HEADER_ONLY_INCLUDE_DIRS ${imathinc})
message(STATUS "Imath interface dirs ${IMATH_HEADER_ONLY_INCLUDE_DIRS}")
endif()
My problem here is that IMATH_HEADER_ONLY_INCLUDE_DIRS isn't referenced anywhere else in the project. Is that being picked up by magic / naming convention? I didn't find anything in the cmake docs. I am assuming this is adding the Imath API headers to the install, in which case I don't need it.
The other bit I don't understand is:
# Propagate OpenEXR's setting for pkg-config generation to Imath:
# If OpenEXR is generating it, the internal Imath should, too.
set(IMATH_INSTALL_PKG_CONFIG ${OPENEXR_INSTALL_PKG_CONFIG})
What exactly is the point here? Another bit of magic, I'm sure! :-)
Thanks in advance
I got this to build OK, so it looks like the Blosc build is working. Did you research different settings for the number of scanlines in each chunk? 32 scanlines of deep image data could get very large. It's also a little more complicated to multi-thread efficiently with multi-scanline compression schemes. I wonder whether there should be a ZSTD1 as well as a ZSTD32? Or perhaps just a ZSTD1?
As it happens, it seems the library doesn't properly support writing multi-scanline compression formats to deep images.
I wrote a test tool to read and write deep images with a different compression. It only allocates a single deep scanline at a time, points a DeepFrameBuffer at that, then loops reading the scanline from the input and then calling writePixels(1). It appears LineBufferTask::execute in ImfDeepScanLineOutputFile waits until writePixels() has been called 32 times, then it copies the pixel counts from the FrameBuffer for all 32 scanlines before writing. In my case, I've overwritten the data from the previous 31 scanlines.
I had a brief go at fixing it so it saves the pixel counts as soon as writePixels is called, but haven't got that working properly yet. I hacked the ZSTD_COMPRESSION to be single scanline, and that works.
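As a side note, the chunk arithmetic behind the problem described above can be sketched as follows (the helper name and signature are hypothetical; only the 32-scanline grouping comes from this thread):

```cpp
#include <cassert>

// Hypothetical helper (not OpenEXR code): with linesPerChunk scanlines per
// chunk, the chunk containing scanline y starts at the line computed below.
// With linesPerChunk = 32, writePixels(1) calls for y = 0..31 all target the
// same chunk, so the library must retain sample counts for all 32 scanlines
// before it can compress and write anything; a caller reusing a single
// scanline buffer overwrites the earlier counts. With a 1-line chunk (as in
// ZIPS) every scanline is its own chunk and the problem disappears.
int chunkStart (int y, int minY, int linesPerChunk)
{
    return y - ((y - minY) % linesPerChunk);
}
```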
> I got this to build OK, so it looks like the Blosc build is working. Did you research different settings for the number of scanlines in each chunk? 32 scanlines of deep image data could get very large. It's also a little more complicated to multi-thread efficiently with multi-scanline compression schemes. I wonder whether there should be a ZSTD1 as well as a ZSTD32? Or perhaps just a ZSTD1?
The number 32 came from some experiments with some real-world data I had. You are probably right that 32 might be large when the number of samples is large. I will test with 1 and see how much the compression degrades.
> As it happens, it seems the library doesn't properly support writing multi-scanline compression formats to deep images. I wrote a test tool to read and write deep images with a different compression. It only allocates a single deep scanline at a time, points a DeepFrameBuffer at that, then loops reading the scanline from the input and then calling writePixels(1). It appears LineBufferTask::execute in ImfDeepScanLineOutputFile waits until writePixels() has been called 32 times, then it copies the pixel counts from the FrameBuffer for all 32 scanlines before writing. In my case, I've overwritten the data from the previous 31 scanlines. I had a brief go at fixing it so it saves the pixel counts as soon as writePixels is called, but haven't got that working properly yet. I hacked the ZSTD_COMPRESSION to be single scanline, and that works.
Thanks very much for looking at this. We had some discussions about this topic. I was experiencing spurious deadlocks in LineBufferTask. At some point all the reliability issues went away, so I chalked it up to my imagination :)
I think this all points towards making ZSTD work on only 1 scanline for now (we might make a 32-scanline version later).
> Perhaps it's worth making library functions that turn a string like 'piz' into PIZ_COMPRESSION, and vice versa, to simplify updating these utilities (and end-user code) as new compression types are added.
I have worked on that today and will make a separate PR.
I had issues doing cmake ../openexr -DCMAKE_BUILD_TYPE=Debug - it seems that causes blosc to generate a libblosc2_d.a but OpenEXR still tries to link against libblosc2.a.
In other news I think I have a version of OpenEXR which fixes deep reading and writing to allow single deep scanlines to be written and read with codecs that compress multiple scanlines together. The current code is using the supplied SampleCount slice to store per-pixel sample counts, so that must be valid for all scanlines of the chunk (and the library will write the sample count to your buffer for all scanlines of the chunk, even if you only ask it to read one scanline). There are no issues with the current implementation, as long as all deep codecs are single scanline only.
My changes further enforce that deep images cannot have subsampling other than 1, which means every sample has the same set of channels. Reading is inefficient: I believe reading scanlines consecutively from the same chunk causes the chunk to be decoded every time.
I would suggest that we have a ZSTD single-scanline codec, but if we decide to add a multi-scanline version too, then we will need these changes. I'm a little timid about pushing my changes unless we need them. If we have multiple versions of ZSTD, then I think they should be in the same release.
> I did not spend a lot of time trying to optimize for non-Deep images, but certain small non-deep example images with HALF pixels yielded a slightly worse compression rate compared to ZIPS.
I looked up blosc2 a bit, and it seems it turns on byte swizzle by default. This is another empirical optimization chasing the idea that "the exponent might be smooth while the mantissa might fluctuate rapidly"; in other words, the higher bytes and lower bytes might exhibit different behaviours inside a float, so we should collect them into separate chunks. But the blosc2 implementation probably only works for 4-byte IEEE float. I have a hard time imagining how this would optimize half:
https://www.slideshare.net/PyData/blosc-py-data-2014#17
OpenEXR's internal zip chased a similar idea; however, the existing code swizzled by 2 bytes across an entire block of numbers rather than doing a local 16-byte swizzle on a 4-byte pattern: https://github.com/AcademySoftwareFoundation/openexr/blob/86d0fb09859951d1e51a889e4ff2b7b3baecf021/src/lib/OpenEXR/ImfZip.cpp#L50C2-L72
This might be one reason why an approach that works well for float seems not to be so performant on half.
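To make the filter concrete, here is a simplified sketch of the shuffle idea (assuming blosc's default shuffle regroups bytes by position within each element; the function name is hypothetical, and this is not the blosc implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Blosc-style byte shuffle: regroup the bytes of fixed-size elements so that
// byte 0 of every element comes first, then byte 1, and so on. With
// typesize = 4 this separates the sign/exponent bytes of floats from the
// mantissa bytes, which tends to help the entropy coder; with typesize = 2
// it would be the analogue for HALF data.
std::vector<std::uint8_t>
shuffleBytes (const std::vector<std::uint8_t>& in, std::size_t typesize)
{
    const std::size_t n = in.size() / typesize; // number of elements
    std::vector<std::uint8_t> out (in.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t b = 0; b < typesize; ++b)
            out[b * n + i] = in[i * typesize + b];
    return out;
}
```

The 2-byte swizzle in ImfZip.cpp is similar in spirit to calling this with typesize = 2 over the whole block, which may explain why the two schemes behave differently on float versus half data.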
FYI: I just created a PR on the Bazel Central Registry to add c-blosc2 2.12.0 (https://github.com/bazelbuild/bazel-central-registry/pull/1349). Once this is merged Bazel builds can be fixed this way:
Add to MODULE.bazel:
bazel_dep(name = "c-blosc2", version = "2.12.0")
Add to BUILD.bazel under the deps section of cc_library `OpenEXR`:
"@c-blosc2//c-blosc2",
I am working through the build matrix. Please bear with me...
Hi @cary-ilm I am trying to solve this CI issue: https://github.com/AcademySoftwareFoundation/openexr/actions/runs/8558707537/job/23453828685?pr=1604#step:9:35 Is there a real possibility of building on a system without the threading library? If so, we can't use blosc2 in these builds. Thanks for your time
Not sure, no experience with blosc here, although I do know that OpenVDB uses it. Check out their workflow file here, since it appears you need to install it explicitly.
@cary-ilm Blosc2 won't compile without a threading lib, so I guess I am asking if the openexr builds without threading are expected to run on a system without multithreading support ?
Use for the Bazel build (in MODULE.bazel): bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.1")
Version 2.12.0 had some Bazel-related visibility issues - that's the reason why it is not working. 2.12.0.bcr.1 should fix the issue for the Bazel build.
Thanks @Vertexwahn ! :-)
@Vertexwahn, it looks like that wasn't enough to fix the Bazel side. Would you have any suggestions? Thanks
@pleprince You need to add some source files to the OpenEXR lib in the BUILD.bazel file:
cc_library(
name = "OpenEXR",
srcs = [
...
"src/lib/OpenEXR/ImfZstdCompressor.cpp",
"src/lib/OpenEXRCore/internal_zstd.c",
"src/lib/OpenEXR/ImfZstdCompressor.h",
],
...
Please note that even the ImfZstdCompressor.h file gets added to the srcs and not the hdrs section. Adding it to hdrs would make it a public header file that is visible from outside the library, which is not necessary here, I guess.
@Vertexwahn, everything builds locally/remotely except //src/examples:OpenEXRExamples. I have read the Bazel docs and googled around, but I can't fix it.
@pleprince It turned out that the c-blosc2 Bazel module still has some issues. I opened a PR on the Bazel Central Registry to fix it. Once https://github.com/bazelbuild/bazel-central-registry/pull/1773 is merged you can bump the version to:
bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2") (last number changed from "1" to "2") and then it should work. Usually merging in the BCR happens quite fast (1-2 work days). If you cannot wait for this, feel free to set the Bazel GitHub workflow to manual execution only (e.g. ignoring it for PRs) - I will then fix the Bazel build once https://github.com/bazelbuild/bazel-central-registry/pull/1773 is merged.
@pleprince I just added the updated version of c-blosc2 to my personal Bazel registry.
You can create a .bazelrc (at the root dir or the OpenEXR repo, next to MODULE.bazel file) and add the following content:
common --registry=https://raw.githubusercontent.com/Vertexwahn/bazel-registry/main/
common --registry=https://bcr.bazel.build
This way my personal registry, which already has bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2") available, will also be considered. I did not test it, but it might be a working workaround until https://github.com/bazelbuild/bazel-central-registry/pull/1773 is merged.
@pleprince Just tested on Ubuntu 22.04:
diff --git a/.bazelversion b/.bazelversion
index a8907c02..21c8c7b4 100644
--- a/.bazelversion
+++ b/.bazelversion
@@ -1 +1 @@
-7.0.2
+7.1.1
diff --git a/MODULE.bazel b/MODULE.bazel
index 63d3f067..ba4e4e2d 100644
--- a/MODULE.bazel
+++ b/MODULE.bazel
@@ -10,4 +10,4 @@ bazel_dep(name = "bazel_skylib", version = "1.5.0")
bazel_dep(name = "imath", version = "3.1.11")
bazel_dep(name = "libdeflate", version = "1.19")
bazel_dep(name = "platforms", version = "0.0.8")
-bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.1")
+bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2")
bazel build //... && bazel test //... works for me with the above patches applied
Thank you so much @Vertexwahn !
@cary-ilm @peterhillman
The fuzz test is failing to build. It is compiling with clang++ but fails to link because ld doesn't seem to support DWARF-5 on that system image. clang 14+ now defaults to DWARF-5 and I wonder what's the best option here:
- modify the compile flags to generate DWARF-4
- link the fuzz binaries with lld instead of ld (pass -fuse-ld=lld to clang++). Not sure the disk image has lld installed though.

Thanks
@pleprince https://github.com/bazelbuild/bazel-central-registry/pull/1773 is merged now. Use bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2") now! Should fix the Bazel build issues.
@pleprince I'm a little out of my depth here. I would have thought it was OK to change the fuzz test compile flags, since the purpose of fuzz testing is to verify the code correctness, not that it builds on all architectures. Does this issue mean that adding zstd adds an additional constraint on compiling debug builds in general, or is it just a quirk of the way that the oss-fuzz builds are configured?
> @pleprince I'm a little out of my depth here. I would have thought it was OK to change the fuzz test compile flags, since the purpose of fuzz testing is to verify the code correctness, not that it builds on all architectures. Does this issue mean that adding zstd adds an additional constraint on compiling debug builds in general, or is it just a quirk of the way that the oss-fuzz builds are configured?
That error seems to pop up because for some reason blosc is compiled with DWARF-5 format symbols, but the linker used for the fuzzer does not understand this (clang/llvm forums seem to mention this error). I have not encountered this issue locally, but then again I have a fairly recent toolchain.
What I don't understand is what makes blosc so special: I presume the fuzzer uses the same linker as the regular shared-object EXR build chain?