
Add ZstdCompressor

Open nhz2 opened this issue 9 months ago • 10 comments

Alternative to #149

This implementation supports multithreaded compression and decompression, and also supports the checksum option.

ChunkCodecLibZstd is being added as a direct dependency instead of a package extension, because Zarr.jl already depends on zstd through blosc.

One thing to note is that ChunkCodecLibZstd needs Julia ~~1.11~~ 1.10, and the ChunkCodec API is still experimental. Any suggestions for improving the API would be helpful.

nhz2 avatar Mar 07 '25 16:03 nhz2

Pull Request Test Coverage Report for Build 14317383070

Details

  • 5 of 13 (38.46%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.6%) to 85.461%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| src/Compressors/zstd.jl | 5 | 13 | 38.46% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 13680422725 | -0.6% |
| Covered Lines | 917 |
| Relevant Lines | 1073 |

💛 - Coveralls

coveralls avatar Mar 07 '25 16:03 coveralls

very much looking forward to both Zstd and the multithreading it brings. are there tests we could add to this PR to ensure it is thread safe?

bjarthur avatar Mar 13 '25 19:03 bjarthur

#181 would add some basic round trip tests, which should cover all the code in this PR.

I'm not sure how to test if this is thread-safe, but in ChunkCodecLibZstd, there is no global state being mutated, and the underlying C library is supposed to be safe to use in multiple threads.

nhz2 avatar Mar 16 '25 02:03 nhz2

Since this requires Julia 1.11 anyways, could we make this into a package extension and an optional dependency instead of a hard dependency?

The main advantage for the merge strategy here is that we do not make Zarr.jl require Julia 1.11. I would at most be more comfortable making it require Julia 1.10.

mkitti avatar Mar 31 '25 20:03 mkitti

I'm happy to accept a PR to ChunkCodecLibZstd.jl to support Julia 1.10. Currently, the only 1.11 feature I am using is the public keyword. But is there a need to install the latest version of an in-development package on an old version of Julia?

nhz2 avatar Mar 31 '25 21:03 nhz2

If the in-development package is "Zarr.jl", then yes. Julia 1.10 is the current long-term-support release, and I would expect upcoming releases of Zarr.jl to support Julia 1.10 for some time. Making "ChunkCodecLibZstd.jl" a mandatory dependency of Zarr.jl would prevent that. I am less concerned about support for Julia versions prior to Julia 1.10.

For "ChunkCodecLibZstd.jl", dependence on Julia 1.11 is less of an issue as long as it is only an optional dependency of Zarr.jl.

Compat.jl could be used to address the Julia version dependency. However, I still prefer codecs as optional dependencies when possible. If a convenience package, ZarrUniverse.jl for example, is needed that loads Zarr.jl and all optional dependencies, that would not be hard to accommodate.

I will send a pull request.

mkitti avatar Mar 31 '25 22:03 mkitti

if it's just public then it's as simple as @compat public foo, bar instead of public foo, bar. unnecessarily restricting version compatibility is a p.i.t.a. please make this change!
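For readers unfamiliar with the issue: `public` is only parsed on Julia 1.11 and later, so a bare `public foo, bar` fails to load on 1.10. The sketch below shows a dependency-free way to get a similar effect to `@compat public` by gating the declaration on the Julia version. It is not the change made in either PR, and `Demo`, `foo`, and `bar` are placeholder names used only for illustration.

```julia
module Demo

# `public` is a syntax error before Julia 1.11, so the declaration is kept
# in a string and only parsed and evaluated on versions that understand it.
# On older versions this line is a no-op.
VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public foo, bar"))

foo() = 1   # placeholder function
bar() = 2   # placeholder function

end # module

# The functions are usable on both 1.10 and 1.11; only the "public"
# marking differs between the two.
Demo.foo()
Demo.bar()
```

With Compat.jl as a dependency, the gated line would instead be written as `@compat public foo, bar`, which is the form suggested above.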

bjarthur avatar Mar 31 '25 23:03 bjarthur

PR for using Compat.jl for public: https://github.com/nhz2/ChunkCodecs.jl/pull/31

PR for making ChunkCodecLibZstd an optional dependency: https://github.com/JuliaIO/Zarr.jl/pull/183

mkitti avatar Apr 01 '25 10:04 mkitti

I started to test the ZarrUniverse idea here: https://github.com/mkitti/Zarr.jl/tree/mkitti-zarr-universe/lib/ZarrUniverse

using Pkg
Pkg.add(url="https://github.com/mkitti/Zarr.jl", rev="mkitti-zarr-universe", subdir="lib/ZarrUniverse")

or

] add https://github.com/mkitti/Zarr.jl#mkitti-zarr-universe:lib/ZarrUniverse

mkitti avatar Apr 01 '25 10:04 mkitti

I've updated the PR. It should work with Julia 1.10 now. Also, the new decode! function throws a DecodedSizeError if the decoded size is too small or large, which cleans up the error handling.

nhz2 avatar Apr 07 '25 19:04 nhz2

This should be thread safe because it creates a new context for each compression and decompression call.

I understand that @bjarthur has tested these changes under a multithreaded context. It would be great to see a test for this in the test suite.
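A minimal sketch of what such a test could look like, assuming only the Julia standard library. `encode_stub` and `decode_stub` are hypothetical stand-ins, not part of Zarr.jl or ChunkCodecLibZstd; a real test would pass the actual compress and decompress calls instead. A test like this can only catch thread-safety bugs that happen to trigger; passing does not prove safety.

```julia
using Test

# Hypothetical stand-in codec: reversing twice returns the original bytes.
# Replace these with the real compression and decompression functions.
encode_stub(data::Vector{UInt8}) = reverse(data)
decode_stub(data::Vector{UInt8}) = reverse(data)

# Round-trip every chunk concurrently and report whether all of them
# came back unchanged. Each task writes to its own slot of `ok`,
# so the result vector itself needs no locking.
function threaded_roundtrip(encode_fn, decode_fn, chunks)
    ok = fill(false, length(chunks))
    Threads.@threads for i in eachindex(chunks)
        ok[i] = decode_fn(encode_fn(chunks[i])) == chunks[i]
    end
    return all(ok)
end

# Chunks of different sizes, so tasks finish at different times.
chunks = [rand(UInt8, 1024 * i) for i in 1:64]

@testset "concurrent round trip" begin
    @test threaded_roundtrip(encode_stub, decode_stub, chunks)
end
```

Julia must be started with more than one thread (for example `julia -t 4`) for the loop to actually run in parallel; with a single thread the test still passes but exercises nothing concurrent.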

mkitti avatar May 12 '25 13:05 mkitti