Try using ChunkCodec as an HDF5 filter
This PR aims to explore how the ChunkCodecCore API could be used to implement an HDF5 filter, and to see what changes to the ChunkCodecCore API could make this easier.
Could we implement this alongside the existing (as yet unreleased) extension rather than in place of it?
If I recall correctly, there are some version bounds that may restrict usage to Julia 1.10 or later.
Yes, I can try that. The next version of the ChunkCodec packages can work with Julia 1.6, so I'm not sure what the version bounds issue would be.
Another issue is that, even though decoding with BZ2Codec is compatible with what the HDF5 filter was doing, the two differ in how concatenated compressed data is handled. Like the command-line tool bunzip2, BZ2Codec decoding accepts concatenated frames and returns the decompressed data concatenated.
Unlike bunzip2, BZ2Codec decoding will error if the compressed stream has invalid data appended to it.
From what I can tell, the HDF5 filter will only decode the first frame and ignore all data appended afterwards.
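To make the difference concrete, here is a minimal sketch in Python using only the standard-library `bz2` module (not ChunkCodecs.jl or HDF5.jl). Both function names are hypothetical and only model the behaviours described above: `decode_first_frame_only` mimics the HDF5 filter (decode the first stream, drop anything appended), and `decode_concatenated_strict` mimics BZ2Codec decoding (decode every concatenated stream, error on invalid trailing data). Rejecting empty input in the strict version is an assumption, not a documented BZ2Codec behaviour.

```python
import bz2


def decode_first_frame_only(data: bytes) -> bytes:
    """Hypothetical model of the HDF5 filter: decode only the first bzip2 stream."""
    decomp = bz2.BZ2Decompressor()
    out = decomp.decompress(data)
    if not decomp.eof:
        raise ValueError("truncated bzip2 stream")
    # decomp.unused_data holds any appended frames or garbage; it is silently dropped.
    return out


def decode_concatenated_strict(data: bytes) -> bytes:
    """Hypothetical model of BZ2Codec decoding: all streams, strict about trailing data."""
    if not data:
        # Assumption: require at least one stream.
        raise ValueError("empty input")
    chunks = []
    while data:
        # A fresh decompressor per stream; raises OSError if the bytes
        # (including trailing garbage) are not a valid bzip2 stream.
        decomp = bz2.BZ2Decompressor()
        chunks.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        data = decomp.unused_data
    return b"".join(chunks)
```

For comparison, Python's own `bz2.decompress` is closer to bunzip2: it decodes all concatenated streams but ignores trailing data that is not a valid bzip2 stream.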
I think it makes sense to have multiple implementations with distinct features and that we should be especially broad in terms of compatibility on the decoding side.
We should also raise these issues with HDF Group.
Something that we might need to work out is the priority among multiple implementations of the same codec.
In general, I think we should have one excellent default filter implementation, and then make it easier for advanced users to use custom implementations with the https://juliaio.github.io/HDF5.jl/stable/interface/dataset/#Chunks API, which is currently a bit difficult to use correctly with filtered data.
This is probably not much better than the current bzip2 filter, so there isn't much point in merging this right now.
Also, I may want to change the return type of some of the low-level chunk codec functions; see https://github.com/JuliaIO/ChunkCodecs.jl/pull/72