universal in-place decompression interface
By "in-place" I mean that the user pre-allocates a `Vector{UInt8}` or something similar as the sink.
Sometimes decompression is needed for low-level work, such as handling "buffers" in a file spec, where the results of multiple decompressions together assemble the entire data blob.
It would be nice if there were an in-place interface that supports multiple algorithms through this central package.
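The kind of interface being asked for can be sketched in plain Julia. This is a minimal sketch under stated assumptions: `InPlaceDecoder`, `decode!`, `RLEDecoder`, and `decode_chunks!` are hypothetical names, not part of TranscodingStreams.jl, and a toy run-length codec stands in for a real compressor. The caller owns the sink; each codec only writes into the slice it is given and reports how many bytes it produced, which is what lets several chunks assemble one blob.

```julia
# Hypothetical sketch: none of these names come from TranscodingStreams.jl.
abstract type InPlaceDecoder end

"""
    decode!(dec, dst, src) -> Int

Decode all of `src` into the caller-allocated `dst` and return the number
of bytes written. Throws if `dst` is too small.
"""
function decode! end

# Toy codec: run-length encoding stored as (count, byte) pairs.
struct RLEDecoder <: InPlaceDecoder end

function decode!(::RLEDecoder, dst::AbstractVector{UInt8}, src::AbstractVector{UInt8})
    iseven(length(src)) || throw(ArgumentError("RLE input must have even length"))
    n = 0
    for i in firstindex(src):2:lastindex(src)
        count, byte = Int(src[i]), src[i+1]
        n + count <= length(dst) || throw(ArgumentError("destination too small"))
        for k in 1:count
            dst[firstindex(dst) + n + k - 1] = byte
        end
        n += count
    end
    return n
end

# Assemble one blob from several independently compressed chunks by
# decoding each chunk directly into a view of the pre-allocated sink.
function decode_chunks!(dec::InPlaceDecoder, sink::Vector{UInt8}, chunks)
    offset = 0
    for chunk in chunks
        offset += decode!(dec, view(sink, offset+1:length(sink)), chunk)
    end
    return offset
end

# Usage: two chunks, "aaa" and "bbbb", land back to back in one buffer.
sink = Vector{UInt8}(undef, 7)
n = decode_chunks!(RLEDecoder(), sink, [UInt8[3, 0x61], UInt8[4, 0x62]])
# n == 7 and String(copy(sink)) == "aaabbbb"
```

Passing views of one sink, rather than returning new arrays, is the design choice that avoids a copy per chunk.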
I've drafted a PR around `Buffer`s to support faster decompression when loading Arrow files.
Would that support your use case as well? Or would you need the sink to be `ByteData` (instead of a `Buffer`)?
Closed by #132 and released as 0.9.12
I'm reopening this issue because I think the current interface still has some problems to work on:
- The `Buffer` type used in #136 is internal. See #202.
- For use in Zarr.jl, it would be helpful to be able to decompress directly into, for example, a `Vector{Float64}` to avoid an extra copy.
- The underlying codec should somehow be informed that it is doing a fully in-place operation, so it can internally avoid extra buffering and copies. Ref: https://github.com/JuliaIO/CodecZstd.jl/pull/52
Somehow I missed @mkitti's original comment, since the "closed by" refers to this very issue. What was the PR that supposedly fixed this?
> For use in Zarr.jl it would be helpful to be able to decompress directly into for example a Vector{Float64} to avoid an extra copy.
This can't work directly. The two possibilities are: either you have `buffer = reinterpret(UInt8, ...)` and give this buffer to TranscodingStreams,
or you have `data = reinterpret(Float64, buffer)` and give `buffer` to TranscodingStreams.
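Both options can be sketched with `reinterpret` from Julia Base. Assumption: `fake_decompress!` is a hypothetical placeholder that just copies bytes; a real codec would write its output the same way. In both cases the `Float64` values and the bytes share a single allocation.

```julia
# Hypothetical stand-in for a codec call: copies `src` into the start of `dst`.
function fake_decompress!(dst::AbstractVector{UInt8}, src::Vector{UInt8})
    length(dst) >= length(src) || throw(ArgumentError("destination too small"))
    copyto!(dst, 1, src, 1, length(src))
    return length(src)
end

# Bytes that play the role of decompressed output.
payload = collect(reinterpret(UInt8, [1.5, -2.0]))

# Option 1: allocate the Float64 array and hand the codec a byte view of it.
data = zeros(Float64, 2)
bytes = reinterpret(UInt8, data)   # 16-byte view sharing data's memory
fake_decompress!(bytes, payload)   # writes go straight into `data`

# Option 2: allocate the bytes and view them as Float64 afterwards.
buffer = Vector{UInt8}(undef, 16)
fake_decompress!(buffer, payload)
floats = reinterpret(Float64, buffer)  # no copy; reads the same memory
```

Note that `reinterpret(UInt8, data)` returns a `ReinterpretArray`, not a `Vector{UInt8}`, so option 1 only works if the decompression API accepts a general `AbstractVector{UInt8}` sink.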
Yes, I think this would require a new `unsafe_transcode!` function that works directly with pointers.
Also, a general `unsafe_transcode!` interface could be useful for other packages that don't support or need a streaming API, like Blosc.jl, LibDeflate.jl, JLD2.jl, HDF5.jl, and Zarr.jl, so maybe it should go in a separate LosslessChunkCompressors.jl package that is added as a dependency here.
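A pointer-level function like the proposed `unsafe_transcode!` could be paired with a safe wrapper that takes care of GC rooting. This is a minimal sketch assuming hypothetical names `unsafe_decode!` and `decode_into!` (not an existing API), with an identity "codec" in place of real decompression. `GC.@preserve` keeps `dst` and `src` alive while raw pointers to them are in use, and converting the element pointer lets a `Vector{Float64}` serve directly as the sink.

```julia
# Hypothetical pointer-level entry point; a real codec would call into C here.
# This toy version copies `srclen` bytes unchanged (an "identity codec").
function unsafe_decode!(dst::Ptr{UInt8}, dstlen::Int, src::Ptr{UInt8}, srclen::Int)
    srclen <= dstlen || throw(ArgumentError("destination too small"))
    unsafe_copyto!(dst, src, srclen)
    return srclen
end

# Safe wrapper: accepts any Array of plain-data elements as the sink and
# keeps both arrays rooted for as long as their pointers are being used.
function decode_into!(dst::Array{T}, src::Vector{UInt8}) where {T}
    isbitstype(T) || throw(ArgumentError("element type must be a bits type"))
    GC.@preserve dst src begin
        unsafe_decode!(Ptr{UInt8}(pointer(dst)), sizeof(dst),
                       pointer(src), length(src))
    end
end

# Usage: decode 16 bytes straight into a Vector{Float64}, no extra copy.
out = zeros(Float64, 2)
nbytes = decode_into!(out, collect(reinterpret(UInt8, [3.0, 4.0])))
# nbytes == 16 and out == [3.0, 4.0]
```

Restricting the wrapper to `Array` keeps the pointer arithmetic valid, but it also means views and foreign array types would need their own methods.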
@Moelf @mkitti I have a draft interface for in-place encoding and decoding defined in https://github.com/nhz2/ChunkCodecs.jl/blob/main/ChunkCodecCore/src/interface.jl
The interface currently doesn't use pointers directly, which is nice for avoiding GC issues, but sometimes things don't work as expected, for example when decoding a view of a PyArray: https://github.com/JuliaPy/PythonCall.jl/issues/579
That's interesting. I will try to take a closer look next week.