zstd-rs
zstd-rs copied to clipboard
Add support for skippable frames
Hi. This add supports for writing and reading skippable frames as well as skipping frames.
There is no support for doing that in a streaming way in the ZSTD C library, so that's why so part of this PR parses some stuff manually.
There are a few TODOs in the code that I'd like you to tell what you think we should do with that.
Thanks for the review.
On the higher-level zstd-rs
side, it seems that there's 2 main things:
- Some wrappers for the zstd-safe functions.
- Some re-implementation of read/write skippable frames in a slightly more "streamable" way - no need to buffer the entire frame twice.
The second point in particular is where most changes are required to re-implement quite a bit of logic from the zstd library. I feel like there's quite a lot of added code and complexity, maybe without being worth the added convenience.
In particular, regarding the re-implementation of read_skippable_frame
:
- Seeking back when an error occurs is a big source of copied code, either from the standard library or from the C zstd library.
- The implementation can stream the
Read
source, but still buffers the entire result.
I would be more comfortable with a solution that:
- Does not seek back on error. If an error happens, the stream is left in an unspecified state. Should simplify the implementation.
- Does not buffer the output, but either provide a
Read
that will only read the skippable content (using.take(content_size)
on the underlying reader) or stream the frame content to aWrite
given as parameter. This way, users could buffer it if needed.
- Does not seek back on error. If an error happens, the stream is left in an unspecified state. Should simplify the implementation.
If you do not seek back, how can you detect that there is a skippable frame?
is_skippable_frame()
requires a &[u8]
, so it's not streaming.
And an attempt to call read_skippable_frame()
and checking for an error could leave the stream at a wrong position without being able to continue reading normal frames.
is_skippable_frame
only needs the &[u8]
to cover the header (in practice the first 4 bytes), so it's still mostly streaming.
And if something happens in read_skippable_frame
, then something is wrong, and there's no guarantee the stream is valid anyway.
is_skippable_frame
only needs the&[u8]
to cover the header (in practice the first 8 bytes), so it's still mostly streaming.
But how would you combine this function with reading frames in a streaming fashion without seeking back?
If you read the first 8 bytes (which could be too much bytes IIRC, e.g. I think you can have a frame of 5 bytes) to check if it's a skippable frame, how would you then read the frame (whether it is skippable or not)?
We could rely on the BufRead
own buffer to "peek" at the first 4 bytes. Though it's true there's no guarantee the BufRead
will actually fill more than 1 byte. If we do want to support that, one alternative would be to store up to 3 bytes of buffered data in a new zstd::stream::zio::reader::State::Peeked
, which would be read from before moving back to State::Reading
- I suspect it might be simpler than the current situation.
Maybe we could go a different way and register a skippable_frame_handler: Option<Box<dyn FnMut(u32, &mut dyn Reader) -> io::Result<()>>>
callback, which will be called in the regular operations if a skippable frame is found.
Ok, so I just pushed with all the changes you asked.
Closes #13 if merged
Ping.
Any update on this?
Hi - sorry, have had very little time to look more into this.
Hi. Any update?
I'm also interested in this feature. I wonder if seekable compression could be implemented in this crate, so I don't have to use https://crates.io/crates/zstd-seekable