zstd-rs icon indicating copy to clipboard operation
zstd-rs copied to clipboard

Add support for skippable frames

Open antoyo opened this issue 2 years ago • 12 comments

Hi. This add supports for writing and reading skippable frames as well as skipping frames.

There is no support for doing that in a streaming way in the ZSTD C library, so that's why so part of this PR parses some stuff manually.

There are a few TODOs in the code that I'd like you to tell what you think we should do with that.

Thanks for the review.

antoyo avatar Jan 19 '23 18:01 antoyo

On the higher-level zstd-rs side, it seems that there's 2 main things:

  • Some wrappers for the zstd-safe functions.
  • Some re-implementation of read/write skippable frames in a slightly more "streamable" way - no need to buffer the entire frame twice.

The second point in particular is where most changes are required to re-implement quite a bit of logic from the zstd library. I feel like there's quite a lot of added code and complexity, maybe without being worth the added convenience.

In particular, regarding the re-implementation of read_skippable_frame:

  • Seeking back when an error occurs is a big source of copied code, either from the standard library or from the C zstd library.
  • The implementation can stream the Read source, but still buffers the entire result.

I would be more comfortable with a solution that:

  • Does not seek back on error. If an error happens, the stream is left in an unspecified state. Should simplify the implementation.
  • Does not buffer the output, but either provide a Read that will only read the skippable content (using .take(content_size) on the underlying reader) or stream the frame content to a Write given as parameter. This way, users could buffer it if needed.

gyscos avatar Jan 30 '23 16:01 gyscos

  • Does not seek back on error. If an error happens, the stream is left in an unspecified state. Should simplify the implementation.

If you do not seek back, how can you detect that there is a skippable frame? is_skippable_frame() requires a &[u8], so it's not streaming.

And an attempt to call read_skippable_frame() and checking for an error could leave the stream at a wrong position without being able to continue reading normal frames.

antoyo avatar Jan 30 '23 16:01 antoyo

is_skippable_frame only needs the &[u8] to cover the header (in practice the first 4 bytes), so it's still mostly streaming.

And if something happens in read_skippable_frame, then something is wrong, and there's no guarantee the stream is valid anyway.

gyscos avatar Jan 30 '23 16:01 gyscos

is_skippable_frame only needs the &[u8] to cover the header (in practice the first 8 bytes), so it's still mostly streaming.

But how would you combine this function with reading frames in a streaming fashion without seeking back?

If you read the first 8 bytes (which could be too much bytes IIRC, e.g. I think you can have a frame of 5 bytes) to check if it's a skippable frame, how would you then read the frame (whether it is skippable or not)?

antoyo avatar Jan 30 '23 16:01 antoyo

We could rely on the BufRead own buffer to "peek" at the first 4 bytes. Though it's true there's no guarantee the BufRead will actually fill more than 1 byte. If we do want to support that, one alternative would be to store up to 3 bytes of buffered data in a new zstd::stream::zio::reader::State::Peeked, which would be read from before moving back to State::Reading - I suspect it might be simpler than the current situation.

Maybe we could go a different way and register a skippable_frame_handler: Option<Box<dyn FnMut(u32, &mut dyn Reader) -> io::Result<()>>> callback, which will be called in the regular operations if a skippable frame is found.

gyscos avatar Jan 30 '23 16:01 gyscos

Ok, so I just pushed with all the changes you asked.

antoyo-light avatar Feb 10 '23 21:02 antoyo-light

Closes #13 if merged

cgbur avatar Mar 17 '23 17:03 cgbur

Ping.

antoyo avatar Apr 04 '23 15:04 antoyo

Any update on this?

antoyo avatar May 30 '23 19:05 antoyo

Hi - sorry, have had very little time to look more into this.

gyscos avatar Jun 02 '23 18:06 gyscos

Hi. Any update?

antoyo avatar Aug 31 '23 11:08 antoyo

I'm also interested in this feature. I wonder if seekable compression could be implemented in this crate, so I don't have to use https://crates.io/crates/zstd-seekable

ariel-miculas avatar Nov 07 '23 20:11 ariel-miculas