miniz_oxide icon indicating copy to clipboard operation
miniz_oxide copied to clipboard

Support a foolproof(er) higher-level API for streaming

Open golddranks opened this issue 7 years ago • 9 comments

Hi, I have wrapped the streaming (ring-buffer using) API of miniz_oxide behind a higher level API here: https://github.com/golddranks/stream_zipper/blob/master/src/deflate.rs

Is there any interest in supporting such a higher level API within miniz_oxide itself?

I've experimented with a state machine-like API:

pub enum State<'a> {
    HasMoreOutput(ChunkIter<'a>),
    NeedsMoreInput(InputSink),
    Stop(FinishedStream<'a>),
}

The idea is that every function that operates on the state takes self as value, and as such, parsing the stream can proceed as a state machine alternating between states "HasMoreOutput" and "NeedsMoreInput". The states own the internal state, including the ring buffer, which makes this a higher level API as the one provided by miniz_oxide at the moment.

Here's a roughly how it would be used:

        let mut state = start_deflate_stream();

        for chunk in pure_deflate_stream.chunks(500) {
            if let State::NeedsMoreInput(sink) = state {
                state = sink.feed(chunk)?;
            };
            while let State::HasMoreOutput(out) = state {
                println!("Got {:?}", out.get());
                state = out.next()?;
            }
        }

Another experiment I did was to have just InputSink, and then passing a callback that gets the uncompressed output. This is a kind of an internal iterator. Again, here's an example

        let mut state = start_deflate_stream();

        for chunk in pure_deflate_stream.chunks(50) {
            if let State::NeedsMoreInput(sink) = state {
                state = sink.inner_iter(chunk, |out| {
                    println!("Got {:?}", out);
                })?;
            };
        }

What do you think about introducing such a high level API?

golddranks avatar Jun 12 '18 04:06 golddranks

Btw. I can send a PR and work with the maintainers of this library to get it into a shape they want, I just want to know if you are open to such an idea so I won't do needless work.

golddranks avatar Jun 15 '18 02:06 golddranks

Hi. Sorry for the delayed reply. Your State is cleaner than what we have now, I really like it. I think there is a place for the safe wrapper that does not lose streaming functionality. Tomorrow I will have some time to go over it in detail and say something more concrete.

Frommi avatar Jun 15 '18 07:06 Frommi

Okay, I read the code and I think that you are onto something. There are two things that stand out for me:

  • The need for two functions get and next make the HasMoreOutput case feel a bit clunky. Maybe make something like state = out.next(|out_slice| println!("Got {:?}", out_slice))?
  • I think inner_iter is a misnomer. Iter is something that iterates, not a once-executing function. It looks more like a map. And why return full State if that function executes closure multiple times until it becomes a sink again and then wrap it in a State?

So, a bit more generally:

  • For every input action there is one-or-more output actions.
  • If we want to have a out slice as a return value without copies and closures then there is no escaping two function calls, I think.
  • It's probably better to be a bit more explicit about whether a function call executes closure only once or multiple times.

I think the best base abstraction in this case will be this:

  • Initialization with two closures: one that returns next input slice and one that takes output slice.
  • One function next that executes exactly one of the closures.

Then there are quality-of-life things like execution-till-next-input or iterator instead of input closure, etc.

What do you think? Does this makes sense?

Frommi avatar Jun 17 '18 10:06 Frommi

Sorry for the delayed reply. I'm on a vacation at the moment, I'll come back to this later when I have time!

golddranks avatar Jun 19 '18 14:06 golddranks

Just two quick things: I called it inner_iter because there may be multiple outputs for single input and you have to iterate through them before feeding it more input. (Imagine having chunks of one 100 kilobytes, for example!)

The reason why I'd like to avoid callback-style closure in the "basic" hi-level API is that they obstruct error handling because you can't effectively use the ? operator inside them.

golddranks avatar Jun 19 '18 14:06 golddranks

The point about the ? operator is very good. I now think that your first approach with API is best and I can't see ways to improve it. I thought about maybe somehow statically make sure that get was called before next in HasMoreOutput case, but there is no way to do it not extremely clunky.

Frommi avatar Jun 28 '18 18:06 Frommi

Do you want a PR then? I'm afraid that I'm currently quite busy, but I can try and work with it little by little?

golddranks avatar Jul 01 '18 09:07 golddranks

It's okay, I am mostly free on the weekends. Most of the work is already done anyway.

Frommi avatar Jul 03 '18 14:07 Frommi

Hoping we can look into this again, the current rust API is a bit inconsistent and was quickly cobbled together to provide a basis for a C API mirroring miniz. It's not all that easy to use, and it would be nice to have an alternative to using it through flate2 in areas with no std-support.

The original post have one suggestion which may work for how the streaming part could look, maybe there are other suggestions, or some existing API designes out there that would work as well.

oyvindln avatar Jun 28 '20 16:06 oyvindln