
Support for streaming input

Open flip111 opened this issue 8 years ago • 6 comments

It would be nice to be able to parse streams such as stdin and network sockets. At the moment I can only find file and string inputs: https://docs.rs/pest/1.0.0-beta.9/pest/inputs/trait.Input.html#implementors

flip111 avatar Sep 08 '17 11:09 flip111

I completely agree. Is there any particular stream structure that would make sense to target? Also, I'd be more than happy to mentor this as a PR if you're willing to go for the implementation.

dragostis avatar Sep 08 '17 17:09 dragostis

Is this issue still relevant with the change to work with &str?

jstnlef avatar Apr 03 '18 03:04 jstnlef

I would say so. Currently, I have no particular design for this feature, but streamed parsing is something desirable for pest. Maybe one solution would be to have this in a separate crate that takes pest as a dependency.

dragostis avatar Apr 03 '18 06:04 dragostis

It would also be fantastic if the stream did not have to be consumed completely, and pest instead emitted tokens along the way (basically acting like a stream-transforming function).

That would allow it to parse huge amounts of data that would not otherwise fit in memory.
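A minimal sketch of the "emit tokens along the way" idea, using only the Rust standard library. It is not pest's API: `Token` and `TokenStream` are hypothetical names, and the toy grammar (whitespace-separated numbers and words, one line buffered at a time) is an assumption made purely for illustration.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead};

/// Hypothetical token type for illustration; not part of pest.
#[derive(Debug, PartialEq)]
pub enum Token {
    Number(i64),
    Word(String),
}

/// Lazily turns any buffered reader (stdin, a socket, a file) into tokens.
/// Only one line of input is held in memory at a time.
pub struct TokenStream<R: BufRead> {
    reader: R,
    line: String,
    pending: VecDeque<Token>,
}

impl<R: BufRead> TokenStream<R> {
    pub fn new(reader: R) -> Self {
        TokenStream {
            reader,
            line: String::new(),
            pending: VecDeque::new(),
        }
    }
}

impl<R: BufRead> Iterator for TokenStream<R> {
    type Item = io::Result<Token>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Hand out tokens already produced from the current line.
            if let Some(tok) = self.pending.pop_front() {
                return Some(Ok(tok));
            }
            // Otherwise pull the next line from the underlying stream.
            self.line.clear();
            match self.reader.read_line(&mut self.line) {
                Ok(0) => return None, // end of stream
                Ok(_) => {
                    for part in self.line.split_whitespace() {
                        let tok = match part.parse::<i64>() {
                            Ok(n) => Token::Number(n),
                            Err(_) => Token::Word(part.to_string()),
                        };
                        self.pending.push_back(tok);
                    }
                }
                Err(e) => return Some(Err(e)),
            }
        }
    }
}
```

Wrapping `std::io::stdin().lock()` in `TokenStream::new` would, under these assumptions, let a consumer process tokens as lines arrive instead of reading the whole input first. A real grammar would need a smarter buffering rule than "one line", since a rule may span an arbitrary amount of input.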

dbrgn avatar Nov 17 '18 10:11 dbrgn

@dbrgn of course :) you just wrote down the definition of the stream (parser) more or less :p If you need to load all the data in memory first, the data can come from a stream, but after that it's no longer a stream. Also a stream can be potentially endless/infinitely big (network parser on backbone for example).

flip111 avatar Nov 17 '18 12:11 flip111

I think the problem is closely related to applying regexes on streams, which I understand is quite challenging, as it involves rewriting much of the engine at the cost of losing some optimizations. Hyperscan took that step in C/C++ back in the day, but I am aware of very few other initiatives like this.

In particular, I think it also implies that the API would have to be revisited for the streaming case, as partial matches / alternative not-yet-matching paths / dropped hypotheses may be requested by the user at each timestep.
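To make the API question concrete, here is a rough sketch of what reporting a partial match at each step could look like, for the simplest possible case of one literal anchored at the start of the stream. `Step` and `LiteralMatcher` are hypothetical names, not anything pest or Hyperscan provides; a real streaming engine would track many hypotheses at once.

```rust
/// Outcome reported to the caller after each chunk (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Step {
    /// Still consistent with the literal; holds the bytes matched so far.
    Partial(usize),
    /// The whole literal has been seen.
    Matched,
    /// The hypothesis was dropped: the input diverged from the literal.
    Failed,
}

/// Matches a single literal at the start of a stream, fed chunk by chunk.
pub struct LiteralMatcher<'a> {
    target: &'a [u8],
    pos: usize,
    failed: bool,
}

impl<'a> LiteralMatcher<'a> {
    pub fn new(target: &'a str) -> Self {
        LiteralMatcher {
            target: target.as_bytes(),
            pos: 0,
            failed: false,
        }
    }

    /// Consumes one chunk and reports the state of the match so far.
    /// State is kept between calls, so the literal may span chunk boundaries.
    pub fn feed(&mut self, chunk: &[u8]) -> Step {
        for &b in chunk {
            // Stop once the outcome is decided; remaining bytes are ignored.
            if self.failed || self.pos == self.target.len() {
                break;
            }
            if b == self.target[self.pos] {
                self.pos += 1;
            } else {
                self.failed = true;
            }
        }
        if self.failed {
            Step::Failed
        } else if self.pos == self.target.len() {
            Step::Matched
        } else {
            Step::Partial(self.pos)
        }
    }
}
```

Feeding `b"SEL"` and then `b"ECT *"` to a matcher for `"SELECT"` yields `Step::Partial(3)` followed by `Step::Matched`. Even this toy shows the extra state the caller has to observe compared with a one-shot parse over a complete `&str`.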

This is a lot of work, but it sure would be awesome :)

iago-lito avatar Oct 31 '19 08:10 iago-lito