Make parser generic over sink

Open • y21 opened this issue 3 years ago • 1 comment

It would be nice if the parser were generic over a "sink": a user-supplied handler whose function is called whenever a tag is visited, which would effectively make tl a streaming parser. The sink could then decide what to do with each tag it receives. Sometimes one doesn't need to parse an entire HTML document, or doesn't need the other things tl does by default. We could provide default implementations, for example a sink that keeps track of ids and classes and remembers them in a map so that ID lookups run in constant time (this already exists and can be enabled through ParserOptions::track_ids(), but a sink could be nicer). AFAICT parsers like html5ever seem to do this.
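To make the idea concrete, here is a rough sketch of what such a sink could look like. Everything in it is hypothetical and not part of tl's actual API: the `Sink` trait, the `Tag` struct, the toy `scan` function (a stand-in for the real parser, not a correct HTML tokenizer) and the `IdIndex` sink that mimics what track_ids() does today.

```rust
use std::collections::HashMap;

/// Hypothetical event: an opening tag plus its raw, unparsed attribute text.
pub struct Tag<'a> {
    pub name: &'a str,
    pub attrs: &'a str,
}

/// Hypothetical sink trait (an assumption for illustration, not tl's API).
pub trait Sink<'a> {
    /// Called once per opening tag, in document order.
    /// `offset` is the byte position of the tag's `<` in the input.
    fn on_tag(&mut self, tag: Tag<'a>, offset: usize);
}

/// Toy scanner standing in for the parser: finds `<name attrs>` spans and
/// forwards them to the sink. It ignores quoting, comments containing `>`, etc.
pub fn scan<'a, S: Sink<'a>>(input: &'a str, sink: &mut S) {
    let mut rest = input;
    let mut base = 0; // byte offset of `rest` within `input`
    while let Some(start) = rest.find('<') {
        let after = &rest[start + 1..];
        let end = match after.find('>') {
            Some(end) => end,
            None => break, // unterminated tag: stop scanning
        };
        let inner = after[..end].trim_end_matches('/');
        // Skip closing tags, comments/doctype and empty `<>`.
        if !inner.is_empty() && !inner.starts_with('/') && !inner.starts_with('!') {
            let (name, attrs) = match inner.find(char::is_whitespace) {
                Some(i) => (&inner[..i], inner[i..].trim()),
                None => (inner, ""),
            };
            sink.on_tag(Tag { name, attrs }, base + start);
        }
        let consumed = start + 1 + end + 1; // '<' + inner + '>'
        base += consumed;
        rest = &rest[consumed..];
    }
}

/// Example default sink: remembers where each id was first seen,
/// so later lookups by id are a single hash-map access.
#[derive(Default)]
pub struct IdIndex<'a> {
    pub ids: HashMap<&'a str, usize>,
}

impl<'a> Sink<'a> for IdIndex<'a> {
    fn on_tag(&mut self, tag: Tag<'a>, offset: usize) {
        if let Some(id) = attr_value(tag.attrs, "id") {
            self.ids.entry(id).or_insert(offset);
        }
    }
}

/// Naive `key="value"` lookup; assumes values contain no whitespace.
fn attr_value<'a>(attrs: &'a str, key: &str) -> Option<&'a str> {
    for part in attrs.split_whitespace() {
        if let Some((k, v)) = part.split_once('=') {
            if k == key {
                return Some(v.trim_matches('"'));
            }
        }
    }
    None
}
```

With something like this, a user who only cares about ids would call `scan(html, &mut IdIndex::default())` and never pay for building a full tree; a tree-building sink could then be just another implementation of the same trait.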

y21 • Feb 01 '22 09:02

What do you think of adding a ParserOptions::skip_whitespace()? When parsing, I noticed quite a few Raw(Bytes("\n\n")) nodes that I'd like to ignore.

Or should I rather create a sink in that case?
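For comparison, the sink route could be a small adapter that drops whitespace-only text before it reaches whatever sink sits behind it. This is only a sketch under assumptions: the `Sink` trait with an `on_raw` method, the `SkipWhitespace` wrapper and the `Collect` sink are all made-up names, not anything tl provides.

```rust
/// Hypothetical sink trait for raw text nodes (an assumption, not tl's API).
pub trait Sink<'a> {
    fn on_raw(&mut self, text: &'a str);
}

/// Adapter that forwards text to the inner sink only if it contains
/// at least one non-whitespace character.
pub struct SkipWhitespace<S>(pub S);

impl<'a, S: Sink<'a>> Sink<'a> for SkipWhitespace<S> {
    fn on_raw(&mut self, text: &'a str) {
        if !text.trim().is_empty() {
            self.0.on_raw(text);
        }
    }
}

/// Simple sink that collects every text node it receives, unchanged.
#[derive(Default)]
pub struct Collect<'a>(pub Vec<&'a str>);

impl<'a> Sink<'a> for Collect<'a> {
    fn on_raw(&mut self, text: &'a str) {
        self.0.push(text);
    }
}
```

The wrapper composes with any other sink, so skipping whitespace would not need its own parser option in that design; whether that is nicer than a flag is exactly the question above.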

dist1ll • Feb 04 '22 02:02