Add -c option to split-sentences.perl

Open · jelmervdl opened this issue 4 years ago · 1 comment

Some documents contain extremely long lines of generated text (most often links to search page results) that take forever to parse with the regular expressions in split-sentences.perl. With the -c option, these lines can be skipped entirely.

jelmervdl · Feb 17 '21 10:02
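
The idea of skipping over-long lines before any regular expression sees them can be sketched as follows. This is a minimal Python illustration, not the Perl code from split-sentences.perl. It assumes the -c option takes a maximum line length in characters, which the issue does not state, and the names `drop_long_lines` and `max_chars` are hypothetical.

```python
from typing import Iterable, Iterator


def drop_long_lines(lines: Iterable[str], max_chars: int) -> Iterator[str]:
    """Yield only lines that are at most max_chars long.

    Assumption: the threshold is counted in characters and the
    trailing newline is not counted. The real -c option may differ.
    """
    for line in lines:
        if len(line.rstrip("\n")) <= max_chars:
            yield line
        # Longer lines are dropped without being handed to the splitter.
```

For example, `list(drop_long_lines(["short\n", "x" * 50 + "\n", "ok\n"], 10))` returns `["short\n", "ok\n"]`. The check costs one length comparison per line, so the expensive regular expressions never run on the pathological input.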

Ideally we'd replace buffering then splitting with splitting on the fly. Then if there's something long with no split point, we throw it out. Here I'm a bit concerned we're throwing out stuff that would split correctly. I understand your immediate need, though.

kpu · Feb 17 '21 20:02
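
The splitting-on-the-fly alternative could look roughly like the sketch below. It is in Python rather than Perl and uses a deliberately naive boundary rule (`.`, `!` or `?` followed by whitespace) that is far simpler than the rules in split-sentences.perl. The names `split_on_the_fly`, `TERMINATORS` and `max_chars` are hypothetical. Only a run that exceeds the limit without reaching a split point is discarded, so long input that does split is kept.

```python
from typing import Iterable, Iterator

# Assumption: a sentence ends at one of these characters followed by whitespace.
TERMINATORS = {".", "!", "?"}


def split_on_the_fly(chars: Iterable[str], max_chars: int) -> Iterator[str]:
    """Emit sentences as boundaries are seen, keeping at most
    max_chars + 1 characters in memory at any time."""
    buf = []
    overflow = False  # True while skipping a run that grew too long
    prev = ""
    for ch in chars:
        if ch.isspace() and prev in TERMINATORS:
            # Split point: emit the run unless it overflowed.
            if not overflow and buf:
                yield "".join(buf).strip()
            buf = []
            overflow = False
        elif not overflow:
            buf.append(ch)
            if len(buf) > max_chars:
                # Too long with no split point so far: throw it out
                # and ignore input until the next split point.
                buf = []
                overflow = True
        prev = ch
    if not overflow:
        tail = "".join(buf).strip()
        if tail:
            yield tail
```

With `max_chars=20`, the input `"Hi there. " + "x" * 30 + " Ok. Bye!"` yields `["Hi there.", "Bye!"]`: the unbroken run of `x` characters and the `Ok.` attached to it are discarded, while the text on either side survives.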