segtok

Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features.

4 segtok issues

I have plain German text without any punctuation or sentence-final periods. How can I split it into sentences that end with a period?

Is it possible that the word tokenizer does not split off the apostrophe and apostrophe-s? E.g., **Toyota's** is considered a _single_ token as opposed to being split into **Toyota** and...
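
To make the expected behaviour concrete, here is a minimal sketch of splitting a possessive marker off a token as a post-processing step. It uses only the standard library and is not segtok's implementation; the helper name `split_possessive` and its regular expression are assumptions made for illustration.

```python
import re

# Hypothetical helper, not part of segtok: matches a token that ends in an
# apostrophe followed by "s". Three apostrophe forms are accepted: the ASCII
# apostrophe, the right single quotation mark (U+2019), and the modifier
# letter apostrophe (U+02BC).
_POSSESSIVE = re.compile(r"^(.+?)(['\u2019\u02BC][sS])$")


def split_possessive(tokens):
    """Return a new token list with trailing possessive markers split off."""
    result = []
    for token in tokens:
        match = _POSSESSIVE.match(token)
        if match:
            # Keep the stem and the marker as two separate tokens.
            result.extend(match.groups())
        else:
            result.append(token)
    return result


print(split_possessive(["Toyota's", "cars"]))
```

Under these assumptions, `Toyota's` becomes the two tokens `Toyota` and `'s`, while tokens without a trailing marker pass through unchanged. Note that the same pattern also splits contractions such as `it's`, so it cannot tell a possessive from a contracted "is".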

We are seeing a few issues with segtok being over-eager to split quoted sentences with names directly after the quoted section. Ex. "Good morning," said Harry. "Good morning?" asked Harry....

enhancement

For example:
```
from segtok.tokenizer import split_contractions, word_tokenizer
split_contractions(word_tokenizer("OʼHaraʼs"))
```