segtok issues

Results 4 segtok issues

text to sentences segmentation

I have plain German text without any punctuation or sentence stops. How can I segment it into sentences with stops?

Word tokenizer does not split apostrophe and apostrophe s

Is it possible that the word tokenizer does not split off an apostrophe or apostrophe-s? E.g., **Toyota's** is considered a _single_ token as opposed to being split into **Toyota** and...

pwichmann
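
To make the requested behavior concrete, here is a minimal post-processing sketch. It assumes the tokens arrive as a plain list of strings and that the desired output is the stem followed by a separate apostrophe-s token; the truncated report above does not confirm the exact split. `split_possessive` is a hypothetical helper written for illustration and is not part of segtok.

```python
import re

# Trailing possessive marker: an apostrophe (ASCII ', U+2019 or U+02BC)
# followed by "s" or "S" at the very end of the token.
_POSSESSIVE = re.compile(r"^(?P<stem>.+?)(?P<suffix>['\u2019\u02BC][sS])$")


def split_possessive(tokens):
    """Split a trailing apostrophe-s off each token.

    Hypothetical helper for illustration only; not a segtok function.
    """
    out = []
    for token in tokens:
        match = _POSSESSIVE.match(token)
        if match:
            # Emit the stem and the possessive suffix as two tokens.
            out.extend([match.group("stem"), match.group("suffix")])
        else:
            out.append(token)
    return out


tokens = split_possessive(["Toyota's", "cars"])
```

A bare `'s` token is left untouched because the pattern requires at least one character before the apostrophe.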

Over-splitting on quotes with names.

We are seeing a few issues with segtok being over-eager to split quoted sentences with names directly after the quoted section. Ex. "Good morning," said Harry. "Good morning?" asked Harry....

jakepoz

enhancement
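
One possible workaround, sketched under assumptions: if the segmenter returns the attribution ("asked Harry.") as its own sentence right after the quoted part, the two pieces can be re-joined afterwards. The exact split points are not shown in the truncated report, so the input below is an assumed example. `merge_after_quotes` is a hypothetical helper, not part of segtok, and the lowercase-start heuristic will not suit every text.

```python
# Characters treated as closing quotes (ASCII and typographic).
CLOSING_QUOTES = ('"', "\u201d", "'", "\u2019")


def merge_after_quotes(sentences):
    """Re-join a lowercase-initial fragment to a preceding quoted sentence.

    Hypothetical helper for illustration only; not a segtok function.
    """
    merged = []
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip empty fragments
        follows_quote = bool(merged) and merged[-1].endswith(CLOSING_QUOTES)
        if follows_quote and text[0].islower():
            # Likely an attribution such as: asked Harry.
            merged[-1] = merged[-1] + " " + text
        else:
            merged.append(text)
    return merged


# Assumed over-split input, modeled on the example in the report.
fixed = merge_after_quotes(['"Good morning?"', "asked Harry.", '"Hello."'])
```

The heuristic only merges when the previous piece ends in a closing quote, so ordinary sentence boundaries are left alone.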

bug: split_contractions fails for certain patterns

For example: `split_contractions(word_tokenizer("OʼHaraʼs"))`

MattGPT-ai
