syntok
Text tokenization and sentence segmentation (segtok v2)
This is a German text containing ordinal numbers. (The original text passed to syntok does not contain `\n`; the line breaks were added here only for readability.) ``` Ich habe am 3. Juni Geburtstag....
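To illustrate why German ordinals are hard for a sentence splitter, here is a minimal standard-library sketch. It is not syntok's implementation: the `MONTHS` set, the `split_sentences` helper, and the second sentence of the sample are assumptions made up for this example. The rule it applies is that a full stop after a bare number does not end a sentence when the next word is a month name.

```python
import re

# Assumption for illustration only: a small set of German month names.
MONTHS = {
    "Januar", "Februar", "März", "April", "Mai", "Juni",
    "Juli", "August", "September", "Oktober", "November", "Dezember",
}

def split_sentences(text):
    """Split at '.', '!' or '?' followed by whitespace, except after an
    ordinal number that is directly followed by a month name."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        before = text[start:match.start()].split()
        after = text[match.end():].split(maxsplit=1)
        prev_word = before[-1] if before else ""
        next_word = after[0].strip(".,") if after else ""
        # "3. Juni" -> the dot marks an ordinal, not a sentence end.
        if match.group()[0] == "." and prev_word.isdigit() and next_word in MONTHS:
            continue
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    if start < len(text):
        sentences.append(text[start:])
    return sentences

print(split_sentences("Ich habe am 3. Juni Geburtstag. Das ist bald."))
# ['Ich habe am 3. Juni Geburtstag.', 'Das ist bald.']
```

A lookup of month names covers only dates; ordinals before other nouns (for example "die 3. Klasse") would need a broader heuristic.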
Windows builds now fail: https://github.com/fnl/syntok/actions/runs/1973978485 GitHub changed to a new Windows version: https://github.com/actions/virtual-environments/issues/4856 It is necessary to either remove support for Windows or figure out why the build fails and...
Example: `Jackson Hospital's website at https://www.jackson-hospital.com. Individuals may also write to Jackson Hospital's Privacy Officer at 4250 Hospital Drive, Marianna, Florida 32446.` Tokens around expected splitting: `["website", "at", "https://www.jackson", "-hospital.com",...
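The token list quoted above shows the URL being broken at the hyphen. As a rough sketch of the alternative behavior (keep the whole URL as one token and detach only the sentence-final full stop), the following standard-library code may help. The `URL` pattern, the `TRAILING` set, and `split_url_tokens` are hypothetical names for this example and do not reflect syntok's internal rules.

```python
import re

# Hypothetical rule: a URL runs until the next whitespace character.
URL = re.compile(r"https?://\S+")
# Punctuation that is detached when it ends a URL chunk.
TRAILING = ".,;:!?)"

def split_url_tokens(text):
    """Whitespace-tokenize text, keeping URLs whole and splitting off
    trailing sentence punctuation as separate tokens."""
    tokens = []
    for chunk in text.split():
        if URL.fullmatch(chunk):
            core = chunk.rstrip(TRAILING)
            tokens.append(core)
            # Each trailing punctuation mark becomes its own token.
            tokens.extend(chunk[len(core):])
        else:
            tokens.append(chunk)
    return tokens

example = "website at https://www.jackson-hospital.com. Individuals may also write"
print(split_url_tokens(example))
# ['website', 'at', 'https://www.jackson-hospital.com', '.', 'Individuals', 'may', 'also', 'write']
```

This sketch only handles URLs; other chunks are left untouched, so a real tokenizer would still need its usual rules for ordinary words and punctuation.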
For example, the following snippet is extracted as a single sentence (ending at the last full stop), but it should perhaps be split at the colons. ``` Here they...
Hi, thank you for the awesome library! I don't know what you did, but at least for the data I need to process (mostly news articles), it seems that syntok...