sentencex
A sentence segmentation library with wide language support optimized for speed and utility.
```
1. The Gujarat Declaration principles and overview of the WHO GTMC’s Evidence Workstream. Geetha Krishnan G Pillai, Evidence Unit Head, WHO Global Traditional Medicine Center-GTMC.
```
from: https://www.who.int/news/item/21-01-2024-integration-of-traditional--complementary--and-integrative-medicine-(tcim)-in-the-institutionalization-of-evidence-informed-decision-making code:...
I asked ChatGPT but did not get a good answer. Do you know? It seems spaCy wants a Doc object with sentence boundaries returned. See https://spacy.io/api/sentencizer
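One possible starting point for the spaCy question above is to turn a list of sentence strings into character offsets, since spaCy tokens carry a character offset (`token.idx`) and a boundary flag (`token.is_sent_start`). The sketch below covers only the offset step and uses the standard library alone. The helper name `sentence_start_offsets` is hypothetical, and wiring the offsets into a custom pipeline component is an assumption about how the integration could be done, not something the issue confirms.

```python
from typing import Iterable, List


def sentence_start_offsets(text: str, sentences: Iterable[str]) -> List[int]:
    """Return the offset of the first non-whitespace character of each sentence.

    Hypothetical helper: assumes the sentences occur in `text` in order.
    Works whether or not the segmenter kept surrounding whitespace.
    """
    offsets: List[int] = []
    cursor = 0
    for sentence in sentences:
        stripped = sentence.strip()
        if not stripped:
            continue  # ignore whitespace-only segments
        start = text.find(stripped, cursor)
        if start == -1:
            raise ValueError(f"sentence not found in text: {stripped!r}")
        offsets.append(start)
        cursor = start + len(stripped)
    return offsets


if __name__ == "__main__":
    text = "Hello world.  How are you? Fine."
    parts = ["Hello world.  ", "How are you? ", "Fine."]
    print(sentence_start_offsets(text, parts))
```

In a spaCy component, a token whose `idx` is in the returned list would be marked as a sentence start and every other token as not a start.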
It's understood that the library performs non-destructive splitting by default, but would it be possible to add an option to allow "destructive" splitting? In other words, trimming of whitespace around...
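Until such an option exists, trimming can be done on the caller's side. The sketch below is a minimal wrapper, assuming the segmenter yields plain strings; the name `trimmed` is hypothetical and not part of the library.

```python
from typing import Iterable, Iterator


def trimmed(sentences: Iterable[str]) -> Iterator[str]:
    """Yield each sentence without surrounding whitespace, skipping empty ones.

    Hypothetical "destructive" post-processing step: the original spacing
    between sentences cannot be recovered from the output.
    """
    for sentence in sentences:
        stripped = sentence.strip()
        if stripped:
            yield stripped


if __name__ == "__main__":
    print(list(trimmed(["One. ", " Two.\n", "   "])))
```

The same wrapper could be applied to the output of any segmenter that returns an iterable of strings.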
I believe it's not safe to always split sentences on ellipses. For example, the following sentence (initially mentioned at https://github.com/DavidAnson/markdownlint/pull/719#issuecomment-1447501641): > Pausing... for... thought... should not [trigger splitting]. ...currently splits...
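To make the ellipsis concern concrete, here is a toy splitter that treats an ellipsis as a boundary only when the next character is not a lowercase letter. This is an illustrative heuristic under that single assumption, not the algorithm sentencex uses, and it ignores abbreviations, quotes, and other cases a real segmenter must handle.

```python
import re
from typing import List

# A run of terminators (., !, ?, or the single-character ellipsis) plus whitespace.
_BOUNDARY = re.compile(r"([.!?…]+)(\s+)")


def split_keeping_mid_sentence_ellipses(text: str) -> List[str]:
    """Split after terminators, but not after an ellipsis followed by lowercase."""
    sentences: List[str] = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        terminator = match.group(1)
        next_char = text[match.end():match.end() + 1]
        if not next_char:
            continue  # trailing whitespace at the end of the text
        if terminator in ("...", "…") and next_char.islower():
            continue  # the sentence likely continues after the pause
        sentences.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        sentences.append(text[start:])
    return sentences


if __name__ == "__main__":
    sample = "Pausing... for... thought... should not split. Next one."
    print(split_keeping_mid_sentence_ellipses(sample))
```

Whitespace stays attached to the preceding sentence, so joining the pieces reproduces the input.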
As mentioned in issue https://github.com/wikimedia/sentencex/issues/10, sentencex strips whitespace around sentences, so it is not really "non-destructive" as the docs state. It can be shown in the demo: if you add multiple...
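One way to state the "non-destructive" property precisely is as a round trip: concatenating the returned sentences should reproduce the input exactly. The check below is a small sketch of that test; the name `is_lossless` is hypothetical.

```python
from typing import Iterable


def is_lossless(text: str, sentences: Iterable[str]) -> bool:
    """Return True if joining the sentences reproduces the text exactly."""
    return "".join(sentences) == text


if __name__ == "__main__":
    text = "First.  Second."
    print(is_lossless(text, ["First.  ", "Second."]))  # whitespace kept
    print(is_lossless(text, ["First.", "Second."]))    # whitespace dropped
```

A segmenter that drops or collapses the whitespace between sentences fails this check.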
Hello, FYI: I ported the code to **Rust**. The Rust implementation passes all the tests the current implementation passes. https://github.com/mush42/tqsm This unlocks a lot of speedups and use cases: - Moderate...
As you can see, `" Well, maybe cavemen who lived in fear of everything didnt get bored."` is supposed to be a separate sentence
Mac (Intel). I keep getting this error every time I try to use the code. I have tried deleting my node modules and reinstalling, but it didn't fix it. package version:...