rust-punkt
Implementation of the Punkt sentence tokenizing algorithm in Rust.
Hello! I would be happy to take over this crate, certainly long enough to see it into a better place. Mostly, I would like to clean up its dependency tree...
Rust-punkt and NLTK Punkt (with aligning off) produce different results when using exactly the same model. NLTK Punkt correctly identifies abbreviations and doesn't split on them, while rust-punkt, with the...
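The abbreviation check behind this discrepancy can be sketched in a much simplified form. This is not rust-punkt's or NLTK's code. `is_abbreviation` and `split_sentences` are hypothetical helpers, and the small set stands in for a trained model's abbreviation types, stored lowercased and without the final period (roughly how NLTK's `PunktParameters.abbrev_types` keeps them). Real Punkt also weighs orthographic context, collocations and sentence starters, which this sketch ignores.

```rust
use std::collections::HashSet;

/// Hypothetical helper: true if `token` ends in "." and its lowercased stem
/// (the token minus that final period) is a known abbreviation type.
fn is_abbreviation(token: &str, abbrev_types: &HashSet<String>) -> bool {
    match token.strip_suffix('.') {
        Some(stem) if !stem.is_empty() => abbrev_types.contains(&stem.to_lowercase()),
        _ => false,
    }
}

/// Hypothetical, heavily simplified splitter: break after every
/// period-final token unless it is a known abbreviation.
fn split_sentences(text: &str, abbrev_types: &HashSet<String>) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for tok in text.split_whitespace() {
        current.push(tok);
        if tok.ends_with('.') && !is_abbreviation(tok, abbrev_types) {
            sentences.push(current.join(" "));
            current.clear();
        }
    }
    // Keep any trailing text that did not end in a sentence break.
    if !current.is_empty() {
        sentences.push(current.join(" "));
    }
    sentences
}

fn main() {
    // Stand-in for a trained model's abbreviation types (assumed values).
    let abbrevs: HashSet<String> = ["dr", "etc"].iter().map(|s| s.to_string()).collect();
    let sentences = split_sentences("Dr. Smith arrived. He sat down.", &abbrevs);
    println!("{:?}", sentences); // ["Dr. Smith arrived.", "He sat down."]
}
```

With "dr" missing from the set, the same input would split after "Dr.", which is the kind of divergence the report describes.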
It seems that somewhere the length of a string is not computed correctly. Code to reproduce: ```rust use punkt::*; use punkt::params::*; fn main() { let content = "Функция. Речи."; let trainer: Trainer =...
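A common cause of this kind of bug with Cyrillic input is mixing up byte lengths and character counts: `str::len` returns the number of UTF-8 bytes, and each Cyrillic letter in the test string takes two bytes. The sketch below only illustrates that difference with the standard library; it does not locate the actual fault in rust-punkt, and `byte_offset_of_char` is a hypothetical helper.

```rust
/// Hypothetical helper: byte offset at which the `n`-th character (0-based)
/// starts, or `s.len()` if `s` has no more than `n` characters.
fn byte_offset_of_char(s: &str, n: usize) -> usize {
    s.char_indices().nth(n).map(|(i, _)| i).unwrap_or(s.len())
}

fn main() {
    let content = "Функция. Речи.";

    // `len` counts UTF-8 bytes; each of the 11 Cyrillic letters takes 2 bytes.
    println!("bytes: {}", content.len()); // bytes: 25
    println!("chars: {}", content.chars().count()); // chars: 14

    // The first "." is character 7 but byte 14. Treating 7 as a byte index
    // lands inside "к", so `&content[..7]` would panic; `get` returns None.
    assert!(!content.is_char_boundary(7));
    assert_eq!(content.get(..7), None);

    // Converting a character count to a byte offset first gives a valid slice.
    let end = byte_offset_of_char(content, 8);
    println!("{}", &content[..end]); // Функция.
}
```

Any code that stores positions as character counts and later uses them to slice a `&str` (or the reverse) will work for ASCII text and break for input like this.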
How should one go about adding training data for other languages? What is the JSON data structure?
Right now, the training data all comes as a single package. It might be better to include it as compiled code that is generated from a JSON document.
NLTK has a way to realign sentences ending with characters such as ), }, ], ", etc...
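In NLTK this realignment (the `realign_boundaries` option of `PunktSentenceTokenizer`) moves closing punctuation that follows a sentence-final period onto the preceding sentence, so "(Sent1.) Sent2." yields "(Sent1.)" and "Sent2." instead of "(Sent1." and ") Sent2.". Below is a minimal sketch of the idea over already-split sentence strings, assuming the closer set `"`, `'`, `)`, `]`, `}`. The `realign_boundaries` function here is hypothetical, not part of rust-punkt, and unlike NLTK it works on strings rather than text offsets and ignores the `--` case.

```rust
/// Closing characters that may follow a sentence-final period (assumed set).
const CLOSERS: &[char] = &['"', '\'', ')', ']', '}'];

/// Hypothetical sketch: move a sentence's leading run of closing characters
/// onto the end of the previous sentence when that run is followed by
/// whitespace or the end of the text.
fn realign_boundaries(sentences: Vec<String>) -> Vec<String> {
    let mut out: Vec<String> = Vec::with_capacity(sentences.len());
    for sent in sentences {
        // Byte length of the leading closers (all ASCII, one byte each).
        let run = sent.len() - sent.trim_start_matches(CLOSERS).len();
        let then_space_or_end = sent[run..].chars().next().map_or(true, char::is_whitespace);
        if run > 0 && then_space_or_end {
            if let Some(prev) = out.last_mut() {
                prev.push_str(&sent[..run]);
                let rest = sent[run..].trim_start();
                // Drop the remainder if the "sentence" was only closers.
                if !rest.is_empty() {
                    out.push(rest.to_string());
                }
                continue;
            }
        }
        out.push(sent);
    }
    out
}

fn main() {
    let split = vec!["(Sent1.".to_string(), ") Sent2.".to_string()];
    println!("{:?}", realign_boundaries(split)); // ["(Sent1.)", "Sent2."]
}
```

Requiring whitespace or end of text after the run keeps an opening quote such as `"B" he said.` attached to its own sentence.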