rust-punkt

Implementation of the Punkt sentence tokenizing algorithm in Rust.

Results: 6 rust-punkt issues

Hello! I would be happy to take over this crate, certainly long enough to see it into a better place. Mostly, I would like to clean up its dependency tree...

Rust-punkt and NLTK Punkt (with aligning off) produce different results when using exactly the same model. NLTK Punkt correctly identifies abbreviations and doesn't split on them, while rust-punkt, with the...

It seems that the length of a string is computed incorrectly somewhere. Code to reproduce: ```rust use punkt::*; use punkt::params::*; fn main() { let content = "Функция. Речи."; let trainer: Trainer =...
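The report above is truncated, so the actual cause is not known. One common source of length discrepancies with Cyrillic text in Rust, offered here only as a possibility, is mixing byte lengths with character counts: `str::len` returns bytes, while `chars().count()` returns Unicode scalar values. The helper names below are hypothetical and are not part of rust-punkt; the sketch uses only the standard library.

```rust
/// Hypothetical helper (not part of rust-punkt): returns the byte length
/// and the number of Unicode scalar values of a string slice.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len` counts UTF-8 bytes; `chars().count()` counts scalar values.
    (s.len(), s.chars().count())
}

/// Hypothetical helper: byte offsets at which each character starts.
/// Slicing a `&str` is only valid at these offsets (or at `s.len()`).
fn char_boundaries(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}
```

For the string in the report, `byte_and_char_len("Функция. Речи.")` gives 25 bytes but 14 characters, because each Cyrillic letter takes two bytes in UTF-8. Code that advances by character count while indexing by byte would therefore land in the wrong place.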

How should one go about adding training data for other languages? What is the JSON data structure?

Right now, the training data all comes as a single package. It might be better to include it as compiled code that is generated from a JSON document.

NLTK has a way to realign sentences ending with characters such as ), }, ], ", etc...
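As a rough illustration of what such realignment involves, the sketch below moves closing punctuation stranded at the start of a sentence back onto the end of the previous one. It is a simplified approximation, not NLTK's implementation and not an existing rust-punkt API; `realign` and `CLOSERS` are hypothetical names, and the closer set is just the four characters listed in the issue.

```rust
/// Closing characters that should stay attached to the preceding sentence
/// (assumed set, taken from the characters named in the issue).
const CLOSERS: &[char] = &[')', '}', ']', '"'];

/// Hypothetical helper: given already-split sentences, moves any closing
/// punctuation at the start of a sentence onto the end of the previous one.
fn realign(sentences: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for raw in sentences {
        let s = raw.trim();
        // Byte index of the first character that is not a closer.
        let split = s.find(|c: char| !CLOSERS.contains(&c)).unwrap_or(s.len());
        let (closers, rest) = s.split_at(split);
        if closers.is_empty() || out.is_empty() {
            // Nothing to move, or no previous sentence to receive it.
            if !s.is_empty() {
                out.push(s.to_string());
            }
            continue;
        }
        if let Some(prev) = out.last_mut() {
            prev.push_str(closers);
        }
        // Keep whatever follows the moved punctuation as its own sentence.
        let rest = rest.trim_start();
        if !rest.is_empty() {
            out.push(rest.to_string());
        }
    }
    out
}
```

With this sketch, `realign(&["(He left.", ") She stayed."])` yields `"(He left.)"` and `"She stayed."`. A straight double quote is ambiguous between opening and closing, so a real implementation would need a stricter rule than this one to avoid pulling an opening quote onto the previous sentence.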

enhancement