rust-tinysegmenter

Compact Japanese tokenizer

Results 4 rust-tinysegmenter issues

Roughly doubles the performance

I am trying to integrate `tinysegmenter` with `tantivy`. Would it be possible to expose a more advanced API? For instance, could it expose the offsets of the tokens (expressed in bytes)?
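To make the request concrete, here is a minimal sketch of how byte offsets could be recovered outside the library. It assumes the tokenizer returns tokens that appear in the input in order as substrings. `token_byte_offsets` is a hypothetical helper, not part of the `tinysegmenter` API, and the token list is hard-coded for illustration rather than produced by the crate.

```rust
/// Hypothetical helper (not part of tinysegmenter): map each token to its
/// (start, end) byte range in `text`, assuming tokens occur in order.
fn token_byte_offsets<'a>(text: &str, tokens: &[&'a str]) -> Vec<(usize, usize, &'a str)> {
    let mut cursor = 0; // byte position just past the previous token
    let mut out = Vec::with_capacity(tokens.len());
    for &tok in tokens {
        // Search forward from the cursor so repeated tokens map to
        // successive occurrences instead of the first one.
        if let Some(rel) = text[cursor..].find(tok) {
            let start = cursor + rel;
            let end = start + tok.len(); // str::len() is a byte length
            out.push((start, end, tok));
            cursor = end;
        }
        // Tokens not found after the cursor are skipped in this sketch.
    }
    out
}

fn main() {
    let text = "私の名前";
    // Assumed segmentation, hard-coded for illustration.
    let tokens = ["私", "の", "名前"];
    for (start, end, tok) in token_byte_offsets(text, &tokens) {
        println!("{tok}: bytes {start}..{end}");
    }
}
```

Re-scanning the input this way costs extra work per token, which is one reason offsets exposed directly by the tokenizer would be preferable for a `tantivy` integration.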

This issue was automatically generated. Feel free to close without ceremony if you do not agree with re-licensing or if it is not possible for other reasons. Respond to @cmr...

Hi, I've started seeing the following `panic` on a newer rustc version (I'm on 1.80.1 (3f5fd8dd4 2024-08-06)) while using tinysegmenter:

```txt
thread 'tokenizers::japanese::tests::japanese_tokenizer' panicked at library/core/src/panicking.rs:219:5:
unsafe precondition(s) violated: invalid value...
```