rust-tinysegmenter
Compact Japanese tokenizer
Roughly doubles the performance
I am trying to integrate `tinysegmenter` with `tantivy`. Would it be possible to expose a more advanced API? For instance, could it expose the offsets of the tokens (expressed in bytes)?
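Until such an API exists, byte offsets can in principle be recovered on the caller's side, which is what a `tantivy` token needs for its `offset_from`/`offset_to` fields. The sketch below is a hypothetical helper, not part of `tinysegmenter`: it assumes the segmenter returns its tokens as a list of `String`s that appear verbatim and in order in the input, and it scans forward through the original text to find each one. Note that `str::len` counts bytes, so a single kana or kanji advances the offset by 3.

```rust
/// Hypothetical helper (not a tinysegmenter API): given the original text and
/// the tokens a segmenter produced from it, in order, return each token's
/// byte range in `text`. Assumes tokens appear verbatim and in order;
/// empty tokens and tokens that cannot be found are skipped.
fn token_offsets(text: &str, tokens: &[String]) -> Vec<(usize, usize)> {
    let mut offsets = Vec::with_capacity(tokens.len());
    let mut cursor = 0; // byte position where the next search starts
    for token in tokens {
        if token.is_empty() {
            continue;
        }
        if let Some(rel) = text[cursor..].find(token.as_str()) {
            let start = cursor + rel;
            let end = start + token.len(); // len() is in bytes, not chars
            offsets.push((start, end));
            cursor = end; // always a char boundary, since the match is valid UTF-8
        }
    }
    offsets
}

fn main() {
    let text = "私の名前は中野です";
    // Tokens written out by hand for illustration; not actual segmenter output.
    let tokens: Vec<String> = ["私", "の", "名前", "は", "中野", "です"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    for (token, (start, end)) in tokens.iter().zip(token_offsets(text, &tokens)) {
        println!("{token}: {start}..{end}");
        assert_eq!(&text[start..end], token.as_str());
    }
}
```

Searching forward from the previous token's end, rather than from the start of the text, keeps repeated tokens (for example two occurrences of "の") mapped to distinct positions.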
This issue was automatically generated. Feel free to close without ceremony if you do not agree with re-licensing or if it is not possible for other reasons. Respond to @cmr...
Hi, I've started seeing the following `panic` with newer rustc versions (I'm on 1.80.1 (3f5fd8dd4 2024-08-06)) while using tinysegmenter:
```txt
thread 'tokenizers::japanese::tests::japanese_tokenizer' panicked at library/core/src/panicking.rs:219:5: unsafe precondition(s) violated: invalid value...
```