SudachiPy

Python version of Sudachi, a Japanese tokenizer.

Results 18 SudachiPy issues

While working on spaCy's Japanese model support and integrating Sudachi, I ran into an issue where the one-character ellipsis (`…`) caused errors. If you tokenize this ellipsis you get...

SudachiPy command-line version on a Cygwin terminal. Type: 貴社の記者が汽車で帰社する [Enter]

```
貴社の記者が汽車で帰社する
貴社 名詞,普通名詞,一般,*,*,* 貴社
の 助詞,格助詞,*,*,*,* の
記者 名詞,普通名詞,一般,*,*,* 記者
が 助詞,格助詞,*,*,*,* が
汽車 名詞,普通名詞,一般,*,*,* 汽車
で 助詞,格助詞,*,*,*,* で
...
```

Related to #74. The JoinKatakana plugin is one of the most time-consuming plugins, but it is simple (its input and output are clear). Acceptable approaches: cythonize it, or change the implementation.

enhancement
help wanted
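To make the "input and output are clear" point concrete, here is a minimal pure-Python sketch of the general idea of joining adjacent katakana tokens. It is an illustration only, not SudachiPy's actual plugin: it works on plain surface strings rather than lattice nodes, and the names `is_katakana` and `join_katakana` as well as the character range are assumptions made for this example.

```python
import re
from typing import List

# Assumption for this sketch: "katakana" means the Unicode Katakana block
# (U+30A0 to U+30FF), which includes the prolonged sound mark.
_KATAKANA = re.compile(r"[\u30A0-\u30FF]+")


def is_katakana(surface: str) -> bool:
    """Return True if the whole surface string consists of katakana."""
    return _KATAKANA.fullmatch(surface) is not None


def join_katakana(surfaces: List[str]) -> List[str]:
    """Merge each run of consecutive katakana tokens into a single token."""
    joined: List[str] = []
    run: List[str] = []  # katakana tokens collected for the current run
    for surface in surfaces:
        if is_katakana(surface):
            run.append(surface)
            continue
        if run:
            joined.append("".join(run))
            run = []
        joined.append(surface)
    if run:  # flush a run that reaches the end of the input
        joined.append("".join(run))
    return joined
```

A loop of this shape is simple but runs once per token, which is the kind of code that could plausibly benefit from being compiled with Cython.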

Related to #74. The Join Numeric plugin is one of the most time-consuming plugins, but it is simple (its input and output are clear). Acceptable approaches: cythonize it, or change the implementation.

enhancement
help wanted
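As a rough illustration of what joining numeric tokens means, the sketch below merges consecutive digit-only tokens using `itertools.groupby`. It is a simplification and not SudachiPy's implementation: the function name `join_numeric` is hypothetical, it operates on surface strings only, and it does not handle kanji numerals or separators such as commas.

```python
from itertools import groupby
from typing import List


def join_numeric(surfaces: List[str]) -> List[str]:
    """Merge each run of consecutive digit-only tokens into one token."""
    joined: List[str] = []
    # groupby splits the token list into runs that share the same key,
    # here whether the token is made up of digits only.
    for is_digit_run, group in groupby(surfaces, key=str.isdigit):
        tokens = list(group)
        if is_digit_run:
            joined.append("".join(tokens))
        else:
            joined.extend(tokens)
    return joined
```

Note that `str.isdigit` accepts full-width digits such as `２` as well as ASCII digits, so both forms are merged.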

https://github.com/explosion/spaCy/issues/3756#issuecomment-516020381 https://github.com/WorksApplications/Sudachi/issues/110#issue-475708033

sudachi issue

The original Java version has a plugin structure for adding or removing extra processing steps. How can we do something similar with SudachiPy?
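One possible shape for such a structure in Python is a small pipeline of callables that can be registered and unregistered at runtime. The sketch below is only a design illustration under stated assumptions: `PluginPipeline`, `drop_blank`, and `normalize_width` are hypothetical names, the plugins operate on plain token strings, and none of this reflects how SudachiPy or the Java version is actually organized.

```python
import unicodedata
from typing import Callable, List

# Assumption: a plugin is any callable that maps a token list to a token list.
Plugin = Callable[[List[str]], List[str]]


class PluginPipeline:
    """Applies registered plugins in the order they were added."""

    def __init__(self) -> None:
        self._plugins: List[Plugin] = []

    def add(self, plugin: Plugin) -> None:
        self._plugins.append(plugin)

    def remove(self, plugin: Plugin) -> None:
        self._plugins.remove(plugin)

    def run(self, tokens: List[str]) -> List[str]:
        for plugin in self._plugins:
            tokens = plugin(tokens)
        return tokens


def drop_blank(tokens: List[str]) -> List[str]:
    """Hypothetical plugin: discard empty or whitespace-only tokens."""
    return [t for t in tokens if t.strip()]


def normalize_width(tokens: List[str]) -> List[str]:
    """Hypothetical plugin: apply NFKC normalization to each token."""
    return [unicodedata.normalize("NFKC", t) for t in tokens]


pipeline = PluginPipeline()
pipeline.add(drop_blank)
pipeline.add(normalize_width)
result = pipeline.run(["ＡＢＣ", " ", "１２"])
```

Keeping plugins as plain callables means extra processing can be added or removed without subclassing; a configuration file could map plugin names to such callables if a declarative setup is preferred.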