
[QUESTION] how to use as segmenter/tokenizer

Open · loretoparisi opened this issue 3 years ago · 1 comment

Hello, thanks for this interesting project! Currently my NLP pipelines use Jieba as the Chinese segmenter and MeCab as the Japanese tokenizer. Is it safe to use fastHan as a replacement for the MeCab Han tokenizer?

Thank you!

loretoparisi avatar Dec 10 '21 13:12 loretoparisi

I'm not familiar with MeCab, but I'm certain that fastHan cannot be used as a Japanese tokenizer. Because it was trained only on Chinese data, it cannot even recognize Japanese characters.

fdugzc avatar Dec 10 '21 13:12 fdugzc