
[QUESTION] how to use as segmenter/tokenizer

Open · loretoparisi opened this issue 3 years ago · 1 comment

Hello, thanks for this interesting project! Currently my NLP pipelines use Jieba as the Chinese segmenter and MeCab as the Japanese tokenizer. Is it safe to use fastHan as a replacement for the MeCab Han tokenizer?

Thank you!

loretoparisi avatar Dec 10 '21 13:12 loretoparisi

I'm not familiar with MeCab, but I'm certain that fastHan cannot be used as a Japanese tokenizer. Because it was trained only on Chinese data, it cannot even recognize Japanese characters.

fdugzc avatar Dec 10 '21 13:12 fdugzc