fastHan
[QUESTION] how to use as segmenter/tokenizer
Hello, thanks for this interesting project! Currently my NLP pipelines use Jieba as the Chinese segmenter and Mecab as the Japanese tokenizer. Is it safe to use fastHan as a replacement for the Mecab Han tokenizer?
Thank you!
I'm not familiar with Mecab, but I'm certain that fastHan cannot be used as a Japanese tokenizer. It was trained only on Chinese data samples, so it cannot even recognize Japanese characters.
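For Chinese text, though, fastHan can be used as a segmenter via its CWS (Chinese word segmentation) target. Here is a minimal sketch following the usage pattern in the project README; the `model_type` value and the example sentence are illustrative:

```python
from fastHan import FastHan

# Load the pretrained model; weights are downloaded on first use.
# "base" is the smaller of the two released model sizes.
model = FastHan(model_type="base")

sentence = "郭靖是金庸笔下的男主角。"

# target="CWS" selects the word-segmentation task; the call
# returns the segmented words for the input sentence.
tokens = model(sentence, target="CWS")
print(tokens)
```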