johnfarina

2 comments by johnfarina

The same is true for Chinese and Korean as well: sacremoses splits every character into a separate token. Here's some Chinese:

```python
>>> from sacremoses import MosesTokenizer
>>> mt = MosesTokenizer(lang='zh')
>>> mt.tokenize("记者 应谦 美国")
['记', '者', '应', '谦', '美', '国']
```
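
The Korean case looks analogous; here is a minimal sketch assuming the same splitting behavior (the sample sentence is an illustrative stand-in, and the commented output reflects the bug described above, not a verified run):

```python
from sacremoses import MosesTokenizer

# Hypothetical Korean sample ("기자 미국" = "reporter America");
# lang='ko' is passed the same way as lang='zh' above.
mt_ko = MosesTokenizer(lang='ko')
print(mt_ko.tokenize("기자 미국"))
# Desired:                  ['기자', '미국']
# Observed (per this bug):  ['기', '자', '미', '국']
```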

Oh wow, comment on a GitHub issue, go to bed, wake up, bug is fixed! Thanks so much @alvations!!