
How well will this handle Chinese?

Open benjiwheeler opened this issue 5 years ago • 1 comment

I know that written Chinese doesn't separate words with spaces the way English and most other languages do; a Chinese character is closer to an English word than to an English letter.

Would you expect your classifier to treat Chinese characters as letters, or as words?

benjiwheeler, Jun 27 '19 19:06

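It depends on your tokenizer. By default it will tokenize Chinese characters as letters, but you can easily change that by passing your own tokenizer, such as the following, which drops whitespace and makes each remaining character its own token: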

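var bayes = require('bayes')

// Strip all whitespace, then treat every remaining character as one token.
var classifier = bayes({
    tokenizer: function (text) { return text.replace(/\s/g, '').split('') }
})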

toonimoadi, Oct 21 '21 20:10
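
To see what a per-character tokenizer like the one above actually produces, here is a minimal standalone sketch. The name `charTokenizer` is hypothetical and not part of bayes. It also assumes you want to use `Array.from` rather than `split('')`: `split('')` cuts on UTF-16 code units, so rarer characters outside the Basic Multilingual Plane would be split into two broken halves, while `Array.from` walks the string by code point.

```javascript
// Hypothetical helper (not part of bayes): one token per character.
function charTokenizer(text) {
  // Remove all whitespace so spaces and newlines never become tokens.
  var compact = text.replace(/\s/g, '')
  // Array.from iterates by code point, so characters outside the Basic
  // Multilingual Plane stay whole instead of being cut into surrogate halves.
  return Array.from(compact)
}

var tokens = charTokenizer('我喜欢 猫')
console.log(tokens) // one entry per character: 我, 喜, 欢, 猫
```

A function like this could be passed as the `tokenizer` option in the same way as the inline function above. Whether single characters are the right unit depends on your data: many Chinese words span two or more characters, so a dedicated word segmenter may classify better than either approach.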