node-language-detect

Add support for Japanese

Open KevinDanikowski opened this issue 4 years ago • 3 comments

There is no support for Japanese. However, it's a popular enough language that I think it should be supported.

The current behavior is to guess English, apparently because Japanese characters are not recognized: Japanese is written in its own scripts (hiragana, katakana, and kanji) rather than the Latin alphabet.

Sample: "シャーロック・ホームズ (Sherlock Holmes) は、19世紀後半に活躍したイギリスの小説家・アーサー・コナン・ドイルの創作した[1]、シャーロック・ホームズシリーズの主人公である、架空の探偵"

Result:

[
  [ 'english', 0.030795454545454626 ],
  [ 'somali', 0.026553030303030245 ],
  [ 'estonian', 0.021590909090909105 ],
  [ 'hungarian', 0.021098484848484755 ],
  [ 'danish', 0.019962121212121264 ],
  [ 'albanian', 0.019053030303030183 ],
  [ 'hawaiian', 0.015946969696969737 ],
  [ 'french', 0.015643939393939377 ],
  [ 'latin', 0.015606060606060623 ],
  [ 'german', 0.015454545454545388 ],
  [ 'hausa', 0.01435606060606065 ],
  [ 'swedish', 0.012575757575757462 ],
  [ 'welsh', 0.011325757575757489 ],
  [ 'portuguese', 0.010909090909090868 ],
  [ 'czech', 0.010833333333333361 ],
  [ 'spanish', 0.010492424242424137 ],
  [ 'latvian', 0.01041666666666663 ],
  [ 'swahili', 0.010227272727272751 ],
  [ 'norwegian', 0.009356060606060645 ],
  [ 'pidgin', 0.00920454545454541 ],
  [ 'vietnamese', 0.007348484848484826 ],
  [ 'dutch', 0.006212121212121224 ],
  [ 'icelandic', 0.005113636363636487 ],
  [ 'indonesian', 0.003901515151515156 ],
  [ 'lithuanian', 0.0012499999999999734 ]
]
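
One possible direction for a PR, as a minimal sketch under stated assumptions: hiragana and katakana are specific to Japanese (kanji alone is shared with Chinese), so a kana check could run before the existing detection. The names `kanaRatio` and `detectWithJapaneseCheck` and the 0.2 threshold are hypothetical and not part of node-language-detect; `detect` stands in for whatever function returns the library's list of [language, score] pairs.

```javascript
// Hypothetical helpers, not part of node-language-detect.
// Matches single hiragana or katakana code points (Unicode script properties).
const KANA = /[\p{Script=Hiragana}\p{Script=Katakana}]/gu;

// Fraction of non-whitespace characters that are kana.
function kanaRatio(text) {
  const chars = Array.from(text).filter((ch) => !/\s/u.test(ch));
  if (chars.length === 0) return 0;
  const kana = text.match(KANA) || [];
  return kana.length / chars.length;
}

// Report Japanese when enough of the text is kana; otherwise defer to the
// supplied detect function. The threshold is an assumed value, and the
// returned ratio is not on the same scale as the library's scores.
function detectWithJapaneseCheck(text, detect, threshold = 0.2) {
  const ratio = kanaRatio(text);
  if (ratio >= threshold) {
    return [['japanese', ratio]];
  }
  return detect(text);
}
```

A call such as `detectWithJapaneseCheck(sample, (t) => lngDetector.detect(t))` would then skip the Latin-oriented scoring for text that is mostly kana. Text written only in kanji would still fall through to `detect`, so this is a partial fix at best.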

KevinDanikowski, Aug 29 '21 22:08

I will happily accept a PR for this along with some tests :+1:

FGRibreau, Sep 23 '21 05:09

Any plan to support this?

yangsa666, May 31 '23 08:05

Not at the moment, but I accept PRs :)

FGRibreau, May 31 '23 08:05