whatlang-rs

Evaluate other language identification methods.

Open · greyblake opened this issue 3 years ago · 0 comments

This issue is a reminder for myself.

Possible options:

  • Chars frequencies
  • 2-grams?
  • The most frequent words (100 or 1000)?
  • Smarter resolution between LangA and LangB by identifying traits that are present in one language and absent in the other. This could help when two languages have very similar statistical characteristics.
  • Řehůřek and Kolkus (2009)
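A rough sketch of the first two options (character frequencies and 2-grams): build a relative-frequency profile of character n-grams (n = 1 or n = 2) and pick the reference language whose profile has the highest cosine similarity to the input. This is not whatlang's API. `ngram_profile`, `cosine` and `detect` are hypothetical names, and the reference "profiles" are assumed to come from sample texts supplied by the caller.

```rust
use std::collections::HashMap;

/// Relative frequencies of character n-grams (n = 1: single chars, n = 2: bigrams).
/// N-grams are taken inside words only, after lowercasing.
fn ngram_profile(text: &str, n: usize) -> HashMap<String, f64> {
    assert!(n > 0, "n-gram size must be positive");
    let mut counts: HashMap<String, f64> = HashMap::new();
    // Split on anything that is not a letter so n-grams never span word boundaries.
    for word in text.split(|c: char| !c.is_alphabetic()) {
        let chars: Vec<char> = word.to_lowercase().chars().collect();
        // windows(n) yields nothing for words shorter than n.
        for window in chars.windows(n) {
            let gram: String = window.iter().collect();
            *counts.entry(gram).or_insert(0.0) += 1.0;
        }
    }
    // Turn raw counts into relative frequencies.
    let total: f64 = counts.values().sum();
    if total > 0.0 {
        for v in counts.values_mut() {
            *v /= total;
        }
    }
    counts
}

/// Cosine similarity between two sparse frequency vectors (0.0 if either is empty).
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a
        .iter()
        .filter_map(|(gram, x)| b.get(gram).map(|y| x * y))
        .sum();
    let norm = |m: &HashMap<String, f64>| m.values().map(|v| v * v).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

/// Picks the reference language whose n-gram profile is most similar to the input.
/// `refs` pairs a language code with a sample text (hypothetical training data).
/// Returns None only when `refs` is empty; on ties the last candidate wins.
fn detect<'a>(text: &str, refs: &[(&'a str, &str)], n: usize) -> Option<&'a str> {
    let input = ngram_profile(text, n);
    refs.iter()
        .map(|(lang, sample)| (*lang, cosine(&input, &ngram_profile(sample, n))))
        .max_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(lang, _)| lang)
}
```

With tiny sample texts like these the scores would be noisy; a real evaluation would need reference profiles built from a proper corpus per language.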
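The "most frequent words" option could look roughly like this: count how many tokens of the input appear in each language's list of frequent words and pick the language with the most hits. The function names are hypothetical, and the five-word lists in the usage example are only illustrative stand-ins for the 100 or 1000 words per language mentioned above.

```rust
use std::collections::HashSet;

/// Counts how many words of `text` appear in a language's list of frequent words.
fn frequent_word_hits(text: &str, frequent: &HashSet<&str>) -> usize {
    text.split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .filter(|w| frequent.contains(w.as_str()))
        .count()
}

/// Returns the language with the most hits, or None if no frequent word matched.
/// On a tie, `max_by_key` keeps the last candidate, so input order matters.
fn detect_by_words<'a>(text: &str, langs: &[(&'a str, &[&str])]) -> Option<&'a str> {
    langs
        .iter()
        .map(|(lang, words)| {
            let set: HashSet<&str> = words.iter().copied().collect();
            (*lang, frequent_word_hits(text, &set))
        })
        .filter(|&(_, hits)| hits > 0)
        .max_by_key(|&(_, hits)| hits)
        .map(|(lang, _)| lang)
}
```

Usage with illustrative lists:

```rust
let eng: &[&str] = &["the", "and", "of", "to", "is"];
let deu: &[&str] = &["der", "die", "und", "das", "ist"];
let langs = [("eng", eng), ("deu", deu)];
let lang = detect_by_words("Der Hund und die Katze", &langs);
```

A word-list method like this tends to work well on longer text but says little about very short inputs that contain none of the listed words, which is why it returns `None` in that case instead of guessing.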
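The pairwise-resolution idea could start as simply as counting "marker" characters that occur in one language's orthography but not in the other's, and only deciding when one side clearly wins. For example, Spanish uses ñ, ¿ and ¡, while Portuguese uses ã, õ and ç. The function `resolve_pair` and the marker sets below are hypothetical; real traits could also be words, suffixes or n-grams rather than single characters.

```rust
use std::cmp::Ordering;

/// Hypothetical tie-breaker for two languages that score almost the same
/// statistically: count marker characters assumed to occur in one
/// language's orthography but not in the other's.
fn resolve_pair<'a>(
    text: &str,
    (lang_a, markers_a): (&'a str, &[char]),
    (lang_b, markers_b): (&'a str, &[char]),
) -> Option<&'a str> {
    let lower = text.to_lowercase();
    let hits = |markers: &[char]| lower.chars().filter(|c| markers.contains(c)).count();
    match hits(markers_a).cmp(&hits(markers_b)) {
        Ordering::Greater => Some(lang_a),
        Ordering::Less => Some(lang_b),
        // No decisive trait: leave the decision to the statistical score.
        Ordering::Equal => None,
    }
}
```

The idea is to call this only when the top two candidates from the statistical method are close, so it refines an existing result instead of replacing it.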

See:

greyblake · May 03 '22 16:05