scylla

language detection issues

Open dom1nga opened this issue 7 years ago • 2 comments

"i hate you".language # => "norwegian"
"i hate you so much".language # => "english"
"i love you".language # => "czech"
"kiss me".language # => "finnish"
"talk to me".language # => "italian"

dom1nga Sep 11 '17 22:09

@hashwin How would you suggest addressing these issues?

Laykou Jul 06 '20 21:07

@Laykou @dom1nga This library is based on textcat, which detects a language from character n-grams rather than from any particular language's dictionary. Very short inputs give it too few n-grams to work with, so its results are unreliable in those cases.
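To make the short-input problem concrete, here is a rough sketch of how few character trigrams a short phrase produces. It is only an illustration: `char_ngrams` and `ngram_counts` are hypothetical helpers, not part of scylla or textcat, and the actual n-gram sizes, padding, and scoring used by textcat may differ.

```ruby
# Illustrative only: NOT scylla's or textcat's actual implementation.
# char_ngrams and ngram_counts are hypothetical helper names.

# Split a string into overlapping character n-grams (trigrams by default).
def char_ngrams(text, n = 3)
  text.downcase.chars.each_cons(n).map(&:join)
end

# Count how often each n-gram occurs in the text.
def ngram_counts(text, n = 3)
  char_ngrams(text, n).each_with_object(Hash.new(0)) do |gram, counts|
    counts[gram] += 1
  end
end

short_input = "kiss me"
puts char_ngrams(short_input).inspect
puts "#{char_ngrams(short_input).size} trigrams in #{short_input.inspect}"
```

A seven-character input yields only five trigrams, which is very little evidence to compare against a per-language n-gram profile.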

My suggestion would be to only trust the result if the input text is at least 5 words long, 10 to be on the safe side.
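A minimal sketch of that rule of thumb could look like the following. `MIN_WORDS` and `language_if_reliable` are hypothetical names made up for this example, not part of scylla, and the detector is passed in as a block so the guard itself does not depend on any particular gem.

```ruby
# Hypothetical guard: names and threshold are assumptions for illustration.
MIN_WORDS = 5 # raise to 10 to be on the safe side

# Runs the given detector block only when the text has at least
# min_words whitespace-separated words; otherwise returns nil.
def language_if_reliable(text, min_words: MIN_WORDS)
  return nil if text.split.size < min_words
  yield text
end
```

With scylla loaded, usage might look like `language_if_reliable(input) { |t| t.language }`, treating a `nil` result as "too short to trust".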

hashwin Jul 07 '20 00:07