scylla
language detection issues
"i hate you".language # => "norwegian"
"i hate you so much".language # => "english"
"i love you".language # => "czech"
"kiss me".language # => "finnish"
"talk to me".language # => "italian"
@hashwin How would you suggest addressing these issues?
@Laykou @dom1nga This library is based on textcat, which detects a language from character n-grams rather than from any particular language's dictionary. It can get confused when the input is very short, so its results are unreliable in those cases.
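To make the short-input problem concrete, here is a rough sketch of the kind of character n-gram counting that textcat-style detectors build on. This is not scylla's actual code: the `ngram_profile` helper, the `_` word padding, and the `max_n` default are assumptions chosen for illustration only.

```ruby
# Illustrative only: count character n-grams (lengths 1..max_n) per word.
# Words are padded with "_" so word boundaries show up in the n-grams.
def ngram_profile(text, max_n: 3)
  counts = Hash.new(0)
  text.downcase.scan(/[[:alpha:]]+/).each do |word|
    padded = "_#{word}_"
    (1..max_n).each do |n|
      padded.chars.each_cons(n) { |gram| counts[gram.join] += 1 }
    end
  end
  counts
end

profile = ngram_profile("kiss me", max_n: 2)
puts profile.size # number of distinct n-grams available for comparison
```

A two-word input like this yields only a handful of distinct n-grams, so many language profiles can look about equally close to it.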
My suggestion would be to only trust the result if the input text is at least 5 words long, 10 to be on the safe side.
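A minimal sketch of that guard, assuming you wrap the detection call yourself. The `language_if_reliable` helper and the `MIN_WORDS` constant are hypothetical names, not part of scylla; the block is where you would call `String#language` as in the examples above.

```ruby
# Hypothetical helper: only run detection when the input has enough words.
MIN_WORDS = 5 # raise to 10 to be on the safe side

def language_if_reliable(text, min_words: MIN_WORDS)
  # Too short to trust an n-gram based guess, so report nothing.
  return nil if text.split.size < min_words

  yield text
end

# With scylla loaded, usage might look like:
#   language_if_reliable("i hate you") { |t| t.language }          # nil, only 3 words
#   language_if_reliable("i hate you so much") { |t| t.language }  # runs detection
```

Returning `nil` for short input forces the caller to handle the "unknown" case explicitly instead of acting on a guess.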