weslang
Issue detecting non-Latin languages
Hello, I've compiled the cld2 lib and built the Java project. When I try detecting some texts, it seems to work for Latin-script languages (Dutch, Spanish, French, English), but when I feed it Arabic or Hebrew, the result is always "UNKNOWN".
Did you try with some of the strings present in the test data?
For example: " או לערוך את העדפות ההפצה אנא עקוב אחרי השלבים הבאים כנס לחשבון האישי שלך ב" "احتيالية بيع أي حساب"
Those should be detected by both detectors. Try running the classifiers in isolation with the option --spring.profiles.active=cld2 or --spring.profiles.active=java_only, to check that each classifier is actually detecting the correct language.
It could also be an encoding problem.
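To rule out the encoding explanation, a quick check along these lines may help. It is only a sketch using the standard JDK, not part of weslang: the class and method names (EncodingCheck, dominantScript, misdecode) are made up for illustration. It reports which Unicode script most letters of a string belong to, and simulates UTF-8 bytes being decoded as ISO-8859-1, which is one common way Hebrew or Arabic input gets mangled before it reaches a detector. The sample strings are short words taken from the examples above, written as \u escapes so the result does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of weslang: checks which script a string
// actually contains once it has reached the JVM.
class EncodingCheck {

    // Returns the name of the Unicode script most letters belong to,
    // or "NONE" if the text contains no letters at all.
    static String dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts =
                new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        return counts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(e -> e.getKey().name())
            .orElse("NONE");
    }

    // Simulates the suspected bug: UTF-8 bytes read back with the wrong charset.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String hebrew = "\u05DB\u05E0\u05E1";        // first word of the Hebrew sample
        String arabic = "\u062D\u0633\u0627\u0628";  // last word of the Arabic sample

        System.out.println("hebrew: " + dominantScript(hebrew));
        System.out.println("arabic: " + dominantScript(arabic));
        System.out.println("hebrew, misdecoded: " + dominantScript(misdecode(hebrew)));
        System.out.println("arabic, misdecoded: " + dominantScript(misdecode(arabic)));
    }
}
```

If the text that reaches the detector no longer reports HEBREW or ARABIC here, the input was probably corrupted on the way in (request charset, file reader, or console encoding) and the classifiers are not at fault.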