Issue detecting languages written in non-Latin scripts

Open sidfeiner opened this issue 7 years ago • 1 comments

Hello, I've compiled the cld2 lib and built the Java project. Detection seems to work for languages written in the Latin script (Dutch, Spanish, French, English), but when I feed it Arabic or Hebrew text, the Result is always "UNKNOWN".

sidfeiner avatar Aug 06 '17 15:08 sidfeiner

Did you try with some of the strings present in the test data?

For example: " או לערוך את העדפות ההפצה אנא עקוב אחרי השלבים הבאים כנס לחשבון האישי שלך ב" "احتيالية بيع أي حساب"

Those should be detected by both detectors. Try running each classifier in isolation with the option --spring.profiles.active=cld2 or --spring.profiles.active=java_only, to check that each one is actually detecting the correct language.

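It could also be an encoding problem: if the UTF-8 bytes of the text are decoded with a different charset somewhere before they reach the detector, the Hebrew or Arabic characters are replaced by unrelated symbols and nothing can be detected.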
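To rule the encoding out, a quick check along these lines might help. This is only a sketch using plain JDK classes, not anything from weslang or cld2: the class name EncodingCheck and both helper methods are made up for illustration. It verifies that a string still contains characters of the expected Unicode script, and shows what happens when UTF-8 bytes are wrongly decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of weslang or cld2.
public class EncodingCheck {

    // True if at least one code point in the text belongs to the given Unicode script.
    public static boolean containsScript(String text, Character.UnicodeScript script) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == script);
    }

    // Simulates the bug: encode as UTF-8, then decode with the wrong charset.
    public static String decodeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // First Hebrew word of the test string above, written with Unicode
        // escapes so the source file's own encoding cannot interfere.
        String hebrew = "\u05DB\u05E0\u05E1";
        String mangled = decodeAsLatin1(hebrew);

        System.out.println("original has Hebrew: "
                + containsScript(hebrew, Character.UnicodeScript.HEBREW));
        System.out.println("mangled has Hebrew:  "
                + containsScript(mangled, Character.UnicodeScript.HEBREW));
    }
}
```

If the text you pass to the detector fails a check like containsScript for the script you expect, the problem is upstream of the classifier (file reading, HTTP request charset, console input) rather than in the detection itself.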

sk- avatar Aug 07 '17 20:08 sk-