langdetect
Got LangDetectException for Arabic Presentation Forms
For characters in both the Arabic Presentation Forms-A and Arabic Presentation Forms-B blocks, the detect function throws an exception:
>>> detect('ﺽ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector_factory.py", line 130, in detect
    return detector.detect()
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector.py", line 135, in detect
    probabilities = self.get_probabilities()
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector.py", line 142, in get_probabilities
    self._detect_block()
  File "/home/user/.local/lib/python3.5/site-packages/langdetect/detector.py", line 149, in _detect_block
    raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
langdetect.lang_detect_exception.LangDetectException: No features in text.
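One possible workaround, not confirmed by the library's maintainers, is to fold presentation-form characters back into their base Arabic letters with NFKC normalization before calling detect. The sketch below assumes Python 3; the helper names normalize_presentation_forms and safe_detect are hypothetical and not part of langdetect. Whether detection then succeeds on very short input is not guaranteed, so the wrapper still catches LangDetectException.

```python
import unicodedata


def normalize_presentation_forms(text):
    """Apply NFKC normalization, which replaces compatibility characters
    such as Arabic Presentation Forms with their base letters."""
    return unicodedata.normalize("NFKC", text)


def safe_detect(text, default=None):
    """Hypothetical wrapper: normalize first, then call langdetect.detect.

    Returns `default` when langdetect still finds no usable features.
    """
    # Imported lazily so the normalization helper can be used on its own.
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException

    try:
        return detect(normalize_presentation_forms(text))
    except LangDetectException:
        return default
```

For example, normalize_presentation_forms('\ufebd') turns the isolated presentation form of DAD (U+FEBD) into the base letter U+0636, which is the character used in detect(u'ض').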
I have the same issue!
You have to add a u prefix to the string literal, as in detect(u'ض'), but what if the string is supplied by the user? How should that case be handled?
After some searching I found this Stack Overflow post. On Python 2, you have to add these three lines to your script:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
and then convert the text to Unicode:
inputText = unicode(inputText)
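Note that the snippet above only runs on Python 2: reload, sys.setdefaultencoding, and unicode are not available as builtins in Python 3, which is the version shown in the original traceback. On Python 3, str is already Unicode, so user input only needs decoding if it arrives as bytes. A minimal sketch, assuming the input is UTF-8 encoded (the helper name to_text is hypothetical):

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as a Unicode str, decoding it first if it is bytes.

    The UTF-8 default is an assumption; pass the real encoding if it differs.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

The result of to_text(user_input) can then be passed to detect without changing the interpreter's default encoding.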