langdetect
Inaccurate predictions for basic English words
The library fails to detect the language of basic English words and returns inaccurate results, as shown below.
detect("sunday")
=> 'id' | whereas 'sunday' in Indonesian is clearly 'minggu'
detect("monday")
=> 'tr' | whereas 'monday' in Turkish is 'pazartesi'
And, surprisingly, detect('pazartesi')
=> 'es'
In fact,
langdetect.detect_langs("sunday")
outputs confidences only for 'tr' and 'id', with no mention of English whatsoever.
The same goes for months and other basic English words, e.g.
detect("good")
=> 'so'
"son", "song",...
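For anyone trying to reproduce this, here is a minimal sketch. It assumes langdetect is installed and sets DetectorFactory.seed, since detection is non-deterministic by default and the codes reported above may otherwise vary between runs. The helper names summarize, words_missing_english and main are made up for illustration and are not part of the library.

```python
def summarize(words, detect_langs_fn):
    """Map each word to its candidate (language, probability) pairs.

    detect_langs_fn is expected to behave like langdetect.detect_langs:
    it takes a string and returns objects with .lang and .prob attributes.
    """
    return {
        word: [(c.lang, c.prob) for c in detect_langs_fn(word)]
        for word in words
    }


def words_missing_english(summary):
    """Return the words whose candidate list does not include 'en'."""
    return [
        word
        for word, candidates in summary.items()
        if all(lang != "en" for lang, _ in candidates)
    ]


def main():
    # Imported lazily so the helpers above can be used without langdetect.
    from langdetect import DetectorFactory, detect_langs

    # A fixed seed makes repeated runs return the same candidates.
    DetectorFactory.seed = 0

    # Words taken from this report.
    words = ["sunday", "monday", "pazartesi", "good", "son", "song"]
    summary = summarize(words, detect_langs)
    for word, candidates in summary.items():
        print(word, candidates)
    print("no 'en' candidate:", words_missing_english(summary))
```

Calling main() prints the candidate languages for each word, followed by the words for which English was not offered at all.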