fast-langdetect
Text length warning does not consider language, flags valid English sentences
Currently, LangDetector._preprocess_text() in infer.py emits a "text too long" warning whenever the input text exceeds 100 characters.
In English or Korean, however, 100 characters often corresponds to only a short or medium-length sentence. This makes the warning misleading, since many valid sentences trigger it.
Suggestion:
Perform language detection first, then apply length thresholds appropriate to each language. This would ensure that the "text too long" warning is triggered only when the text is genuinely too long for the detected language.
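A rough sketch of what this could look like is below. It is not the existing implementation: the names MAX_LEN_BY_LANG, DEFAULT_MAX_LEN, max_len_for and is_too_long are hypothetical, the threshold values are placeholders that would need tuning, and detect_lang stands in for whatever detection call the library would use internally (here it is any function that maps text to a language code).

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical per-language limits, in characters.
# The numbers are placeholders for illustration, not measured values.
DEFAULT_MAX_LEN = 100
MAX_LEN_BY_LANG = {
    "en": 300,
    "ko": 200,
}


def max_len_for(lang: str) -> int:
    """Return the length threshold for a language code, or the default."""
    return MAX_LEN_BY_LANG.get(lang, DEFAULT_MAX_LEN)


def is_too_long(text: str, detect_lang: Callable[[str], str]) -> bool:
    """Detect the language first, then compare against that language's limit.

    detect_lang is an assumed stand-in for the real detector; it only needs
    to return a language code such as "en" or "ko".
    """
    lang = detect_lang(text)
    limit = max_len_for(lang)
    too_long = len(text) > limit
    if too_long:
        logger.warning(
            "Text too long for language %r: %d characters (limit %d)",
            lang, len(text), limit,
        )
    return too_long
```

With a stub detector that always returns "en", a 150-character sentence would no longer trigger the warning, while a language without an entry in the table would still fall back to the current 100-character limit. One open question with this approach is that detection has to run on the full (or truncated) text before the length check, so the order of steps in _preprocess_text() would need to change.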