fast-langdetect

Text length warning does not consider language, flags valid English sentences

Open JackyHe398 opened this issue 2 months ago • 6 comments

Currently, LangDetector._preprocess_text() in infer.py raises a "text too long" warning whenever the text exceeds 100 characters.

In English or Korean, however, 100 characters often amount to only a short or medium-length sentence. The warning is therefore misleading, since many valid sentences trigger it.

Suggestion:
Perform language detection first, then apply length thresholds appropriate to each language. This would ensure that the "text too long" warning is triggered only when the text is genuinely too long for the detected language.
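A minimal sketch of what I have in mind, not a patch against the actual code. The names is_too_long, MAX_CHARS_BY_LANG and DEFAULT_MAX_CHARS are hypothetical, and the threshold values are placeholders I made up for illustration; only the 100-character default comes from the current behaviour. It assumes the detected language code is already available as a string.

```python
# Hypothetical per-language limits, in characters. The numbers are
# illustrative placeholders, not measured or recommended values.
DEFAULT_MAX_CHARS = 100  # matches the current fixed threshold
MAX_CHARS_BY_LANG = {
    "en": 300,
    "ko": 200,
}


def is_too_long(text: str, lang: str) -> bool:
    """Return True if text exceeds the limit for the detected language.

    Languages without an explicit entry fall back to DEFAULT_MAX_CHARS.
    """
    limit = MAX_CHARS_BY_LANG.get(lang, DEFAULT_MAX_CHARS)
    return len(text) > limit
```

With something like this, a 150-character English sentence would no longer trigger the warning, while a language with no entry in the table would keep today's 100-character behaviour.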

JackyHe398, Sep 17 '25 09:09