
Support Language Detection

Open insightindustry opened this issue 3 years ago • 1 comment

Given the localization validators outlined in #69 and #71, it may be helpful to extend the library with language-detection capabilities: namely, a validator and a checker that detect the language used in a given string, along these lines:

  • validators.in_language(value, ..., standard = None) where:
    • value is the string whose contents should be checked to identify the language
    • standard indicates which language-code standard is used for the returned value; where None, the human-readable language name is returned (e.g. "American English")
  • checkers.is_in_language(value, languages) which returns True if value is detected to be in one of the languages contained in languages
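A minimal sketch of how the two proposed functions might fit together is shown here. Everything in it is an assumption made for illustration: the `detector` parameter, the stop-word `toy_detector`, the `HUMAN_READABLE` table, and the `"iso639-1"` value for `standard` are hypothetical and are not part of the Validator Collection. A real implementation would replace the toy detector with a proper detection backend.

```python
# Hypothetical lookup from ISO 639-1 codes to human-readable names.
HUMAN_READABLE = {"en": "English", "fr": "French", "de": "German"}

# Tiny stop-word lists used only by the placeholder detector below.
STOP_WORDS = {
    "en": {"the", "and", "is"},
    "fr": {"le", "la", "et", "est"},
    "de": {"der", "die", "und", "ist"},
}


def toy_detector(value):
    """Placeholder detector: picks the language with the most stop-word hits."""
    tokens = value.lower().split()
    scores = {
        code: sum(token in words for token in tokens)
        for code, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        raise ValueError("language could not be detected")
    return best


def in_language(value, standard=None, detector=toy_detector):
    """Validator sketch: return the detected language of ``value``.

    With ``standard=None`` the human-readable name is returned; the only
    other value supported in this sketch is the assumed ``"iso639-1"``.
    """
    if not isinstance(value, str) or not value.strip():
        raise ValueError("value must be a non-empty string")
    code = detector(value)
    if standard is None:
        return HUMAN_READABLE.get(code, code)
    if standard == "iso639-1":
        return code
    raise ValueError(f"unsupported standard: {standard!r}")


def is_in_language(value, languages, detector=toy_detector):
    """Checker sketch: True if ``value`` is detected as one of ``languages``."""
    try:
        code = in_language(value, standard="iso639-1", detector=detector)
    except ValueError:
        return False
    return code in set(languages)
```

The checker wraps the validator and turns a failed detection into False, mirroring the validator/checker pairing described above.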

IMPORTANT: Language detection is non-trivial, and numerous third-party libraries already attempt it. The key considerations are performance and accuracy, and different libraries perform differently depending on the length and complexity of value (the text content).

insightindustry Jan 09 '21 03:01

There are several important questions that need to be answered for this feature:

  1. Should language detection be built into the Validator Collection itself, or should it leverage an outside library?
  2. If leveraging an outside library, should that dependency be a hard requirement of the Validator Collection (listed in requirements.txt), or should it be a conditional (optional) dependency?
  3. Should there be an "import selection tree" which tries to optimize for the language detection library that is best for a given value length AND that is available in the runtime environment?
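Questions 2 and 3 could be combined roughly as in the sketch below, which treats each detection library as an optional dependency and picks the first one that both suits the text length and is installed. The function names, the `CANDIDATES` table, and the length threshold are hypothetical; `langdetect` and `langid` appear only as examples of third-party packages, and the ordering is not a claim about which performs better.

```python
from importlib.util import find_spec

# (minimum text length, top-level module name) pairs, checked in order.
# The threshold of 50 characters is purely illustrative.
CANDIDATES = [
    (50, "langdetect"),
    (0, "langid"),
]


def module_available(name):
    """Return True if a top-level module can be imported in this environment."""
    return find_spec(name) is not None


def select_detector_module(value, candidates=CANDIDATES,
                           is_available=module_available):
    """Return the first candidate module that fits len(value) and is installed.

    Returns None when no candidate qualifies, so the caller can raise a
    clear error or fall back to a built-in approach.
    """
    for min_length, name in candidates:
        if len(value) >= min_length and is_available(name):
            return name
    return None
```

Because availability is checked at call time, no detection library would need to appear in requirements.txt under this approach; a caller would import the selected module lazily and handle the None case explicitly.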

insightindustry Jan 09 '21 03:01