languagetool
Wrong language detection
Issue to collect cases of incorrect language detection that occur even with fastText. Nothing we can easily fix, but we should at least be aware of the issues:
- Osteuropa -> fr
- USA, Osteuropa -> sv
Doesn't it make sense to require a minimum text length before another language is suggested? I have just entered a name (Stephanie) and LT suggests switching to English. Especially with proper names, there may be multiple languages in which a name is correct (e.g., Maria is a valid and common name in English, German, Italian, and maybe others).
Maybe, but I don't know where to draw the line. A text might even start with several names... when you say "LT suggests", what client are you referring to?
It might be wise to exclude any capitalized word (mostly proper names) as a quick hack.
And perhaps also exclude the prepositions and articles that appear in family names (e.g., "Van den Wateren never got the hang of his paternal grandparents' tongue.").
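To make the suggestions above concrete, here is a rough sketch of what such a guard could look like. It is not LanguageTool code: the function names, the word threshold, and the particle list are all made up for illustration, and the threshold in particular is exactly the "where to draw the line" question raised earlier.

```python
import re

# Hypothetical values for illustration only; not taken from LanguageTool.
MIN_CANDIDATE_WORDS = 3
NAME_PARTICLES = {"van", "den", "der", "de", "von", "la", "le", "di", "da"}

# Runs of Unicode letters, optionally joined by an apostrophe or hyphen.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def candidate_words(text):
    """Return the words that count as evidence for language detection."""
    kept = []
    for word in WORD_RE.findall(text):
        if word[0].isupper():
            # Quick hack: treat any capitalized word as a possible proper name.
            # This also drops sentence-initial words and all German nouns.
            continue
        if word.lower() in NAME_PARTICLES:
            # Skip particles that commonly appear inside family names.
            continue
        kept.append(word)
    return kept


def should_suggest_language_switch(text):
    """Only allow a language suggestion when enough evidence words remain."""
    return len(candidate_words(text)) >= MIN_CANDIDATE_WORDS
```

With these assumed values, "Stephanie" and "USA, Osteuropa" would never trigger a suggestion, while the full "Van den Wateren ..." sentence still would. The obvious cost is that skipping capitalized words throws away most of the signal in short German texts, so this is a trade-off rather than a fix.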
English is detected as French on this LinkedIn post:
Aside: shouldn't the messages after the suggestions be in French?
Could you send the full text as text (not just as a screenshot)?
At Congrès Inforsid 2021 (https://inforsid2021.sciencesconf.org/resource/page/id/14) on June 1, Tuesday 2:00 - 5:30 pm. I will present an online workshop (in English) about ASD-STE100.
Controlled language for text simplification: Concepts and implementation
ABSTRACT. In commerce and industry, many organizations use plain language, for example, ‘plain English’ and ‘lenguaje claro’ [Spanish]. For safety-critical documentation, plain language is not always sufficient, and some organizations use controlled language. ASD-STE100 Simplified Technical English is a specification for a controlled language. In this paper, we present the TechScribe term checker for ASD-STE100, which checks a document for conformity to ASD-STE100. Many of the ASD-STE100 rules are applicable to the simplification of scientific texts. To show that, this paper conforms to ASD-STE100 as much as possible.
Sorry Daniel, I should have thought to send the full text.
If you add the English text directly to a post, there is no problem.
To reproduce the problem, add English text and French text in the same post:
The French text made the post too long. After I deleted the French text, LT continued to give the warnings.
French text: RÉSUMÉ. Dans le commerce et l'industrie, de nombreuses organisations utilisent des langues simplifiées, par exemple "plain English" et "lenguaje claro" [éspagnol]. Pour la documentation technique, critique pour la sécurité, "plain language" n'est pas toujours suffisant et certaines organisations utilisent une langue contrôlée. L'ASD-STE100 Simplified Technical English est la spécification d’une langue contrôlée. Dans cet article, nous présentons TechScribe, un logiciel conçu pour vérifier automatiquement la conformité d'un document aux règles ASD-STE100. De nombreuses règles de l'ASD-STE100 sont applicables à la simplification des textes scientifiques. Pour le démontrer, dans la mesure du possible, cet article est conforme à la norme ASD-STE100.
Thanks, I can reproduce it now. It seems to be add-on-related, so I have opened an issue there (https://github.com/languagetooler-gmbh/browser-add-on-rewrite/issues/1263).
- Start with English (American) and 'Automatically detect language' not selected.
- Delete all the text.
- Select 'Automatically detect language'.
- End with Australian English:
And another:
Click 'Automatically detect language' and LT detects Romanian.