[nl] some false positives in the speller rule

Open • jaumeortola opened this issue 2 years ago • 1 comment

(Collected by @LanguageTool-AS)

  • [ ] Hyphenated compounds with a number on the left side are false positives: 2-cd-single, 20-minuten pauze. There are many cases with 3D- (see the full results in the nightly tests).
  • [ ] Standard/common ways of writing decades (years): 30's of goed, zoeken uit,
  • [ ] Numbers followed by units are expected to have a space in between, with the unit remaining lower case: 17kg., 24uur per dag, 253ha meerderjarig zijn verklaard., - 8uur, zijn we weer open.
  • [ ] Split numbers and words?: 5Waarom moet ik thuis maar een klein beetje eten voordat ik de cursus Koken ga doen?
  • [ ] Other FPs: 609bis... (Sentences from: https://internal1.languagetool.org/regression-tests/via-http/2022-09-01/nl/result_java_MORFOLOGIK_RULE_NL_NL.html)

jaumeortola, Sep 01 '22 14:09

Comments by Ruud Bars on the forum (https://forum.languagetool.org/t/nl-some-false-positives-in-the-speller-rule-7062/8253):

hyphenated compounds with numbers on the left side are false positives: 2-cd-single, 20-minuten pauze => My 2 cents: have a look at the official spelling list, e.g. woordenlijst. One should check the complicated rules in the ‘technische handleiding spelling’.
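As a rough illustration of how such tokens could be pre-filtered before they reach the speller, here is a minimal Python sketch. It is not LanguageTool's implementation: the function name `is_accepted` and the `known_words` set are hypothetical, and the pattern only covers a number followed by lower-case hyphenated parts. It does not encode the official spelling rules mentioned above, and it deliberately does not handle the 3D- cases.

```python
import re

# A number followed by one or more hyphen-separated lower-case parts,
# e.g. "2-cd-single" or "20-minuten". Hypothetical pattern for illustration.
NUM_COMPOUND = re.compile(r"(\d+)((?:-[a-z]+)+)")


def is_accepted(token, known_words):
    """Return True if the token is <number>-<part>[-<part>...] and every
    alphabetic part is in known_words (a stand-in for the dictionary)."""
    match = NUM_COMPOUND.fullmatch(token)
    if match is None:
        return False
    # group(2) looks like "-cd-single"; drop the leading hyphen and split.
    parts = match.group(2).lstrip("-").split("-")
    return all(part in known_words for part in parts)


known = {"cd", "single", "minuten"}
print(is_accepted("2-cd-single", known))
print(is_accepted("20-minuten", known))
```

Checking each part separately keeps the number itself out of the dictionary lookup, so only the word parts can trigger a spelling error.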

standard/common way of writing decades (years): 30's of goed, zoeken uit, => Officially it is: de 30’er jaren, de jaren 30. See: Jaren '30 / jaren 30 - Taaladvies.net

numbers followed by units are expected to have a space in between but remain lower case: 17kg., 24uur per dag, 253ha meerderjarig zijn verklaard., - 8uur, zijn we weer open. => Easily immunized in the disambiguator; add a rule to suggest the space.
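The "suggest the space" part can be sketched with a regular expression. This is a standalone Python illustration, not a LanguageTool rule or disambiguator entry; the `UNITS` tuple is a hypothetical list built only from the examples in this thread, and a real rule would need a vetted unit list.

```python
import re

# Hypothetical unit list taken from the examples above; not exhaustive.
UNITS = ("kg", "ha", "uur")

# Digits directly followed by a known unit, with word boundaries on both
# sides so that tokens such as "609bis" are left alone.
NUMBER_UNIT = re.compile(r"\b(\d+)(" + "|".join(UNITS) + r")\b")


def suggest_space(text):
    """Insert a space between a number and a directly attached unit,
    leaving the unit in lower case."""
    return NUMBER_UNIT.sub(r"\1 \2", text)


print(suggest_space("17kg."))
print(suggest_space("- 8uur, zijn we weer open."))
```

Restricting the match to a fixed unit list keeps the suggestion from firing on arbitrary number-plus-letters tokens.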

Split numbers and words?: 5Waarom moet ik thuis maar een klein beetje eten voordat ik de cursus Koken ga doen? => Good idea, but prepare for many exceptions, e.g. paragraph numbers. IMHO, most of these mistakes are artifacts of text extraction.
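A minimal Python sketch of the splitting idea, with room for exceptions. The function name `split_fused` and the `exceptions` parameter are hypothetical, and the pattern only targets digits fused to a capitalised word, which is an assumption based on the single example above.

```python
import re

# Digits directly followed by a capitalised word, e.g. "5Waarom".
# Requires at least one lower-case letter, so "3D" does not match.
FUSED = re.compile(r"\b(\d+)([A-Z][a-z]+)\b")


def split_fused(text, exceptions=frozenset()):
    """Insert a space between a number and a fused capitalised word,
    skipping any token listed in exceptions."""
    def replace(match):
        if match.group(0) in exceptions:
            return match.group(0)
        return match.group(1) + " " + match.group(2)

    return FUSED.sub(replace, text)


print(split_fused("5Waarom moet ik thuis maar een klein beetje eten?"))
```

Passing the exceptions in as a set keeps the pattern itself simple; whether an allow-list scales to the many exceptions expected here is an open question.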

On names: many names in Tatoeba are copied from a sentence in a different language, so the name is often very uncommon in Dutch. It may be better to use only untranslated Tatoeba sentences, although even those are sometimes of low quality.

jaumeortola, Sep 12 '22 14:09