languagetool
[nl] some false positives in the speller rule
(Collected by @LanguageTool-AS)
- [ ] Hyphenated compounds with a number on the left side are false positives: `2-cd-single`, `20-minuten pauze`. Many cases with `3D-` (see full results in nightly tests).
- [ ] Standard/common way of writing decades (years): `30's of goed, zoeken uit,`
- [ ] Numbers followed by units are expected to have a space in between but remain lower case: `17kg.`, `24uur per dag`, `253ha meerderjarig zijn verklaard.`, `- 8uur, zijn we weer open.`
- [ ] Split numbers and words?: `5Waarom moet ik thuis maar een klein beetje eten voordat ik de cursus Koken ga doen?`
- [ ] Other FPs: `609bis`, ...

(Sentences from: https://internal1.languagetool.org/regression-tests/via-http/2022-09-01/nl/result_java_MORFOLOGIK_RULE_NL_NL.html)
Comments by Ruud Bars on the forum (https://forum.languagetool.org/t/nl-some-false-positives-in-the-speller-rule-7062/8253):
Hyphenated compounds with numbers on the left side are false positives: `2-cd-single`, `20-minuten pauze` => My 2 cents: have a look at the official speller list, e.g. woordenlijst. One should also check the complicated rules in the ‘technische handleiding spelling’.
Standard/common way of writing decades (years): `30's of goed, zoeken uit,` => Officially it is: de 30’er jaren, de jaren 30. See: Jaren '30 / jaren 30 - Taaladvies.net
Numbers followed by units are expected to have a space in between but remain lower case: `17kg.`, `24uur per dag`, `253ha meerderjarig zijn verklaard.`, `- 8uur, zijn we weer open.` => These are easily immunized in the disambiguator; add a rule to suggest the space.
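To make the "suggest the space" idea concrete, here is a minimal sketch in plain Python. It is not LanguageTool's rule format or API: the `UNITS` list and the `suggest_space` helper are hypothetical names chosen for illustration, and the unit list covers only the units from the examples above.

```python
import re

# Hypothetical, deliberately short unit list taken from the examples above;
# a real rule would need a much fuller one.
UNITS = ("kg", "uur", "ha")

# A number directly followed by a known unit, with no space in between.
# The trailing \b keeps tokens such as "5Waarom" or "609bis" from matching.
NUMBER_UNIT = re.compile(r"\b(\d+)(" + "|".join(UNITS) + r")\b")


def suggest_space(text: str) -> str:
    """Return the text with a space inserted between number and unit."""
    return NUMBER_UNIT.sub(r"\1 \2", text)
```

The match is case-sensitive on purpose, so the unit stays lower case in the suggestion. Hyphenated compounds such as `2-cd-single` are left untouched because the digit is not followed by a listed unit.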
Split numbers and words?: 5Waarom moet ik thuis maar een klein beetje eten voordat ik de cursus Koken ga doen? => Good idea, but prepare for many exceptions, e.g. paragraph numbers. Imho, most of these mistakes are artifacts from text extraction.
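As a rough illustration of the split idea and why exceptions matter, the sketch below flags a number glued to a capitalised word. It is an assumption-laden example in plain Python, not a LanguageTool rule: `GLUED` and `split_candidates` are hypothetical names, the pattern only handles ASCII letters, and it makes no attempt to recognise paragraph numbers.

```python
import re

# A number glued to a capitalised word of at least three letters, e.g. "5Waarom".
# Requiring lowercase letters after the capital leaves "3D-" and "609bis" alone.
# Assumption: ASCII letters only; accented capitals are not covered.
GLUED = re.compile(r"\b(\d+)([A-Z][a-z]{2,})")


def split_candidates(text: str) -> list[tuple[str, str]]:
    """Return (original, suggestion) pairs for number+word tokens in the text."""
    return [
        (m.group(0), f"{m.group(1)} {m.group(2)}")
        for m in GLUED.finditer(text)
    ]
```

Returning candidates instead of rewriting the text keeps the decision open, which fits the warning that many of these cases are extraction artifacts or legitimate exceptions.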
On names: many names in Tatoeba are copied from a sentence in a different language, so the name is often very uncommon in Dutch. It may be better to use only untranslated Tatoeba sentences, and even those are sometimes of low quality.