jwordsplitter
missing decompositions
Some compounds are not decomposed because the algorithm searches for the longest match and the Morphy/LanguageTool-based dictionary itself contains compounds. Examples:
Einkommensempfängerin
Wehrmachtsamt
Schwingflügel
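To illustrate why a dictionary entry for the compound blocks the split, here is a minimal sketch of longest-match splitting. It is a simplified, hypothetical example and not jwordsplitter's actual code: the class and method names are made up, words are assumed to be lowercase, and linking elements such as the "s" in "Wehrmachtsamt" are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified longest-match splitter (not jwordsplitter's implementation).
public class LongestMatchSplitter {
    private final Set<String> dictionary;

    public LongestMatchSplitter(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Splits a lowercase word, always trying the longest dictionary prefix first.
        Returns null if the word cannot be fully covered by dictionary entries. */
    public List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        return splitFrom(word, 0, parts) ? parts : null;
    }

    private boolean splitFrom(String word, int start, List<String> parts) {
        if (start == word.length()) {
            return true; // the whole word has been covered
        }
        // Longest candidate first: this is why a compound in the dictionary wins.
        for (int end = word.length(); end > start; end--) {
            String candidate = word.substring(start, end);
            if (dictionary.contains(candidate)) {
                parts.add(candidate);
                if (splitFrom(word, end, parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // backtrack and try a shorter prefix
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> withCompound = Set.of("schwing", "flügel", "schwingflügel");
        System.out.println(new LongestMatchSplitter(withCompound).split("schwingflügel"));
        // prints [schwingflügel]: the compound itself is the longest match
        Set<String> withoutCompound = Set.of("schwing", "flügel");
        System.out.println(new LongestMatchSplitter(withoutCompound).split("schwingflügel"));
        // prints [schwing, flügel]
    }
}
```

Because the first full cover found is accepted, the decomposition only appears once the compound entry is gone from the dictionary.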
Solution: remove compounds from test-de-large.txt, or don't export them in org.languagetool.dev.ExportGermanNouns.
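One way to "remove compounds" is to drop every dictionary entry that can itself be split into two or more other entries. The sketch below assumes the word list is already loaded as a set of lowercase strings; the class and method names are hypothetical and do not correspond to anything in test-de-large.txt handling or ExportGermanNouns. It ignores linking elements (so "wehrmachtsamt" would survive unless "s" were treated specially) and its recursion is exponential in the worst case, which is acceptable only for a sketch.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical filter: keeps only entries that are not splittable into other entries.
public class CompoundFilter {

    /** True if word can be split into at least two dictionary entries.
        Only proper prefixes/suffixes are used, so the word never matches itself. */
    static boolean isCompound(String word, Set<String> dict) {
        for (int i = 1; i < word.length(); i++) {
            String head = word.substring(0, i);
            String rest = word.substring(i);
            if (dict.contains(head) && (dict.contains(rest) || isCompound(rest, dict))) {
                return true;
            }
        }
        return false;
    }

    /** Returns a sorted copy of dict without the entries that are compounds of other entries. */
    static Set<String> removeCompounds(Set<String> dict) {
        Set<String> result = new TreeSet<>();
        for (String word : dict) {
            if (!isCompound(word, dict)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("hieb", "stich", "waffe", "stichwaffe");
        System.out.println(removeCompounds(dict)); // prints [hieb, stich, waffe]
    }
}
```

Note that such a filter only checks splittability, not meaning, so lexicalized compounds whose parts happen to be dictionary words would be removed as well.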
Actually, for the LanguageTool use case this isn't a problem: we're only interested in grammar information, and as long as that's available (compound or not), we're fine.
Not exactly! GERMAN_SPELLER_RULE in LanguageTool marks some words in enumerations as incorrect because the last word of the enumeration is not recognized as a compound. Example: in "Hieb- und Stichwaffe", "Hieb-" is marked as incorrect because the dictionary contains "Stichwaffe" in addition to "Waffe", so "Stichwaffe" is not split into "Stich" and "Waffe".
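A minimal sketch of how a truncated enumeration part like "Hieb-" could be checked by borrowing the final part of the last word. This is an assumption for illustration, not how GERMAN_SPELLER_RULE is actually implemented; the class name, method names, and the tiny dictionaries are hypothetical. It reuses a simplified longest-match split to show why a compound entry in the dictionary makes the check fail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical check for truncated enumeration parts (not LanguageTool's code).
public class TruncatedCompoundCheck {

    /** Simplified longest-match split: tries the longest prefix first and backtracks
        only if the remainder cannot be split. Returns null if no full split exists. */
    static List<String> split(String word, Set<String> dict) {
        if (word.isEmpty()) {
            return new ArrayList<>();
        }
        for (int end = word.length(); end > 0; end--) {
            String head = word.substring(0, end);
            if (dict.contains(head)) {
                List<String> rest = split(word.substring(end), dict);
                if (rest != null) {
                    rest.add(0, head);
                    return rest;
                }
            }
        }
        return null;
    }

    /** Accepts a truncated word such as "Hieb-" if the last word of the enumeration
        splits into several parts and stem + that word's final part also splits. */
    static boolean acceptTruncated(String truncated, String lastWord, Set<String> dict) {
        if (!truncated.endsWith("-")) {
            return false;
        }
        String stem = truncated.substring(0, truncated.length() - 1).toLowerCase(Locale.ROOT);
        List<String> parts = split(lastWord.toLowerCase(Locale.ROOT), dict);
        if (parts == null || parts.size() < 2) {
            return false; // last word not recognized as a compound: no part to borrow
        }
        String borrowed = parts.get(parts.size() - 1);
        List<String> rebuilt = split(stem + borrowed, dict);
        return rebuilt != null && rebuilt.size() >= 2;
    }

    public static void main(String[] args) {
        Set<String> withCompound = Set.of("hieb", "stich", "waffe", "stichwaffe");
        Set<String> withoutCompound = Set.of("hieb", "stich", "waffe");
        System.out.println(acceptTruncated("Hieb-", "Stichwaffe", withCompound));    // false
        System.out.println(acceptTruncated("Hieb-", "Stichwaffe", withoutCompound)); // true
    }
}
```

With "stichwaffe" in the dictionary the last word comes back as a single part, so there is nothing to complete "Hieb-" with; without it, "Hieb" + "waffe" splits cleanly and the truncated word is accepted.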