jwordsplitter

missing decompositions

Open · danielnaber opened this issue 9 years ago · 1 comment

Some compounds will not be decomposed because the algorithm searches for the longest match and the Morphy/LanguageTool-based dictionary itself contains compounds, so the whole word already matches as a single entry. Examples:

Einkommensempfängerin
Wehrmachtsamt
Schwingflügel

Solution: remove compounds from test-de-large.txt, or don't export them in org.languagetool.dev.ExportGermanNouns.
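
A toy illustration of the effect, not jwordsplitter's actual code: the class `LongestMatchDemo`, its `split` method, and the tiny dictionaries are all made up for this sketch, and the greedy loop is a simplification (no backtracking, no interfix handling). It only shows that a longest-match search stops at the full word when the dictionary lists the compound, and finds the parts once the compound is removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LongestMatchDemo {

    // Hypothetical greedy splitter: at each position, take the longest
    // dictionary entry that starts there. Returns the word unsplit if
    // some position has no matching entry.
    static List<String> split(String word, Set<String> dict) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < lower.length()) {
            int end = lower.length();
            // Shrink the candidate from the right until it is in the dictionary.
            while (end > pos && !dict.contains(lower.substring(pos, end))) {
                end--;
            }
            if (end == pos) {
                return List.of(lower); // nothing matches here, give up
            }
            parts.add(lower.substring(pos, end));
            pos = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Dictionary that contains the compound itself (the problem case).
        Set<String> withCompound = Set.of("stich", "waffe", "stichwaffe");
        // Same dictionary with the compound removed (the proposed fix).
        Set<String> withoutCompound = Set.of("stich", "waffe");

        System.out.println(split("Stichwaffe", withCompound));
        System.out.println(split("Stichwaffe", withoutCompound));
    }
}
```

With the compound in the dictionary the word comes back as a single part; without it, the same search yields the two components.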

Actually, for the LanguageTool use case this isn't a problem, as we're only interested in grammar information and as long as that's available (compound or not) we're fine.

danielnaber, Apr 18 '15 18:04

> Actually, for the LanguageTool use case this isn't a problem, as we're only interested in grammar information and as long as that's available (compound or not) we're fine.

Not exactly! GERMAN_SPELLER_RULE in LanguageTool marks some words in enumerations as wrong when the last word in the enumeration is not recognized as a compound. Example: in "Hieb- und Stichwaffe", "Hieb-" is marked as incorrect because the dictionary contains "Stichwaffe" in addition to "Waffe", so "Stichwaffe" is matched as a whole and never split.

f-knorr, Feb 18 '17 16:02