jwordsplitter

A small Java library for splitting German compound words.

Results: 4 jwordsplitter issues

Some rules of Dutch currently make JWordSplitter less suitable for that language. The most difficult part is filtering the detected compounds: _autoonderdeel_ is not acceptable, even though _auto_ and _onderdeel_...

`GermanWordSplitter` does not split words such as _FPÖ-Chefverhandler_. I think that, at least in German, any word with mid-word hyphens could be decomposed into the parts separated by the hyphens.
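The hyphen idea from this issue can be illustrated with a minimal sketch. It does not use or modify jwordsplitter itself: `HyphenSplitSketch`, `splitHyphenated`, and the `compoundSplitter` parameter are hypothetical names, with `compoundSplitter` standing in for whatever splitter would handle each hyphen-free part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class HyphenSplitSketch {

    // Splits the word on hyphens first, then hands every non-empty part
    // to a compound splitter. `compoundSplitter` is a hypothetical
    // stand-in, not part of the jwordsplitter API.
    static List<String> splitHyphenated(String word,
                                        Function<String, List<String>> compoundSplitter) {
        List<String> result = new ArrayList<>();
        for (String part : word.split("-")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces from leading or doubled hyphens
            }
            result.addAll(compoundSplitter.apply(part));
        }
        return result;
    }

    public static void main(String[] args) {
        // Identity splitter: returns each hyphen-free part unchanged.
        Function<String, List<String>> identity = part -> List.of(part);
        System.out.println(splitHyphenated("FPÖ-Chefverhandler", identity));
    }
}
```

With the identity splitter used in `main`, the word is cut only at the hyphen. Passing a real compound splitter instead would additionally decompose each part, for example _Chefverhandler_.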

After testing jwordsplitter on a dataset of German technical vocabulary, a number of words were extracted that had so far been missing from the `languagetool_dict.txt` and `germanPrefixes.txt` lists. These...

Some compounds are not decomposed because the algorithm searches for the longest match and the Morphy/LanguageTool-based dictionary itself contains compounds. Examples:

```
Einkommensempfängerin
Wehrmachtsamt
Schwingflügel
```

Solution: remove compounds from `test-de-large.txt`,...
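The effect described in this issue can be reproduced with a toy greedy splitter. This is a simplified sketch and not jwordsplitter's actual algorithm: `LongestMatchSketch`, its `split` method, and the tiny dictionaries are assumptions made for illustration, and the sketch ignores interfixes and backtracking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LongestMatchSketch {

    // Toy greedy splitter: at each position, take the longest dictionary
    // entry that matches. If nothing matches, return the word undivided.
    static List<String> split(String word, Set<String> dictionary) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < lower.length()) {
            int end = lower.length();
            // Shrink the candidate from the right until it is a known entry.
            while (end > pos && !dictionary.contains(lower.substring(pos, end))) {
                end--;
            }
            if (end == pos) {
                return List.of(word); // stuck: leave the word as it is
            }
            parts.add(lower.substring(pos, end));
            pos = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary that holds only the two components.
        Set<String> partsOnly = Set.of("schwing", "flügel");
        // Same dictionary, but it also lists the compound itself.
        Set<String> withCompound = Set.of("schwing", "flügel", "schwingflügel");

        System.out.println(split("Schwingflügel", partsOnly));
        System.out.println(split("Schwingflügel", withCompound));
    }
}
```

With the dictionary that contains only the components, the word is split in two. As soon as the compound is itself an entry, the longest match covers the whole word and no split happens, which is why removing compounds from the word list helps.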