cspell-dicts
Alternate German Spellings
Thank you for all your effort! May I ask you to have a look at the German dictionary, too?
To stay ASCII-compatible, it is common in German to substitute umlauts:
- Ä -> Ae
- Ö -> Oe
- Ü -> Ue
- ä -> ae
- ö -> oe
- ü -> ue
Currently, my solution is to add every word I encounter to a custom dictionary, but it would be great to have these substitutions built in.
Cheers,
Arne
Originally posted by @ar-std in https://github.com/streetsidesoftware/vscode-cspell-dict-extensions/issues/12#issuecomment-1153812578
@ar-std,
I'm guessing that these are approved alternate spellings of German words. Do you have a reference?
I'm not sure if there is any 'approved reference'. I think it is just common use (mainly for data processing, as these special characters are not available on most keyboards/encodings). On Wikipedia there is only a link to some Oracle reference that explains how they use it...
I would like to help with this, but I'm not at all familiar with how cspell works. Is it enough to iterate over the *.dic file with some regex magic, duplicate every line that contains umlauts, and replace the umlauts in the duplicated line? Or can that be handled more easily and globally by some setting?
Additionally, I forgot one mapping (ß), and the uppercase umlauts need all-caps variants:
- Ä -> Ae (AE in all-caps)
- Ö -> Oe (OE in all-caps)
- Ü -> Ue (UE in all-caps)
- ä -> ae
- ö -> oe
- ü -> ue
- ß -> ss
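For illustration, the mapping above could be sketched in Python roughly like this. The names `UMLAUT_MAP` and `transliterate` are made up for this example and are not part of cspell; the all-caps handling is just one possible reading of the "(AE in all-caps)" notes.

```python
# Hypothetical helper, not part of cspell: applies the mapping listed above.
UMLAUT_MAP = {
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
}


def transliterate(word: str) -> str:
    """Return the ASCII spelling of a German word using UMLAUT_MAP."""
    # ß has no common uppercase form, so ignore it when checking for all-caps.
    all_caps = word.replace("ß", "").isupper()
    result = "".join(UMLAUT_MAP.get(ch, ch) for ch in word)
    # In all-caps words, "Ae" becomes "AE" and "ss" becomes "SS".
    return result.upper() if all_caps else result


print(transliterate("Ärger"))   # Aerger
print(transliterate("ÄRGER"))   # AERGER
print(transliterate("Straße"))  # Strasse
```

Whether something like this should run as a preprocessing step or be expressed as a dictionary setting is exactly the open question here.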
@ar-std,
Thank you for the help.
Since the .dic and .aff files come from an external source, I think it is better to just copy them (src/hunspell/index.(aff/dic)) to a new directory in src, keeping only the impacted words in the .dic file. We can then create a substitution rule that will replace Ä with Ae and the rest. Words with ß will need to be duplicated and replaced with ss.
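A rough sketch of the filtering step, in case it helps. It assumes the usual Hunspell `.dic` layout (an entry count on the first line, then one `word/FLAGS` entry per line); the function name `impacted_entries` is hypothetical, and reading and writing the actual files (including their encoding) is left out.

```python
import re

# Characters that make a dictionary entry "impacted" by the substitution.
IMPACTED = re.compile(r"[ÄÖÜäöüß]")


def impacted_entries(dic_lines):
    """Keep only entries containing umlauts or ß; duplicate ß words with ss."""
    kept = []
    for line in dic_lines:
        entry = line.strip()
        # Skip blank lines and the numeric entry-count header.
        if not entry or entry.isdigit():
            continue
        # Assumption: everything after the first "/" is the affix flag list.
        word, sep, flags = entry.partition("/")
        if not IMPACTED.search(word):
            continue
        kept.append(entry)
        if "ß" in word:
            # ß words are duplicated with ss, keeping the same flags.
            kept.append(word.replace("ß", "ss") + sep + flags)
    return kept


sample = ["4", "Haus/S", "Größe/N", "Ärger", "Tür/EN"]
print(impacted_entries(sample))
# ['Größe/N', 'Grösse/N', 'Ärger', 'Tür/EN']
```

The umlaut replacements themselves are not applied here, since the idea is to handle those with a substitution rule rather than by duplicating entries.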
Is this related to #603?