Hyphenation error on German word "einen"

Open mikebarkmin opened this issue 6 years ago • 4 comments

"einen" should be hyphenated as "ei-nen". I have also tried adding it to the dictionary manually by inserting the line "ei1nen" after the pattern "ei1ne" (line 11739), but the new pattern has no effect.

Here is the test code:

import pyphen

# Load the locally modified dictionary file
dic = pyphen.Pyphen(filename="my.dic")
print(dic.inserted("einen"))  # expected: "ei-nen"

P.S.: "eine" is correctly inserted and results in "ei-ne".

mikebarkmin avatar Nov 28 '19 11:11 mikebarkmin

The blocking rule from hyph_de_DE.dic:

.ei8nen
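
To illustrate why this rule wins, here is a minimal sketch of Liang-style pattern matching as I understand it: every pattern puts a number into the gaps between letters, the highest number per gap wins, and only odd numbers allow a break. The functions `parse_pattern` and `hyphenate` are hypothetical helpers written for this comment, not part of Pyphen, and the pattern lists are toy subsets rather than the real dictionary. Minimum left/right lengths are ignored.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '.ei8nen' into its letters and per-gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight for the gap before letters[pos]
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, hyphen="-"):
    """Apply toy Liang-style patterns to a single word."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                # the highest weight seen for a gap wins
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for k in range(1, len(word)):
        if scores[k + 1] % 2 == 1:  # odd weight: a break is allowed here
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return hyphen.join(pieces)


print(hyphenate("eine", ["ei1ne", ".ei8nen"]))             # ei-ne
print(hyphenate("einen", ["ei1ne", ".ei8nen"]))            # einen (8 beats 1)
print(hyphenate("einen", ["ei1ne", ".ei8nen", "ei1nen"]))  # einen (still blocked)
print(hyphenate("einen", ["ei1ne", ".ei8nen", ".ei9nen"])) # ei-nen (9 beats 8)
```

In this sketch, adding "ei1nen" cannot help because the 8 from ".ei8nen" still outweighs it. A higher odd weight such as ".ei9nen" (or removing ".ei8nen") would be needed. Whether the same change works in a real my.dic with Pyphen is an assumption I have not tested.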


hyph_de_DE.dic might need to be regenerated, because even LibreOffice doesn't seem to use it?

Here is a possibly newer version: https://ctan.org/tex-archive/language/hyphenation/dehyph?lang=de

Th3R3alDuk3 avatar Mar 23 '25 18:03 Th3R3alDuk3

my workaround ...

from pyphen import Pyphen
from re import compile

text = "Ich habe einen Hund"

pattern = compile(
    # add special german letters
    r"(?P<alphas>[-a-zA-ZäöüÄÖÜß]+)|"
    r"(?P<digits>\d+)|"
    r"(?P<spaces>\s+)|"
    r"(?P<others>.?)"
)

pyphen = Pyphen(lang="de_DE")
pyphen_exceptions = {"einen": "ei-nen"}

for (alphas, digits, spaces, others) in pattern.findall(text):
    if alphas:
        if exception := pyphen_exceptions.get(alphas):
            hyphenation = exception
        else:
            hyphenation = pyphen.inserted(alphas, hyphen="-")
        print(hyphenation)
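
The loop above prints each word on its own line and drops spaces and punctuation. If the hyphenated sentence is needed as a whole, a variant along these lines might work. It is only a sketch: `hyphenate_text` is a hypothetical helper, the `fallback` argument is any callable that hyphenates a single word, and the exception lookup is made case-insensitive so that "Einen" at the start of a sentence is also caught.

```python
import re

# Words made of ASCII letters plus the special German letters
WORD = re.compile(r"[a-zA-ZäöüÄÖÜß]+")


def hyphenate_text(text, fallback, exceptions):
    """Hyphenate every word in text, preferring manual exceptions.

    fallback:   callable taking one word and returning it hyphenated
    exceptions: dict mapping lowercase words to their hyphenated form
    """
    def replace(match):
        word = match.group(0)
        override = exceptions.get(word.lower())
        if override is None:
            return fallback(word)
        if word[0].isupper():
            # keep the capital letter of the original word
            override = override[0].upper() + override[1:]
        return override

    # Everything that is not a word (spaces, digits, punctuation) is kept as-is
    return WORD.sub(replace, text)


exceptions = {"einen": "ei-nen"}
# Identity fallback, only so the sketch runs without Pyphen installed
print(hyphenate_text("Ich habe einen Hund.", lambda w: w, exceptions))
```

With Pyphen installed, passing `Pyphen(lang="de_DE").inserted` as `fallback` should give the same behaviour as the loop above for all words that are not in the exceptions dict.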

Th3R3alDuk3 avatar Mar 23 '25 18:03 Th3R3alDuk3

The blocking rule from hyph_de_DE.dic:

.ei8nen

Yes.

hyph_de_DE.dic might need to be regenerated, because even LibreOffice doesn't seem to use it?

Here is a possibly newer version: https://ctan.org/tex-archive/language/hyphenation/dehyph?lang=de

Maybe you can ask LibreOffice. Pyphen dictionaries only come from https://git.libreoffice.org/dictionaries, so we'll update ours as soon as it's updated in their repository.

liZe avatar Mar 23 '25 18:03 liZe

I think they're using a different format, but I've already asked about it. German is a difficult language ;).

Th3R3alDuk3 avatar Mar 23 '25 18:03 Th3R3alDuk3