(German) hyphenation derailed by punctuation characters

Open allefeld opened this issue 3 years ago • 1 comments

I found this strange behavior:

> import pyphen

> dic = pyphen.Pyphen(lang='de')

> dic.inserted('begreifbar')
'be-greif-bar'

> dic.inserted('begreifbar.')
'be-greif-ba-r.'

> dic.inserted('begreifbar«.')
'be-greif-ba-r«.'

The first hyphenation is correct. The second and third have trailing punctuation characters (« is a common closing-quote in German printing), which leads to an additional incorrect hyphenation point being inserted.

I tried to use the local hunspell dictionary instead (/usr/share/hyphen/hyph_de_DE.dic), with the same result.

In this case, I could fix it by removing punctuation characters myself, but I'd still consider it to be a bug, possibly related to #24 and #26.

allefeld, Apr 18 '22 20:04

Hello!

"In this case, I could fix it by removing punctuation characters myself"

Yes, that’s a "problem" already answered in this comment. Short answer: as some details are specific to each language (and probably to each application), it’s easier to remove the punctuation in your application.
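For readers landing here, a minimal sketch of that application-side workaround follows. It is not part of Pyphen: the helper names `split_punctuation` and `inserted_ignoring_punctuation` are hypothetical, and treating every character whose Unicode category starts with `P` as punctuation is an assumption that may not suit every language or application. The idea is to peel punctuation off both ends of a token, hyphenate only the core word, and glue the punctuation back on.

```python
import unicodedata


def split_punctuation(token):
    """Split a token into (leading, core, trailing).

    leading and trailing are runs of punctuation at the edges of the token.
    Assumption: "punctuation" means any Unicode category starting with 'P'.
    """
    def is_punct(ch):
        return unicodedata.category(ch).startswith('P')

    start, end = 0, len(token)
    while start < end and is_punct(token[start]):
        start += 1
    while end > start and is_punct(token[end - 1]):
        end -= 1
    return token[:start], token[start:end], token[end:]


def inserted_ignoring_punctuation(insert, token):
    """Hyphenate only the core of the token, leaving edge punctuation alone.

    insert is any callable mapping a word to its hyphenated form,
    for example the inserted method of a pyphen.Pyphen instance.
    """
    leading, core, trailing = split_punctuation(token)
    return leading + (insert(core) if core else '') + trailing


# Usage with Pyphen (commented out so the sketch runs without it installed):
# import pyphen
# dic = pyphen.Pyphen(lang='de')
# inserted_ignoring_punctuation(dic.inserted, 'begreifbar«.')
```

Because the core passed to Pyphen is just `begreifbar`, the call in the usage comment should return the correct hyphenation from the report above with `«.` appended unchanged. Punctuation inside a word (such as an existing hyphen or an apostrophe) is left in place and still reaches Pyphen, so this only covers punctuation at the edges of a token.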

liZe, Apr 19 '22 04:04

Closing, as we don’t plan to handle punctuation in Pyphen.

liZe, Mar 12 '23 11:03