
Questionable results when hyphenating

Open DoubleDee73 opened this issue 1 year ago • 3 comments

Sometimes the output of the automatic hyphenation leaves a bit to be desired.

Examples:

  • "liv-ing" or "reas-ons" instead of "li-ving" or "rea-sons"
  • Sometimes there is no syllabification at all, e.g. "everything", "around", "saying", "secret", "little", "better" (should be something like eve-ry-thing, a-round, say-ing, se-cret, lit-tle, bet-ter)
  • It doesn't work well with colloquial words, e.g. "wanna", "gonna", or with gerunds shortened with an apostrophe, like "goin'", "workin'"

DoubleDee73, Dec 11 '23 19:12

That’s why I switched to dictionary files for UltraStar Creator: https://github.com/UltraStar-Deluxe/UltraStar-Creator/tree/master/syllabification.

As a side note, we’re talking about syllabification (splitting into singable syllables) rather than hyphenation (splitting of written words).
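A dictionary-based approach like the one mentioned above could be sketched roughly as follows. This is only an illustration: the one-entry-per-line format with hyphens between syllables is an assumption and not necessarily the format of the linked files, and load_dictionary and syllabify are hypothetical helper names.

```python
def load_dictionary(lines):
    """Parse lines like 'eve-ry-thing' into {'everything': ['eve', 'ry', 'thing']}.

    Assumed format: one word per line, syllables separated by '-'.
    """
    table = {}
    for line in lines:
        entry = line.strip().lower()
        if entry:
            table[entry.replace("-", "")] = entry.split("-")
    return table


def syllabify(word, table, fallback=None):
    """Look the word up in the table; otherwise use the fallback splitter, if any."""
    parts = table.get(word.lower())
    if parts is None:
        return fallback(word) if fallback else [word]
    # Re-slice the original word so that its capitalisation is preserved.
    result, pos = [], 0
    for part in parts:
        result.append(word[pos:pos + len(part)])
        pos += len(part)
    return result


table = load_dictionary(["eve-ry-thing", "a-round", "wan-na", "go-in'"])
print(syllabify("Everything", table))
print(syllabify("goin'", table))
```

The fallback parameter leaves room for a pattern-based splitter to handle words that are missing from the dictionary, while colloquial forms such as "wanna" or "goin'" can simply be listed explicitly.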

bohning, Dec 11 '23 19:12

OK, something is broken. Thanks @DoubleDee73 for the examples.

UltraSinger actually already uses syllabification rather than simple hyphenation, via hyphenator.Syllables(cleaned_string). The funny thing is that it returns different results depending on the language, and yet they are all wrong:

assert hyphenation("differently", Hyphenator("de_AT")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'ferent', 'ly']
assert hyphenation("differently", Hyphenator("en_US")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'fer', 'ently']
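To compare several words and splitters at once, a small table-driven check might help. The sketch below is an assumption about how such a check could look, not UltraSinger's actual test code; find_mismatches and no_split are hypothetical names, and the expected splits are taken from the examples in this thread.

```python
def find_mismatches(split, expected):
    """Return {word: (expected_parts, actual_parts)} for every word split() gets wrong."""
    mismatches = {}
    for word, want in expected.items():
        got = list(split(word))
        if got != want:
            mismatches[word] = (want, got)
    return mismatches


# Expected syllables as given in this issue.
EXPECTED = {
    "differently": ["dif", "fer", "ent", "ly"],
    "living": ["li", "ving"],
    "little": ["lit", "tle"],
}


def no_split(word):
    # Stand-in splitter that never splits; swap in the real splitter here.
    return [word]


for word, (want, got) in find_mismatches(no_split, EXPECTED).items():
    print(f"{word}: expected {want}, actual {got}")
```

Any callable that takes a word and returns a list of parts can be passed as split, so the same table could be run once per language to see where the results diverge.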

I need to check what the PyHyphen integration is actually doing there. It should actually use the information from LibreOffice.

@bohning thanks for the list. I will try to use it if I can't fix PyHyphen.

@mindtakerr thanks for the info about the howmanysyllables website. This makes it easy to check and shows how syllables are actually formed.

rakuri255, Jan 03 '24 17:01

PyHyphen uses C in the background to create syllables. It's not really written in a maintenance-friendly way. I think it makes a few mistakes.

In addition, the hyphenation pattern data from LibreOffice is converted from TeX data. It also appears to be outdated.
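For context, TeX-style pattern data is applied with Liang's algorithm: each pattern puts digits between letters, the highest digit wins at every position, and odd values allow a break. The following is a minimal sketch of that idea and not PyHyphen's implementation; the patterns are toy examples invented for illustration, not entries from the LibreOffice files, and split_word, left_min and right_min are hypothetical names.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 't1t' into its letters and the weights between them."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def split_word(word, patterns, left_min=1, right_min=1):
    """Split a word at every position where the highest matching weight is odd."""
    parsed = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = parsed.get(padded[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    parts, last = [], 0
    # Only consider breaks that leave at least left_min / right_min letters.
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            parts.append(word[last:k])
            last = k
    parts.append(word[last:])
    return parts


print(split_word("little", ["t1t"]))
print(split_word("around", [".a1"], left_min=1))
print(split_word("around", [".a1"], left_min=2))
```

The last two calls hint at one possible reason why a word like "around" stays unsplit: hyphenation rules usually require a minimum number of letters before and after a break, which makes sense for written text but not for singable syllables. Whether this is what happens in PyHyphen here is an assumption that would need checking.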

rakuri255, Jan 03 '24 18:01