UltraSinger
Questionable results when hyphenating
Sometimes the output of the automatic hyphenation leaves a bit to be desired.
Examples:
- "liv-ing" or "reas-ons" instead of "li-ving" or "rea-sons"
- Sometimes no syllabification at all, e.g. "everything", "around", "saying", "secret", "little", "better" (should be something like eve-ry-thing, a-round, say-ing, se-cret, lit-tle, bet-ter)
- Doesn't work well with colloquial words, e.g. "wanna", "gonna", or with gerunds shortened with an apostrophe, like "goin'", "workin'"
That’s why I switched to dictionary files for UltraStar Creator: https://github.com/UltraStar-Deluxe/UltraStar-Creator/tree/master/syllabification.
As a side note, we’re talking about syllabification (splitting into singable syllables) rather than hyphenation (splitting of written words).
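To illustrate the dictionary approach, here is a minimal Python sketch of a lookup with a fallback. The in-memory format (word mapped to a dash-separated string), the function name `syllabify` and the `fallback` parameter are assumptions made for this example; they are not the file format or API of UltraStar Creator. The entries are the splits suggested above, plus "wan-na", which is my own guess.

```python
# Hypothetical in-memory dictionary: lower-case word -> dash-separated syllables.
SYLLABLE_DICT = {
    "living": "li-ving",
    "reasons": "rea-sons",
    "everything": "eve-ry-thing",
    "around": "a-round",
    "saying": "say-ing",
    "secret": "se-cret",
    "little": "lit-tle",
    "better": "bet-ter",
    "wanna": "wan-na",  # assumption, not taken from any dictionary file
}


def syllabify(word, dictionary, fallback=None):
    """Split `word` using the dictionary; use `fallback` for unknown words."""
    entry = dictionary.get(word.lower())
    if entry is None:
        # Unknown word: defer to another splitter, or keep the word whole.
        return fallback(word) if fallback else [word]
    # Slice the original word so its capitalisation is preserved.
    syllables, pos = [], 0
    for part in entry.split("-"):
        syllables.append(word[pos:pos + len(part)])
        pos += len(part)
    return syllables


print(syllabify("Everything", SYLLABLE_DICT))
print(syllabify("gonna", SYLLABLE_DICT))
```

Words missing from the dictionary come back unsplit unless a `fallback` callable is passed in, so an automatic splitter could still cover the remaining cases.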
OK, something is broken. Thanks @DoubleDee73 for the examples.
UltraSinger actually already uses syllables rather than simple hyphenation: hyphenator.Syllables(cleaned_string)
The funny thing is that it returns different results depending on the language and yet they are all wrong.
assert hyphenation("differently", Hyphenator("de_AT")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'ferent', 'ly']
assert hyphenation("differently", Hyphenator("en_US")) == ["dif", "fer", "ent", "ly"]
Expected :['dif', 'fer', 'ent', 'ly']
Actual :['dif', 'fer', 'ently']
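Since a plain assert stops at the first failure, a small helper that collects all mismatches might make it easier to compare many words at once. This is only a sketch: `find_mismatches` and `reported_en_us` are hypothetical names, and the stub splitter simply replays the en_US output reported above instead of calling PyHyphen.

```python
def find_mismatches(splitter, expected):
    """Return (word, expected, actual) for every word the splitter gets wrong."""
    mismatches = []
    for word, want in expected.items():
        got = splitter(word)
        if got != want:
            mismatches.append((word, want, got))
    return mismatches


def reported_en_us(word):
    # Stub standing in for the real splitter; it replays the reported result.
    reported = {"differently": ["dif", "fer", "ently"]}
    return reported.get(word, [word])


expected = {"differently": ["dif", "fer", "ent", "ly"]}
for word, want, got in find_mismatches(reported_en_us, expected):
    print(f"{word}: expected {want}, got {got}")
```

Swapping the stub for the real splitter would turn this into a regression check over a whole word list.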
I need to check what the PyHyphen integration is actually doing there. It should actually use the information from LibreOffice.
@bohning thanks for the list. I will try to use it if I can't fix PyHyphen.
@mindtakerr thanks for the info about the howmanysyllables website. This makes it easy to check and shows how syllables are actually formed.
PyHyphen uses C in the background to create syllables. It's not really written in a maintenance-friendly way. I think it makes a few mistakes.
In addition, the hyphenation pattern data from LibreOffice is converted from TeX data. It also appears to be outdated.
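For context, TeX-style hyphenation patterns work roughly like this: digits inside a pattern score the gaps between letters, the highest score per gap wins, and only odd scores allow a break. The Python sketch below shows that rule with toy patterns invented for this example. They are not the real LibreOffice or TeX patterns, and the sketch leaves out details such as minimum prefix and suffix lengths.

```python
def parse_pattern(pattern):
    """Split a pattern like 'f1f' into its letters and per-gap scores."""
    letters, scores = "", [0]
    for ch in pattern:
        if ch.isdigit():
            scores[-1] = int(ch)  # score for the gap before the next letter
        else:
            letters += ch
            scores.append(0)
    return letters, scores


def split_with_patterns(word, patterns):
    parsed = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    levels = [0] * (len(padded) + 1)   # levels[i]: score of the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            scores = parsed.get(padded[start:end])
            if scores:
                for offset, score in enumerate(scores):
                    levels[start + offset] = max(levels[start + offset], score)
    pieces, last = [], 0
    for i in range(1, len(word)):
        if levels[i + 1] % 2 == 1:  # odd score: a break is allowed before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


toy = ["f1f", "r1e", "t1l"]  # invented patterns, for illustration only
print(split_with_patterns("differently", toy))
# An even score that outranks an odd one suppresses that break:
print(split_with_patterns("differently", toy + ["2ent"]))
```

With the three toy patterns the word splits as dif-fer-ent-ly. Adding the invented inhibiting pattern "2ent" removes the break before "ent" and gives dif-ferent-ly, which shows how a single missing or extra pattern in the data could lead to results like the ones above.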