How is pronunciation decided?

Open iv2985 opened this issue 1 year ago • 2 comments

Homographs like "wind" have the same spelling but different meanings and pronunciations depending on context, for example "wind power" vs. "wind a clock". How is the pronunciation decided in such cases?

It is pronouncing the "wind" in "wind power" the wrong way - the way it would be pronounced in "wind a clock". Strangely, it gets it right for the default voices, but wrong when I trained a new English voice.

iv2985, Oct 27 '24 07:10

More of a band-aid: use EN-BR and it sounds better. EN-US noticeably says "wine'd power". As far as I can tell, MeloTTS has g2p-en doing the pronunciation. For me it is located in my conda environment: /home/user/anaconda3/envs/melotts/lib/python3.10/site-packages/g2p_en.

To test the current pronunciation, go to a terminal or your code runner of choice, type python, press Enter, and run each of these lines:

from g2p_en import G2p

g2p = G2p()

word = "wind"
phonemes = g2p(word)
print(f"Phonemes for '{word}': {phonemes}")

Output:

Phonemes for 'wind': ['W', 'AY1', 'N', 'D']

After you find what you need, edit the homograph file (homographs.en, in the g2p_en directory):

nano homographs.en

Add something like this to the list:

WIND|W IH1 N D|W AY1 N D|N

That should get you started on the first half at least.
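
A minimal sketch of how an entry like the one above appears to be interpreted, assuming the format is HEADWORD|pron1|pron2|POS and that pron1 is chosen when the word's part-of-speech tag starts with the given prefix (pron2 otherwise). This mirrors the g2p_en source as far as can be told; the helper names parse_homograph_line and pick_pronunciation are hypothetical and not part of g2p_en.

```python
def parse_homograph_line(line):
    """Split 'HEADWORD|pron1|pron2|POS' into its four fields (assumed format)."""
    headword, pron1, pron2, pos_prefix = line.strip().split("|")
    return headword.lower(), pron1.split(), pron2.split(), pos_prefix


def pick_pronunciation(entry, pos_tag):
    """Return pron1 if the POS tag starts with the entry's prefix, else pron2."""
    _, pron1, pron2, pos_prefix = entry
    return pron1 if pos_tag.startswith(pos_prefix) else pron2


entry = parse_homograph_line("WIND|W IH1 N D|W AY1 N D|N")
print(pick_pronunciation(entry, "NN"))  # noun tag, as in "wind power"
print(pick_pronunciation(entry, "VB"))  # verb tag, as in "wind a clock"
```

Under that assumption, the entry only helps when the tagger sees enough context to label "wind" as a noun or a verb, which a single-word call may not provide.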

highfillgoods, Dec 18 '24 23:12

One issue is that the code currently calls g2p() separately for each word. The G2p package can look up the word in the dictionary, or guess a pronunciation if it is not in the dictionary. But given one word at a time, G2p has no sentence context to work out the part of speech, so it cannot disambiguate. W IH1 N D versus W AY1 N D can be distinguished based on noun/verb, so changing MeloTTS to call g2p() on the full text would fix this specific problem.
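
The difference between the two calling patterns can be sketched as follows. This is an illustration only, not MeloTTS's actual code: phonemize_per_word and phonemize_full_text are hypothetical names, and g2p is any callable with the same interface as a g2p_en G2p instance (text in, list of phonemes out).

```python
def phonemize_per_word(g2p, text):
    """Call g2p once per whitespace-separated word (no sentence context)."""
    phonemes = []
    for word in text.split():
        phonemes.append(g2p(word))
    return phonemes


def phonemize_full_text(g2p, text):
    """Call g2p once on the whole text so it can tag each word in context."""
    return g2p(text)


# Usage with the real package (requires g2p_en and its NLTK data):
# from g2p_en import G2p
# g2p = G2p()
# print(phonemize_per_word(g2p, "wind power"))
# print(phonemize_full_text(g2p, "wind power"))
```

Note that the full-text call returns one flat list of phonemes with word separators, so any code that expects one result per word would need to split that list again.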

There are other cases in English where the part of speech (noun/verb/etc.) is not enough to distinguish pronunciations. A full solution is to train another DL model to go from text to phonemes. A recent model I found is SoundChoice, which has Apache-2.0 licensed weights available. Swapping in a model like this would be a more full-featured fix. But playing around with SoundChoice shows it isn't perfect, either. For example, it gets "You wind a bobbin but the wind blows" wrong.

nwhitehead, Dec 31 '24 05:12