
A possible approach to pronunciation customization

Open StoneCypher opened this issue 2 years ago • 33 comments

Hi, I'm going to re-raise the topic in #12, which is currently closed. I apologize, and I appreciate that this is in some sense bad form.

I would also like the ability to occasionally fine-control pronunciation, and I believe that fundamentally it's not a machine-solvable problem, thanks to the literal nightmare that is last names. I know six people who have the same last name by codepoint, but none of them say it the same way, and there's nothing your software could ever do to cope with that, because the knowledge it would need is context that isn't in the text.

The problem is, if you want to do high quality rendering, getting names right is a sign of respect, so this genuinely matters, and I believe it needs to be handed over to user control in some way.

And so I was going to go bug the ocotillo author. Hm. Guess that works out nicely.

I don't entirely understand where the English <-> Audio mapping comes from, but on a quick glance, it looks like it might be in jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli.

And so I was wondering.

  1. How hard would it be to have two of these?
  2. If the underlying symbolic language was in some way deterministic with regard to end pronunciation - that is, it's somehow a least worst case - how hard would it be to adapt the jbetker thing to a second syllabary?

The reason being, y'know, the International Phonetic Alphabet is in Unicode, and does a pretty reasonable job with most real world languages. And that would reduce the job to Googling someone's name once, putting it in a lookup table in IPA, and promptly forgetting about it for eternity.

Which, to me, sounds pretty good.
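To make the lookup table idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `IPA_OVERRIDES` and `annotate_names` are hypothetical names, the `[ipa:...]` tag is only the syntax proposed in this issue, and nothing here is an existing tortoise-tts feature. The transcriptions are the ones given in this issue.

```python
import re

# Hypothetical per-project table: written name -> IPA transcription.
# Filled in once per name, then reused for every render.
IPA_OVERRIDES = {
    "Siobhan": "ʃəˈvɔːn",
    "Pádraig": "ˈpˠɑːɾˠɪɟː",
    "Moloughney": "mʌːlɒkːniː",
}

def annotate_names(text, overrides=IPA_OVERRIDES):
    """Replace each known name with an [ipa:...] tag; leave other text alone."""
    if not overrides:
        return text
    # Try longer names first so a long entry wins over a shorter prefix of it.
    names = sorted(overrides, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: f"[ipa:{overrides[m.group(1)]}]", text)

print(annotate_names("Let's talk to Siobhan about it"))
```

Matching is exact and case-sensitive on purpose, so an entry only fires on the spelling that was actually looked up. Different people sharing one spelling would need separate tables, or tags written by hand.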

Or, if you prefer, ask Siobhan and Pádraig Moloughney from Worcester, Massachusetts ("shavon and petrick molockney from wooster mass").

"Let's talk to [ipa:ʃəˈvɔːn] and [ipa:ˈpˠɑːɾˠɪɟː mʌːlɒkːniː] about it" is nicely unambiguous, and fits with the symbology in the other request.
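As a rough sketch of what the front end of this could look like, the snippet below splits a tagged string into plain-text and IPA segments. The `[ipa:...]` syntax is just the one proposed in this issue, and `IPA_TAG` and `split_segments` are hypothetical names. How the IPA segments would then be fed to the model is the open question above and is not addressed here.

```python
import re

# Matches the proposed inline tag, e.g. [ipa:ʃəˈvɔːn]; the tag body may contain spaces.
IPA_TAG = re.compile(r"\[ipa:([^\]]+)\]")

def split_segments(text):
    """Split text into ("text", ...) and ("ipa", ...) segments, in reading order."""
    segments = []
    pos = 0
    for m in IPA_TAG.finditer(text):
        if m.start() > pos:
            # Plain text between the previous tag (or the start) and this tag.
            segments.append(("text", text[pos:m.start()]))
        segments.append(("ipa", m.group(1)))
        pos = m.end()
    if pos < len(text):
        # Trailing plain text after the last tag.
        segments.append(("text", text[pos:]))
    return segments

for kind, value in split_segments("Let's talk to [ipa:ʃəˈvɔːn] about it"):
    print(kind, repr(value))
```

Keeping the two kinds of segment separate means the ordinary English path stays untouched, and only the tagged spans would need the second symbol set.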

StoneCypher, May 01 '22 00:05