
Multivoice?

Open michael-conrad opened this issue 3 years ago • 5 comments

Are there any plans to extend this to a multi-voice setup?

I am trying to create a TTS for a very low resource language: https://github.com/CherokeeLanguage/cherokee-audio-data

michael-conrad avatar Oct 17 '21 13:10 michael-conrad

Hi @michael-conrad,

I don't have any current plans to extend this project. It's mostly meant to be a reproduction of the dynamic convolutional attention mechanism that can be copied/extended by others. I'd be happy to offer any guidance or feedback if you want to adapt this code to a different language or multispeaker setup.

bshall avatar Oct 18 '21 12:10 bshall

I was trying to train a network from scratch with Cherokee, as a test, but CMUdict has gotten in the way, and I'm clueless as to what to do. My skill set is primarily reformatting data to fit the requirements for input.


michael-conrad avatar Oct 18 '21 12:10 michael-conrad

Understood.


michael-conrad avatar Oct 18 '21 12:10 michael-conrad

That sounds cool! Do you plan to train from phonemes or directly from graphemes?

bshall avatar Oct 18 '21 13:10 bshall

From graphemes.

I'm training using a special pronunciation orthography. The Cherokee Syllabary unfortunately doesn't reflect vowel length and tones. A description of the orthography is at: https://github.com/CherokeeLanguage/cherokee-audio-data/blob/main/pronunciation-key.md
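For anyone following along, here is a minimal sketch of what grapheme-level input could look like when no pronunciation dictionary is involved. It is not code from this repository: the names `normalize`, `build_vocab`, and `encode` are hypothetical, the transcripts are placeholder strings rather than real Cherokee data, and the choice of NFD normalization (so that combining marks become separate symbols) is an assumption that may or may not suit the orthography described in the pronunciation key.

```python
import unicodedata

# Special symbols (hypothetical names, chosen only for this sketch).
PAD, EOS, UNK = "<pad>", "<eos>", "<unk>"


def normalize(text):
    # NFD splits a precomposed accented letter into a base letter plus
    # combining marks, so each mark gets its own symbol id.
    return unicodedata.normalize("NFD", text.strip())


def build_vocab(transcripts):
    # Collect every distinct character seen in the transcripts.
    symbols = sorted({ch for line in transcripts for ch in normalize(line)})
    # Reserve the first ids for the special symbols.
    return {sym: idx for idx, sym in enumerate([PAD, EOS, UNK] + symbols)}


def encode(text, vocab):
    # Map each character to its id; unseen characters fall back to UNK.
    ids = [vocab.get(ch, vocab[UNK]) for ch in normalize(text)]
    # Append an end-of-sequence marker.
    return ids + [vocab[EOS]]


# Placeholder transcripts, not taken from the Cherokee dataset.
transcripts = ["ga da", "g\u00e1 da"]
vocab = build_vocab(transcripts)
ids = encode("g\u00e1", vocab)
```

The resulting integer sequences could stand in wherever a model expects symbol ids for its text input, provided its embedding table is sized to `len(vocab)`.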

I'm currently experimenting with a fork of "Tomiinek Multilingual Text to Speech" at repo: https://github.com/CherokeeLanguage/Cherokee-TTS

Because the language is low-resource and the data consists mostly of very short utterances, my results have been poor. I'm having issues with "early termination of sequence", "extra utterances at end of sequence", and other things. I'm kind of hoping the different attention mechanism would help, especially with longer sequences.

I'm sure it doesn't help much that part of my training data is originally sourced from tape.

I'm also looking at https://github.com/mutiann/few-shot-transformer-tts to see if it might work better, but the amount of audio data that must be downloaded for adaptation is astronomical.

michael-conrad avatar Oct 18 '21 13:10 michael-conrad