Tacotron
Multivoice?
Are there any plans to extend this to a multi-voice setup?
I am trying to create a TTS for a very low resource language: https://github.com/CherokeeLanguage/cherokee-audio-data
Hi @michael-conrad,
I don't have any current plans to extend this project. It's mostly meant to be a reproduction of the dynamic convolutional attention mechanism that can be copied or extended by others. I'd be happy to offer any guidance or feedback if you want to adapt this code to a different language or a multispeaker setup.
I was trying to train a network from scratch with Cherokee, as a test, but the CMU DICT has gotten in the way, and I'm clueless as to what to do. My skill set is primarily reformatting data to fit the requirements for input.
Understood.
That sounds cool! Do you plan to train from phonemes or directly from graphemes?
From graphemes.
I'm training using a special pronunciation orthography. The Cherokee Syllabary unfortunately doesn't reflect vowel length and tones. A description of the orthography is at: https://github.com/CherokeeLanguage/cherokee-audio-data/blob/main/pronunciation-key.md
I'm currently experimenting with a fork of "Tomiinek Multilingual Text to Speech" at repo: https://github.com/CherokeeLanguage/Cherokee-TTS
Because the language is low resource, and the data is composed mostly of very short utterances, my results have been poor. I'm having issues with "early termination of sequence", "extra utterances at end of sequence", and other things. I'm kind of hoping that a different attention mechanism would help, especially with longer sequences.
I'm sure it doesn't help much that part of my training data is originally sourced from tape.
I'm also looking at https://github.com/mutiann/few-shot-transformer-tts to see if it might work better, but the amount of audio data that has to be downloaded for adaptation is astronomical.
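For training from graphemes without a pronunciation dictionary such as CMUdict, one possible approach is to build the symbol table directly from the transcripts. The following is a minimal sketch, not code from this repository: the function names, the `PAD` and `EOS` symbols, and the choice of NFD normalization (so that combining marks become separate symbols) are all assumptions for illustration.

```python
import unicodedata

# Hypothetical special symbols; pick characters that never occur in the transcripts.
PAD, EOS = "_", "~"


def normalize(text):
    """Decompose characters so base letters and combining marks are separate symbols."""
    return unicodedata.normalize("NFD", text)


def build_symbol_table(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    chars = set()
    for text in transcripts:
        chars.update(normalize(text))
    symbols = [PAD, EOS] + sorted(chars - {PAD, EOS})
    return {symbol: index for index, symbol in enumerate(symbols)}


def text_to_ids(text, table):
    """Encode one transcript as a list of ids, ending with the EOS id.

    Raises KeyError for a character that was not in the training transcripts.
    """
    return [table[char] for char in normalize(text)] + [table[EOS]]
```

Building the table from the full set of training transcripts means any orthography can be used as input, at the cost of failing on characters that were never seen during training.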