ProDiff
Multi-speaker TTS training
Hi, thanks for your great work! I would like to apply the framework to a multi-speaker TTS task, such as training on the LibriTTS dataset. How should I modify the framework to do this? More specifically, what speaker embedding strategy do you employ, and how and where is the speaker embedding added to the model? Your response would help me a lot. Thanks in advance!