mayfool
I tried it on a single-speaker dataset, and the RTF surprised me. Have you ever used basis-melgan on a multi-speaker dataset, or is it suitable for unseen-speaker TTS synthesis?
Thanks for the implementation of ISTFT. It has better inference speed than HiFi-GAN v1. However, I found that there is a single frequency line that causes a little noise. I use 16 kHz...
Thanks for your great work. It seems you use the speaker ID, not a reference wav, to separate the speakers. I wonder whether this repo will support zero-shot voice cloning?
When I try to use this repo, I get an error like this: "RuntimeError: stft input and window must be on the same device but got self on cuda:0 and...
https://github.com/tuanh123789/AdaSpeech/blob/64f15c4b3fa4590267f12930d7aaf411a1b36d1e/preprocessor/preprocessor.py#L336