Chung-Ming Chien
@LEEYOONHYUNG Oh, I just forgot to post the audio samples. I'll update the demo page some other day. Honestly speaking, the quality of the synthesized LibriTTS samples is not as...
You may need to check your system log.
Thanks for your suggestion. It is supported now and indeed the audio quality is much better!
@loretoparisi In my experience, vocoders are generally independent of or only weakly dependent on the language, so feel free to try it. @chrr I somehow forgot to upload the ``hifigan/`` directory. It should be...
@zaidalyafeai I believe the universal HiFi-GAN yields the best results for unknown speakers. I also think the pretrained vocoders may not suffer a great performance drop for...
@zaidalyafeai the preprocessing parameters should match those of the pretrained vocoders, or you may get strange results.
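A minimal sketch of the kind of check this implies: compare the acoustic model's spectrogram parameters against the vocoder's training config before synthesis. The parameter names and example values below (typical 22.05 kHz HiFi-GAN settings) are assumptions for illustration, not the actual pretrained vocoder's config.

```python
def incompatible_params(tts_cfg, vocoder_cfg,
                        keys=("sampling_rate", "hop_length",
                              "win_length", "n_mel_channels")):
    """Return the parameters that disagree; an empty list means the
    mel spectrograms the TTS model produces match what the vocoder expects."""
    return [k for k in keys if tts_cfg.get(k) != vocoder_cfg.get(k)]

# Hypothetical configs; in practice these would be read from the
# preprocessing config and the vocoder's released training config.
tts = {"sampling_rate": 22050, "hop_length": 256,
       "win_length": 1024, "n_mel_channels": 80}
voc = {"sampling_rate": 22050, "hop_length": 256,
       "win_length": 1024, "n_mel_channels": 80}

mismatches = incompatible_params(tts, voc)
assert mismatches == []  # any mismatch here would produce distorted audio
```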
@ZDisket I haven't tried it yet since I have been busy with other research these days. I am curious about your results on small datasets. How good is the...
@ZDisket I think autoregressive models require more data to learn a good alignment between text sequences and spectrograms. It is an interesting experiment to compare autoregressive and non-autoregressive models under...
@ZDisket maybe it is because I use phoneme sequences instead of character sequences? I think most TTS models available online use character sequences as inputs. But it is...
@ZDisket I think the rounding error of int() won't propagate. It would if I used ``` durations.append(int((e-s)*hp.sampling_rate/hp.hop_length)) ``` But because the end frame of the previous phone will exactly be the...
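The point about rounding errors not propagating can be sketched as follows: if each phone boundary time is rounded to a frame index once, and durations are taken as differences of those indices, then each phone's end frame is exactly the next phone's start frame, so the durations always sum to the total frame count. The function and variable names below are illustrative, not the repo's actual code; `sampling_rate` and `hop_length` values are assumed.

```python
# Assumed preprocessing parameters (must match your config).
sampling_rate = 22050
hop_length = 256

def time_to_frame(t):
    """Round an absolute time (seconds) down to a frame index."""
    return int(t * sampling_rate / hop_length)

def durations_from_boundaries(boundaries):
    """boundaries: [t0, t1, ..., tn], phone boundary times in seconds.
    Rounding each boundary once and differencing means per-phone rounding
    errors cancel instead of accumulating."""
    frames = [time_to_frame(t) for t in boundaries]
    return [frames[i + 1] - frames[i] for i in range(len(frames) - 1)]

# Hypothetical boundaries for three phones.
durs = durations_from_boundaries([0.0, 0.13, 0.305, 0.47])
# The durations sum exactly to the total frame count, by construction.
assert sum(durs) == time_to_frame(0.47)
```

By contrast, rounding each duration separately, as in `int((e-s)*sampling_rate/hop_length)`, can drift from the true frame count by up to one frame per phone.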