Chung-Ming Chien
@LEEYOONHYUNG Oh, I just forgot to post the audio samples. I'll update the demo page some other day. Honestly speaking, the quality of the synthesized LibriTTS samples is not as...
You may need to check your system log.
Thanks for your suggestion. It is supported now and indeed the audio quality is much better!
@loretoparisi In my experience, vocoders are generally independent of or only weakly dependent on the language, so feel free to try it. @chrr I somehow forgot to upload the ``hifigan/`` directory. It should be...
@zaidalyafeai I believe the universal HiFi-GAN yields the best results for unknown speakers. I also think the pretrained vocoders may not suffer a great performance drop for...
@zaidalyafeai the preprocessing parameters should match those of the pretrained vocoders, or you may get strange results.
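A minimal sketch of the kind of check this implies: compare the acoustic model's spectrogram parameters against the vocoder's training config before synthesis. The parameter names and example values below (typical 22.05 kHz HiFi-GAN settings) are assumptions for illustration, not the actual pretrained vocoder's config.

```python
def incompatible_params(tts_cfg, vocoder_cfg,
                        keys=("sampling_rate", "hop_length",
                              "win_length", "n_mel_channels")):
    """Return the parameters that disagree; an empty list means the
    mel spectrograms the TTS model produces match what the vocoder expects."""
    return [k for k in keys if tts_cfg.get(k) != vocoder_cfg.get(k)]

# Hypothetical configs; in practice these would be read from the
# preprocessing config and the vocoder's released training config.
tts = {"sampling_rate": 22050, "hop_length": 256,
       "win_length": 1024, "n_mel_channels": 80}
voc = {"sampling_rate": 22050, "hop_length": 256,
       "win_length": 1024, "n_mel_channels": 80}

mismatches = incompatible_params(tts, voc)
assert mismatches == []  # any mismatch here would produce distorted audio
```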
@ZDisket I haven't tried it yet since I have been busy with other research these days. I am curious about your results on small datasets. How good is the...
@ZDisket I think autoregressive models require more data to learn a good alignment between text sequences and spectrograms. It is an interesting experiment to compare autoregressive and non-autoregressive models under...
@ZDisket maybe it is because I use phoneme sequences instead of character sequences? I think most TTS models available online use character sequences as inputs. But it is...
@ZDisket I think the rounding error of int() won't propagate. It would if I used ``` durations.append(int((e-s)*hp.sampling_rate/hp.hop_length)) ``` But because the end frame of the previous phone will exactly be the...
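The point about rounding errors not propagating can be sketched as follows: if each phone boundary time is rounded to a frame index once, and durations are taken as differences of those indices, then each phone's end frame is exactly the next phone's start frame, so the durations always sum to the total frame count. The function and variable names below are illustrative, not the repo's actual code; `sampling_rate` and `hop_length` values are assumed.

```python
# Assumed preprocessing parameters (must match your config).
sampling_rate = 22050
hop_length = 256

def time_to_frame(t):
    """Round an absolute time (seconds) down to a frame index."""
    return int(t * sampling_rate / hop_length)

def durations_from_boundaries(boundaries):
    """boundaries: [t0, t1, ..., tn], phone boundary times in seconds.
    Rounding each boundary once and differencing means per-phone rounding
    errors cancel instead of accumulating."""
    frames = [time_to_frame(t) for t in boundaries]
    return [frames[i + 1] - frames[i] for i in range(len(frames) - 1)]

# Hypothetical boundaries for three phones.
durs = durations_from_boundaries([0.0, 0.13, 0.305, 0.47])
# The durations sum exactly to the total frame count, by construction.
assert sum(durs) == time_to_frame(0.47)
```

By contrast, rounding each duration separately, as in `int((e-s)*sampling_rate/hop_length)`, can drift from the true frame count by up to one frame per phone.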