Hui Lu
Thanks for sharing. From my experience, the temporal resolution of the bottleneck feature (which depends on the hop length used for mel-spectrogram extraction and on the downsampling factor) seems to be important for the encoder to...
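Here is a small worked computation that makes the temporal-resolution point concrete. It assumes a 22,050 Hz sample rate, a hop length of 256 samples, and downsampling factors of 1, 2 and 4. These are hypothetical values, not settings confirmed in this thread. Each bottleneck frame spans `hop_length × factor` samples, so its duration grows linearly with the downsampling factor.

```python
def bottleneck_frame_ms(sample_rate: int, hop_length: int, downsample_factor: int) -> float:
    """Duration in milliseconds covered by one bottleneck frame.

    One mel frame covers hop_length samples; downsampling by
    downsample_factor merges that many mel frames into one bottleneck frame.
    """
    mel_frame_ms = 1000.0 * hop_length / sample_rate
    return mel_frame_ms * downsample_factor


# Hypothetical settings for illustration only.
for factor in (1, 2, 4):
    print(f"factor={factor}: {bottleneck_frame_ms(22050, 256, factor):.2f} ms per frame")
```

With these assumed settings, one bottleneck frame covers roughly 11.6 ms, 23.2 ms and 46.4 ms, respectively. Doubling either the hop length or the downsampling factor halves the number of bottleneck frames per second of speech.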
By the way, during training, do you cut the input sequences into fixed-length segments whose length is a multiple of the downsampling factor? If so, how long are the segments?
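Below is a minimal sketch of the two length-handling options this question touches on. It assumes a mel-spectrogram stored as a list of frames, each frame a list of floats. `fit_to_multiple` pads or trims a variable-length sequence so its length is divisible by the downsampling factor. `crop_fixed` cuts a fixed-length segment whose length must itself be a multiple of that factor. Both function names and the zero pad value are hypothetical and are not taken from the vaenar-tts code.

```python
from typing import List, Sequence

Frame = List[float]


def fit_to_multiple(frames: Sequence[Frame], factor: int, mode: str = "pad",
                    pad_value: float = 0.0) -> List[Frame]:
    """Pad or trim `frames` so that len(result) is a multiple of `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = [list(f) for f in frames]
    remainder = len(out) % factor
    if remainder == 0:
        return out
    if mode == "trim":
        # Drop trailing frames that do not fill a complete group of `factor`.
        return out[: len(out) - remainder]
    if mode == "pad":
        # Append constant frames (hypothetical pad value) to complete the last group.
        n_mels = len(out[0])
        out.extend([pad_value] * n_mels for _ in range(factor - remainder))
        return out
    raise ValueError("mode must be 'pad' or 'trim'")


def crop_fixed(frames: Sequence[Frame], segment_len: int, factor: int,
               start: int = 0) -> List[Frame]:
    """Cut a fixed-length segment; segment_len must be a multiple of factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if segment_len <= 0 or segment_len % factor != 0:
        raise ValueError("segment_len must be a positive multiple of factor")
    if start < 0 or start + segment_len > len(frames):
        raise ValueError("segment does not fit inside the sequence")
    return [list(f) for f in frames[start:start + segment_len]]


mel = [[float(i)] * 80 for i in range(10)]  # 10 dummy frames, 80 mel bins
print(len(fit_to_multiple(mel, 4)))          # 12
print(len(fit_to_multiple(mel, 4, "trim")))  # 8
print(crop_fixed(mel, 8, 4, start=2)[0][0])  # 2.0
```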
I see, thanks for the answer.
Just had a quick test run with a batch size of 1: the resulting RTF is about 0.0333, i.e., roughly 0.0333 seconds of computation per second of generated mel-spectrogram.
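RTF (real-time factor) is usually computed as wall-clock synthesis time divided by the duration of the generated speech. The sketch below assumes a hypothetical `synthesize` callable that returns the number of mel frames it produced. It converts frames to seconds with `hop_length / sample_rate`. This is not the script used for the measurement above.

```python
import time
from typing import Callable


def mel_duration_seconds(n_frames: int, hop_length: int, sample_rate: int) -> float:
    """Audio duration represented by n_frames mel frames."""
    return n_frames * hop_length / sample_rate


def measure_rtf(synthesize: Callable[[], int], hop_length: int,
                sample_rate: int, n_runs: int = 10) -> float:
    """Total synthesis time divided by total generated audio duration."""
    total_time = 0.0
    total_audio = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        n_frames = synthesize()  # hypothetical: runs inference, returns frame count
        total_time += time.perf_counter() - start
        total_audio += mel_duration_seconds(n_frames, hop_length, sample_rate)
    return total_time / total_audio


def fake_synthesize() -> int:
    """Dummy stand-in for a model (hypothetical)."""
    time.sleep(0.01)  # pretend inference takes about 10 ms
    return 862        # about 10 s of audio at hop 256, 22,050 Hz


print(f"RTF ~ {measure_rtf(fake_synthesize, 256, 22050, n_runs=3):.4f}")
```

If the model runs on a GPU, synchronize the device before reading the timer. Otherwise asynchronous kernel launches make the timing look faster than it really is. At an RTF of about 0.0333, generating 10 seconds of mel-spectrogram would take roughly 0.33 seconds.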
> hi! nice work and finally non-autoregressive tts with no explicit durations as labels. Do you have any plans to release pytorch implementation? Thanks.

I'm sorry that I don't have...
Hi @Liujingxiu23, thanks for your feedback. Attention errors can occur in vaenar-tts because no constraint is imposed on the attention alignment to make it monotonic. Most of them are repetitions of...
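One simple way to flag the repetition errors described above is to track which input position each output frame attends to most strongly. Then report frames where that position moves backwards. This sketch assumes the alignment is available as a list of rows, one per output frame, each a non-empty list of attention weights over input tokens. The function names and the `tolerance` parameter are hypothetical. It is only a diagnostic heuristic, not part of vaenar-tts.

```python
from typing import List, Sequence


def attended_positions(alignment: Sequence[Sequence[float]]) -> List[int]:
    """Index of the most-attended input token for each output frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in alignment]


def backward_jumps(alignment: Sequence[Sequence[float]], tolerance: int = 0) -> List[int]:
    """Output frames whose attended token moves back by more than `tolerance`.

    Under a monotonic alignment the attended position never decreases, so any
    reported frame is a candidate repetition.
    """
    peaks = attended_positions(alignment)
    return [t for t in range(1, len(peaks)) if peaks[t] < peaks[t - 1] - tolerance]


# Toy one-hot alignment: frames attend to tokens 0, 1, 2, then jump back to 1.
toy_peaks = [0, 1, 2, 1, 2, 3]
toy = [[1.0 if j == p else 0.0 for j in range(4)] for p in toy_peaks]
print(backward_jumps(toy))  # [3]
```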
> Great job! Where can I get this paper? Thanks!

Just uploaded the paper to arXiv. Feel free to check https://arxiv.org/abs/2107.03298.
Thanks, I will upload the pre-trained HiFi-GAN model as well as the configuration file soon.
> > Thanks, will upload the pre-trained hifi-gan model as well as the configuration file soon.
>
> Could you share link?

Here you go: https://drive.google.com/file/d/1ETxBYV4cMMqYMvXspnDNy7CMmP_UW3rL/view?usp=sharing
> Hi, Does vocoder.py script take the mels as input that are generated when using the inference.py script?

Yes, please follow the readme.txt in the folder.