Hui Lu comments

Results: 13 comments by Hui Lu

AutoVC on a large scale data?

Thanks for sharing. From my experience, the temporal resolution of the bottleneck feature (related to mel-spectrogram extraction hop-length and the downsampling frequency) seems to be important for the encoder to...

Downsampling process is different from that described in the paper

By the way, during training, do you cut the input sequence into fixed-length segments whose length is a multiple of the downsampling frequency? If so, how long is each segment?
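To make the question concrete, here is a minimal sketch of what such fixed-length cropping could look like. It is not taken from the AutoVC code: the function name `fixed_length_segment`, the padding behaviour, and the example values (a requested length of 130 frames and a downsampling factor of 32) are assumptions chosen purely for illustration.

```python
import random


def fixed_length_segment(frames, segment_len, factor, pad_frame=None, rng=random):
    """Return a training segment whose length is a multiple of `factor`.

    `frames` is a list of mel-spectrogram frames. The requested `segment_len`
    is rounded down to the nearest multiple of `factor` (hypothetical helper,
    not the repository's actual data loader).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Round the requested length down so the downsampler divides it evenly.
    target = (segment_len // factor) * factor
    if target == 0:
        raise ValueError("segment_len must be at least factor")
    if len(frames) >= target:
        # Long enough: take a random contiguous crop of exactly `target` frames.
        start = rng.randint(0, len(frames) - target)  # both ends inclusive
        return frames[start:start + target]
    # Too short: pad at the end (assumed behaviour) up to `target` frames.
    return frames + [pad_frame] * (target - len(frames))


# Example with assumed values: 300 frames, requested length 130, factor 32.
segment = fixed_length_segment(list(range(300)), segment_len=130, factor=32)
print(len(segment))  # 128, the largest multiple of 32 not exceeding 130
```

With this scheme every batch item has the same length and the downsampling step never has to handle a trailing partial window.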

Downsampling process is different from that described in the paper

I see, thanks for the answer.

What's the inference RTF on CPU?

Just had a quick test run with a batch size of 1; the resulting RTF is about 0.0333, i.e. 0.0333 seconds of compute per second of mel-spectrogram.
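For reference, the real-time factor (RTF) is the processing time divided by the duration of the audio produced, so a value of 0.0333 corresponds to roughly 30 times faster than real time. The sketch below shows one way such a number could be measured. The helper names, the `synthesize` callable, and the hop length and sample rate in the example are assumptions for illustration, not the settings used in the test above.

```python
import time


def mel_duration_seconds(num_frames, hop_length, sample_rate):
    """Audio duration covered by `num_frames` mel frames."""
    return num_frames * hop_length / sample_rate


def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


def measure_rtf(synthesize, num_frames, hop_length, sample_rate):
    """Time one call to `synthesize` (a hypothetical zero-argument callable
    that generates `num_frames` mel frames) and return its RTF."""
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(
        elapsed, mel_duration_seconds(num_frames, hop_length, sample_rate)
    )


# Example with assumed settings: 0.333 s of compute for 10 s of audio.
print(real_time_factor(0.333, 10.0))
```

An RTF below 1 means synthesis runs faster than real time; the speed-up is simply 1 / RTF.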

pytorch implementation

> hi! nice work and finally non-autoregressive tts with no explicit durations as labels. Do you have any plans to release pytorch implementation? Thanks.

I'm sorry that I don't have...

synthesized wavs of long texts

Hi @Liujingxiu23, thanks for your feedback. Attention errors can happen for vaenar-tts since no restriction is imposed on the attention alignment to make it monotonic; most of them are repetitions of...

Paper Link

> Great job! Where can I get this paper? Thanks!

Just uploaded the paper to arXiv. Feel free to check https://arxiv.org/abs/2107.03298.

The config of hifigan used when generate samples

Thanks, will upload the pre-trained hifi-gan model as well as the configuration file soon.

The config of hifigan used when generate samples

> > Thanks, will upload the pre-trained hifi-gan model as well as the configuration file soon.
>
> Could you share link?

Here you go. https://drive.google.com/file/d/1ETxBYV4cMMqYMvXspnDNy7CMmP_UW3rL/view?usp=sharing

The config of hifigan used when generate samples

> > > > Hi, does the vocoder.py script take as input the mels that are generated when using the inference.py script?

Yes, please follow the readme.txt in the folder.