VAENAR-TTS
The HiFi-GAN config used when generating samples
Hi, I would like to know which config you used when training the HiFi-GAN model on DataBaker to produce the samples on the website https://light1726.github.io/vaenar-tts/.
With these parameters clarified, we can better compare the quality of the synthesized wavs with other SOTA acoustic models.
I mean the following three parameters in the config file:
"upsample_rates":
"upsample_kernel_sizes":
"upsample_initial_channel":
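For context (an assumption, not the authors' confirmed setting): the default `config_v1.json` in the original HiFi-GAN repository sets these three parameters as follows, which is a common starting point:

```json
{
  "upsample_rates": [8, 8, 2, 2],
  "upsample_kernel_sizes": [16, 16, 4, 4],
  "upsample_initial_channel": 512
}
```

Note that the product of `upsample_rates` (8 × 8 × 2 × 2 = 256) must match the hop length of the mel extraction, so these values may differ if the acoustic model uses a different hop size.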
Thank you!
Thanks, will upload the pre-trained hifi-gan model as well as the configuration file soon.
Could you share a link?
Here you go: https://drive.google.com/file/d/1ETxBYV4cMMqYMvXspnDNy7CMmP_UW3rL/view?usp=sharing
Hi, does the vocoder.py script take as input the mels generated by the inference.py script?
Yes, please follow the readme.txt in the folder.
Thanks, could you point me to the repo you used to train the HiFi-GAN? I would like to train it on my own dataset.
Hi, what is the relationship between this project and HiFi-GAN? How should I use the pre-trained HiFi-GAN model you uploaded?
Why can't I reproduce the quality of your samples even when using your HiFi-GAN model together with your pre-trained db-2000 model?
I downloaded the DataBaker dataset, preprocessed it, ran inference.py with your pre-trained model to get the mels, and then used the HiFi-GAN model you uploaded to generate wavs. The results do not sound as good as the samples at https://light1726.github.io/vaenar-tts/. I also noticed that your samples are 16 kHz while mine are 22.05 kHz.

```shell
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./pretrained/db-test-2000 --data_dir ./biaobei/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./pretrained/db-model_dir/ckpt-2000
python vocoder.py -i ./mels-pre/ -o ./waves-pre/ --checkpoint_file ./model_dir/g_01160000
```
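One quick way to verify the sampling-rate mismatch described above is to read the WAV header directly. A minimal stdlib sketch (the file path is illustrative):

```python
import wave

def wav_sample_rate(path):
    # Read only the WAV header and return the sampling rate in Hz.
    with wave.open(path, "rb") as f:
        return f.getframerate()

# e.g. wav_sample_rate("./waves-pre/000001.wav") would report 22050 or 16000
```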
Check it out here: https://github.com/thuhcsi/hifi-gan
Hi! I don't have a pre-trained HiFi-GAN for DataBaker; ./model_dir/g_01160000 is the LJSpeech HiFi-GAN model.
Then how did you produce the Chinese samples uploaded at https://light1726.github.io/vaenar-tts/? They sound great.
Those were reconstructed with Griffin-Lim.
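Griffin-Lim reconstructs a waveform from a magnitude spectrogram by alternating between the STFT and its inverse while re-imposing the known magnitudes, so that the phase gradually becomes consistent. A minimal NumPy sketch of the idea (not the repo's implementation; the FFT size, hop length, and iteration count are illustrative):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Frame the signal with a Hann window and take the real FFT of each frame.
    win = np.hanning(n_fft)
    frames = [win * x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=-1)

def istft(spec, n_fft=512, hop=128):
    # Invert each frame, then overlap-add with squared-window normalization.
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1)
    length = hop * (len(frames) - 1) + n_fft
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * hop:i * hop + n_fft] += win * frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=512, hop=128):
    # Start from random phase, then alternate ISTFT/STFT, keeping the
    # known magnitudes and only updating the phase estimate.
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        spec = stft(x, n_fft, hop)
        phase = spec / np.maximum(np.abs(spec), 1e-8)
    return istft(mag * phase, n_fft, hop)
```

Griffin-Lim works on linear-magnitude spectrograms, so applying it to the model's mel outputs would additionally require a (pseudo-inverse) mel-to-linear projection first; quality is typically below a neural vocoder, which may explain some of the difference heard above.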
Thanks for your answer, but your Chinese samples have no robotic artifacts. Using your pre-trained VAENAR model db-model_dir/ckpt-2000, I ran:

```shell
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./biaobei/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./pretrained/db-model_dir/ckpt-2000
```

The outputs I get are not the same as yours and contain robotic artifacts. Here are some of them; could you help analyze what the cause might be?