VAENAR-TTS
The HiFi-GAN config used when generating samples
Hi, I would like to know which config you used when training the HiFi-GAN model on DataBaker to produce the samples on the website https://light1726.github.io/vaenar-tts/.
With these parameters clarified, we can better compare the quality of the synthesized wavs with other SOTA acoustic models.
I mean the following three parameters in the config file:
"upsample_rates":
"upsample_kernel_sizes":
"upsample_initial_channel":
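For context (an assumption, not the authors' confirmed setting): the default `config_v1.json` in the original HiFi-GAN repository sets these three parameters as follows, which is a common starting point:

```json
{
  "upsample_rates": [8, 8, 2, 2],
  "upsample_kernel_sizes": [16, 16, 4, 4],
  "upsample_initial_channel": 512
}
```

Note that the product of `upsample_rates` (8 × 8 × 2 × 2 = 256) must match the hop length of the mel extraction, so these values may differ if the acoustic model uses a different hop size.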
Thank you!
Thanks, will upload the pre-trained hifi-gan model as well as the configuration file soon.
Could you share a link?
Here you go: https://drive.google.com/file/d/1ETxBYV4cMMqYMvXspnDNy7CMmP_UW3rL/view?usp=sharing
Hi, does the vocoder.py script take as input the mels generated by the inference.py script?
Yes, please follow the readme.txt in the folder.
Thanks, could you point me to the repo you used to train the HiFi-GAN? I would like to train it on my own dataset.
Hi, what is the relationship between this project and HiFi-GAN? How should I use the pre-trained HiFi-GAN model you uploaded?
Why can't I reproduce the quality of your samples even when using your HiFi-GAN model together with your pre-trained db-2000 model?
I downloaded the DataBaker dataset, preprocessed it, ran inference.py with your pre-trained model to get the mels, and then used the HiFi-GAN model you uploaded to generate wavs. The results do not sound as good as the samples at https://light1726.github.io/vaenar-tts/. I also noticed that your samples are 16 kHz while mine are 22.05 kHz.

```shell
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./pretrained/db-test-2000 --data_dir ./biaobei/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./pretrained/db-model_dir/ckpt-2000
python vocoder.py -i ./mels-pre/ -o ./waves-pre/ --checkpoint_file ./model_dir/g_01160000
```
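One quick way to verify the sampling-rate mismatch described above is to read the WAV header directly. A minimal stdlib sketch (the file path is illustrative):

```python
import wave

def wav_sample_rate(path):
    # Read only the WAV header and return the sampling rate in Hz.
    with wave.open(path, "rb") as f:
        return f.getframerate()

# e.g. wav_sample_rate("./waves-pre/000001.wav") would report 22050 or 16000
```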
Check it out here: https://github.com/thuhcsi/hifi-gan
Hi! I don't have a pre-trained HiFi-GAN for DataBaker; ./model_dir/g_01160000 is the LJSpeech HiFi-GAN model.
Then how did you produce the Chinese samples uploaded at https://light1726.github.io/vaenar-tts/? They sound great.
Those were reconstructed with Griffin-Lim.
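Griffin-Lim reconstructs a waveform from a magnitude spectrogram by alternating between the STFT and its inverse while re-imposing the known magnitudes, so that the phase gradually becomes consistent. A minimal NumPy sketch of the idea (not the repo's implementation; the FFT size, hop length, and iteration count are illustrative):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Frame the signal with a Hann window and take the real FFT of each frame.
    win = np.hanning(n_fft)
    frames = [win * x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=-1)

def istft(spec, n_fft=512, hop=128):
    # Invert each frame, then overlap-add with squared-window normalization.
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1)
    length = hop * (len(frames) - 1) + n_fft
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * hop:i * hop + n_fft] += win * frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=512, hop=128):
    # Start from random phase, then alternate ISTFT/STFT, keeping the
    # known magnitudes and only updating the phase estimate.
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        spec = stft(x, n_fft, hop)
        phase = spec / np.maximum(np.abs(spec), 1e-8)
    return istft(mag * phase, n_fft, hop)
```

Griffin-Lim works on linear-magnitude spectrograms, so applying it to the model's mel outputs would additionally require a (pseudo-inverse) mel-to-linear projection first; quality is typically below a neural vocoder, which may explain some of the difference heard above.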
Thanks for your answer, but your Chinese samples have no robotic artifacts. Using your pre-trained VAENAR model db-model_dir/ckpt-2000, I ran:

```shell
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./biaobei/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./pretrained/db-model_dir/ckpt-2000
```

The outputs I get are not the same as yours and contain robotic artifacts. Here are some of them; could you help analyze what the cause might be?