
Results?

Open PiotrDabkowski opened this issue 2 years ago • 3 comments

First of all, thank you for the project. I think it is really useful, especially since the official NVIDIA implementation has not been released yet!

Did you manage to train the model to satisfactory quality and replicate the results from the paper? The one sample provided seems to be from very early in training (30k steps).

Would it be possible to include some samples or a pretrained model to see how the implementation works in practice?

If you have not trained it, I can try training it for 3 days on 4x 3090 and see what happens. Thank you!

PiotrDabkowski, Jul 17 '22 09:07

Thank you for your interest.

Actually, due to computational resource constraints, I stopped training the BigVGAN vocoder 😢 (I trained it for only 300k steps).

When I evaluated it, BigVGAN performed much better than HiFi-GAN on the Mel loss and PESQ metrics (in both wide and narrow bands), as described in the paper.

However, in my subjective opinion, the audio quality of BigVGAN is only slightly better than that of HiFi-GAN. But I only evaluated it on seen-speaker data, and I think the performance on out-of-distribution samples is much better...!

Now, I will train this model with an end-to-end TTS system... I think it will take 25 days on 4x A100...

Before that, I would appreciate it if you could train it and share the results 👍

sh-lee-prml, Jul 17 '22 12:07

@sh-lee-prml hello, could you share the results of your E2E training?

skol101, Nov 20 '22 08:11

@sh-lee-prml
Thank you for creating this repo. Can you please share the checkpoint for the 300k step vocoder training?

suvaansh, Nov 21 '22 23:11