Rishikesh (ऋषिकेश)
@WhiteFu If you are using this code, use a large (more than 50 hours) expressive dataset like Blizzard to get decent results.
@MisakaMikoto96 Be aware of `NaN` loss; it means your variational autoencoder (VAE) is unable to learn the latent representation. This is a common problem when dealing with variational autoencoders, but the...
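One common mitigation for `NaN` loss when training a VAE is to anneal the weight of the KL-divergence term so it does not dominate early training (alongside gradient clipping in the training loop). A minimal sketch, assuming a linear warm-up; the function name and `warmup_steps` value are my own illustration, not part of this repo:

```python
def kl_weight(step, warmup_steps=10000):
    """Linearly anneal the KL term's weight from 0 to 1 over warmup_steps.

    Keeping the KL weight small at the start of training often prevents the
    latent term from blowing up into NaN before the decoder has learned
    anything useful.
    """
    return min(1.0, step / warmup_steps)

# The weight ramps linearly, then saturates at 1.0.
print(kl_weight(0))       # 0.0
print(kl_weight(5000))    # 0.5
print(kl_weight(20000))   # 1.0
```

The total loss is then `recon_loss + kl_weight(step) * kl_loss` instead of a fixed coefficient.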
Nope, I started training as per the paper; I will change that in the future and compare the results.
Quality is better than HiFi-GAN v1 with less training.
@SolomidHero I will check, but I think the audio should be good. I have trained this model on 4 datasets, including LJSpeech, and it performs well, though not as good as mentioned...
We tested it on multiple datasets and it works better than HiFi-GAN in both speed and quality. Please follow the same pre-processing and hyperparameters mentioned in the repo.
@thepowerfuldeez Fre-GAN is better than UnivNet
I tried it on my own dataset; it takes 150k iterations to generate excellent voice, whereas HiFi-GAN usually takes 1M steps for the same quality.
It only takes 2 days to reach 150k iterations.
@alexdemartos, I am also skeptical about the time-domain loss, but we expect muffled or metallic artifacts before the discriminator kicks in; that is exactly why we use a discriminator, to remove those artifacts.
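For context, a time-domain loss here typically means a distance computed directly on the raw waveform, e.g. an L1 term. A minimal NumPy sketch (the function name and the 16 kHz test signal are my own assumptions for illustration):

```python
import numpy as np

def time_domain_l1(pred, target):
    """Mean absolute error between predicted and target waveforms.

    On its own this loss tends to leave muffled or metallic artifacts;
    the adversarial discriminator is what cleans those up.
    """
    return float(np.mean(np.abs(pred - target)))

# Identical waveforms give zero loss; a constant offset gives roughly
# that offset as the mean absolute error.
t = np.linspace(0.0, 1.0, 16000)          # 1 s of "audio" at 16 kHz
clean = np.sin(2 * np.pi * 440 * t)       # 440 Hz test tone
print(time_domain_l1(clean, clean))       # 0.0
print(time_domain_l1(clean + 0.1, clean)) # approximately 0.1
```

In practice such a waveform term is combined with mel-spectrogram and adversarial losses rather than used alone.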