voice-conversion
Hi, I got NaN loss the same as you
I implemented the code along with some other source code. I'm sure the parts are right because I checked them independently, but I got the same NaN loss as you. After training for about 100 iterations, the loss got bigger and bigger and ended with an error. Have you found the reason yourself?
Yes, if you're talking about the VQ-VAE approach, then I also got NaN loss after around 20-100 iterations. I am still not sure what the problem is. I tried removing the VQ part of the VAE (see the sketch below for what I mean by that) to make it very similar to the NSynth approach here https://github.com/tensorflow/magenta/tree/master/magenta/models/nsynth but it still gave NaN loss. It must have to do with using voice samples instead of short, simple instrument samples. Now that you have reminded me, hopefully I'll have time this week to try again to get it to work. I'm working on another repo with just the VQ-VAE code: https://github.com/ASzot/vq-vae-audio
Let me know if you're able to make any progress.
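
For reference, this is roughly the VQ bottleneck I mean; just a sketch, the names (`z_e`, `codebook`) and shapes are mine, not the repo's actual code:

```python
import torch

def quantize(z_e, codebook):
    """z_e: (batch, time, dim) encoder output; codebook: (K, dim)."""
    flat = z_e.reshape(-1, z_e.shape[-1])              # (N, dim)
    dists = torch.cdist(flat, codebook)                # (N, K) distances to each code
    z_q = codebook[dists.argmin(dim=1)].view_as(z_e)   # nearest codebook vectors
    # Straight-through estimator: gradients flow past the argmin as identity.
    return z_e + (z_q - z_e).detach()
```

Dropping this step and feeding `z_e` straight to the decoder gives the continuous NSynth-style autoencoder I described.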
It seems that setting the VQ-VAE commitment_loss coefficient higher and the learning rate lower makes it work better. Now I'm training it, and I also have to change the code to get some results.
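
In case it helps, this is roughly where that coefficient enters the loss; a minimal PyTorch sketch, and the function and tensor names are mine, not the repo's:

```python
import torch
import torch.nn.functional as F

def vq_vae_loss(x, x_recon, z_e, z_q, beta=5.0):
    # Reconstruction term (MSE here for simplicity; the WaveNet decoder
    # actually uses a categorical cross-entropy over quantized audio).
    recon = F.mse_loss(x_recon, x)
    # Codebook term: pulls the embeddings toward (detached) encoder outputs.
    codebook = F.mse_loss(z_q, z_e.detach())
    # Commitment term: keeps the encoder close to its chosen codes.
    # Raising beta, as described above, penalizes the encoder harder.
    commitment = F.mse_loss(z_e, z_q.detach())
    return recon + codebook + beta * commitment

# And the smaller learning rate would go into the optimizer, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```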
So you were able to get the model to train without getting NaN for the loss?
It works, but the result seems wrong. The output sounds like a human voice and the quality is somewhat good, but it isn't relevant to the input. Maybe it's as the paper says: "Note that the decoder could completely ignore the deterministic encoding and degenerate to a standard unconditioned WaveNet. However, because the encoding is a strong signal for the supervised output, the model learns to utilize it." I set beta=5, learning rate=1e-4, batch size=2, and trained for 30k iterations on the CMU ARCTIC dataset, but the VQ loss is much bigger than what I got when I tested on the MNIST dataset. I don't know why. Now I'm changing the code to retrain it.
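
One thing that might help narrow down why the VQ loss is so much larger than on MNIST: log the two VQ terms separately to see which one dominates. The toy tensors below are just stand-ins to make the snippet runnable; real values would come from the model's forward pass, and the shapes are my assumptions:

```python
import torch
import torch.nn.functional as F

z_e = torch.randn(2, 64, 128)   # encoder outputs, batch=2 as in the run above
z_q = torch.randn(2, 64, 128)   # nearest codebook vectors for those outputs

codebook_term = F.mse_loss(z_q, z_e.detach())
commitment_term = F.mse_loss(z_e, z_q.detach())
print(f"codebook={codebook_term.item():.4f}  commitment={commitment_term.item():.4f}")
```

If both terms stay large, the encoder outputs are far from the codebook, which would also explain why the decoder can get away with ignoring the encoding.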