
Loss value

Open trfnhle opened this issue 4 years ago • 4 comments

I am wondering what the loss value looks like. Could you share some plots of the loss during training?

trfnhle avatar Jun 01 '20 09:06 trfnhle

Sorry for the dense calculation of the MLE loss...

I'll let you know once I clean up the clutter in the code. For now, I'll explain the loss term by term.

The original line I implemented was:

l_mle = (0.5 * math.log(2 * math.pi)
    + (torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2) - torch.sum(logdet))
    / (torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels))

It can be decomposed as

l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2)
l_mle_jacob = -torch.sum(logdet)
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels
l_mle = 0.5 * math.log(2 * math.pi)  + l_mle_sum / denom
  1. l_mle_normal is the negative log likelihood of the normal distribution N(z | y_m, y_logs) (except for the constant term 0.5*log(2*pi)), where y_m and y_logs are the mean and the logarithm of the standard deviation of the prior distribution. Please see Equation 2 in the paper.
l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2)
  2. l_mle_jacob denotes the negative log determinant of the Jacobian of the flows. Please see Equation 1 in the paper.
l_mle_jacob = -torch.sum(logdet)
  3. l_mle_sum denotes the total negative log likelihood of the model, and denom is a denominator that averages the total negative log likelihood across batch, time steps and mel channels (our model forces the mel-spectrogram lengths y_lengths to be a multiple of n_sqz).
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels
  4. Add the constant term, 0.5*log(2*pi), excluded in step 1; a runnable sketch combining all four steps follows below.
l_mle = 0.5 * math.log(2 * math.pi) + l_mle_sum / denom
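
Putting the four steps together, here is a minimal runnable sketch. The tensor shapes, hyperparameter values (batch size, n_mel_channels, n_sqz) and the assumption that every frame is valid are made up for illustration, and hps is replaced by local variables; it also checks l_mle_normal against torch.distributions.Normal up to the constant term.

import math
import torch

torch.manual_seed(0)

batch, n_mel_channels, max_len, n_sqz = 2, 80, 40, 2        # hypothetical values
z      = torch.randn(batch, n_mel_channels, max_len)        # latent from the flow decoder
y_m    = torch.randn(batch, n_mel_channels, max_len)        # prior mean
y_logs = 0.1 * torch.randn(batch, n_mel_channels, max_len)  # prior log standard deviation
logdet = torch.randn(batch)                                 # sum of log|det J| per utterance
y_lengths = torch.tensor([40, 40])                          # assume every frame is valid

# Step 1: Gaussian NLL without the 0.5*log(2*pi) constant
l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2)
# Step 2: negative log determinant of the Jacobian of the flows
l_mle_jacob = -torch.sum(logdet)
# Step 3: total NLL and the normalizer over batch, time steps and mel channels
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // n_sqz) * n_sqz * n_mel_channels
# Step 4: add back the per-element constant after averaging
l_mle = 0.5 * math.log(2 * math.pi) + l_mle_sum / denom

# Sanity check: step 1 equals the exact Gaussian NLL minus the constant per element
nll = -torch.distributions.Normal(y_m, torch.exp(y_logs)).log_prob(z).sum()
assert torch.allclose(l_mle_normal + 0.5 * math.log(2 * math.pi) * z.numel(), nll, atol=1e-2)
print(l_mle.item())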

jaywalnut310 avatar Jun 02 '20 02:06 jaywalnut310

Thanks for your detailed explanation. I think you could ignore the constant term, since it does not contribute to backpropagation. By the way, I found another paper, AlignTTS, that has the same idea of implicitly learning the duration of each character, but with a different approach.

trfnhle avatar Jun 02 '20 16:06 trfnhle

Yes, the constant term is ignored in backpropagation; I just left it in for the exact calculation of the log likelihood. I also saw AlignTTS, which proposes an alignment search algorithm similar to Glow-TTS. I think it is clever, thanks for the heads up! Btw, I hope you enjoy the interesting characteristics of our model, such as manipulating the latent representation of speech :)
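
To make the point about the constant concrete, a toy check (hypothetical tensors, not the training code) shows that adding a constant to the loss leaves the gradients untouched:

import math
import torch

x = torch.randn(5, requires_grad=True)
loss = (x ** 2).mean()                        # stand-in for the MLE loss
(grad_plain,) = torch.autograd.grad(loss, x, retain_graph=True)
(grad_shifted,) = torch.autograd.grad(loss + 0.5 * math.log(2 * math.pi), x)
assert torch.equal(grad_plain, grad_shifted)  # the constant only shifts the reported value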

jaywalnut310 avatar Jun 03 '20 11:06 jaywalnut310

Just wanted to say amazing work! Love the controllability of length and expressiveness. I wanted to try a few ideas of my own using your repository as a codebase, but I've run into a strange phenomenon. It's related to the loss function, so maybe you can help me understand the cause. The strange thing is that the value of the l_mle (g0) loss depends on the value range of the mel spectrograms.

Orange - LJSpeech wavs transformed into mel spectrograms using the default parameters; values range from 0.5 to -11.5.
Pink - my data transformed in the same way as LJSpeech.
Blue - my data transformed into mel spectrograms with different STFT parameters and then scaled to the 0.5 to -11.5 range.
Gray - my data transformed into mel spectrograms with different STFT parameters; values range from 0 to 0.76 (the same results if multiplied by -1).

[Screenshot: l_mle (g0) loss curves for the four configurations]

From what I was able to check, for the data in the 0 to 0.76 range the values differ in the following way:
l_mle_jacob - bigger for mel spectrograms with smaller absolute values. I think it makes sense, because the Jacobian is calculated from the weights, and they have to be bigger to produce the same output values.
l_mle_normal - about the same.
denom - obviously the same.
l_mle - with a different proportion between l_mle_sum and denom, l_mle no longer normalizes to 1. I think this is a problem, because the balance between g0 and g1 is disturbed and the alignment gets worse.
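
A toy sketch of this effect, using a single scaling flow z = x / sigma instead of the actual Glow-TTS decoder (the shapes and scales below are made up): rescaling the data leaves the Gaussian term on z unchanged but shifts the Jacobian term by numel * log(scale), so the split between the two terms, and the reported loss value, depends on the value range of the inputs even when the fit is equally good.

import torch

def mle_terms(x):
    sigma = x.std()
    z = x / sigma                               # flow that standardizes the data
    logdet = -x.numel() * torch.log(sigma)      # log|det dz/dx| summed over all elements
    l_normal = 0.5 * torch.sum(z ** 2)          # Gaussian NLL term (constants dropped)
    l_jacob = -logdet
    return l_normal.item(), l_jacob.item()

torch.manual_seed(0)
base = torch.randn(80 * 400)                    # hypothetical "mel" values
for scale in (3.0, 0.2):                        # wide vs. narrow value range
    l_normal, l_jacob = mle_terms(base * scale)
    print(f"scale={scale}: l_mle_normal={l_normal:.1f}, l_mle_jacob={l_jacob:.1f}")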

Also, I find it quite strange that the grad norm keeps increasing on both the Blue and Gray curves. The only thing they have in common is non-default mel-spectrogram STFT parameters.
[Screenshot: grad norm curves]

RKorzeniowski avatar Jul 29 '20 10:07 RKorzeniowski