Tacotron-2: the attention alignment does not converge while outputs_per

The attention alignment converges within 35000 steps when r = outputs_per_step = 2, and it reaches a very good result after 105000 steps.

But with r = 1, the alignment did not converge after 70000 steps in two separate experiments.

In the paper, r = 1. Has anyone gotten a good result with r = 1?

By the way, I also tried keithito's Tacotron 1 implementation, and there the alignment did not converge with r = 2; it only converged with r = 3, 4, or 5.
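
For anyone unfamiliar with r: it is the number of mel frames the decoder emits per step, so a larger r means fewer decoder steps for the attention to align. Below is a rough pure-Python sketch of that grouping. The helper names `group_frames` and `decoder_steps` are made up for illustration and are not taken from either repository.

```python
import math


def group_frames(frames, r):
    """Group mel frames into chunks of r frames each.

    frames: list of frames, each frame a list of floats (one value per mel bin).
    The tail is padded with all-zero frames so the length is a multiple of r.
    Hypothetical helper, not the repository's implementation.
    """
    if r < 1:
        raise ValueError("r must be >= 1")
    n_mels = len(frames[0]) if frames else 0
    pad = (-len(frames)) % r  # number of zero frames needed at the end
    padded = frames + [[0.0] * n_mels for _ in range(pad)]
    # Concatenate each run of r consecutive frames into one decoder target.
    return [sum(padded[i:i + r], []) for i in range(0, len(padded), r)]


def decoder_steps(num_frames, r):
    """Number of decoder steps needed to emit num_frames frames."""
    return math.ceil(num_frames / r)
```

For an 800-frame utterance, `decoder_steps(800, 1)` is 800 while `decoder_steps(800, 2)` is 400. With r = 1 the attention has to learn an alignment over twice as many decoder steps, which may be part of why it converges more slowly.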

Aug 27 '18 02:08 Jim-Song

Did you shrink the batch size? In my experience, the Tacotron batch size should not be less than 32.

Aug 28 '18 08:08 begeekmyfriend

@begeekmyfriend Hi, I used the LJSpeech corpus to train the Tacotron model, and the batch_size is 40. I started a new training run with tacotron_initial_learning_rate=1e-4 and tacotron_final_learning_rate=1e-6, but there is no sign of convergence. (I am using the latest commit.)

Aug 29 '18 02:08 Jim-Song

@Jim-Song Please wait until 40K steps. Or you might set symmetric_mels = True in hparam.py. I am using r = 2 and a batch size of 32. (Attached image: step-7000-align)
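
As far as I understand it, symmetric_mels controls whether the normalized mel targets are centered around zero or start at zero. Here is a minimal sketch of the two ranges. The function name `normalize_db` and the defaults `min_level_db=-100.0` and `max_abs_value=4.0` are assumptions chosen for illustration, so check them against your own hparams.

```python
def normalize_db(s_db, min_level_db=-100.0, max_abs_value=4.0, symmetric=True):
    """Map a dB value from [min_level_db, 0] to a normalized target range.

    symmetric=True  -> output in [-max_abs_value, max_abs_value]
    symmetric=False -> output in [0, max_abs_value]
    Illustrative sketch only; the default values are assumed, not verified.
    """
    # Scale to [0, 1] and clip values outside the expected dB range.
    scaled = (s_db - min_level_db) / (-min_level_db)
    scaled = min(max(scaled, 0.0), 1.0)
    if symmetric:
        return 2.0 * max_abs_value * scaled - max_abs_value
    return max_abs_value * scaled
```

With these assumed defaults, silence at -100 dB maps to -4.0 in the symmetric case and to 0.0 otherwise, so the symmetric targets are roughly zero-centered.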

Aug 29 '18 02:08 begeekmyfriend

@begeekmyfriend Thanks, I will try it now.

Aug 29 '18 02:08 Jim-Song

@begeekmyfriend With r = 1, the alignment still has not converged after 45000 steps.

I noticed that the Tacotron 2 paper does not include a linear loss, but in the latest commit the loss includes a term for linear_outputs, which are predicted from mel_outputs by a CBHG. Was the linear loss included in your experiment?

Aug 29 '18 06:08 Jim-Song

Of course the linear loss was included. Then try symmetric_mels = True. In the T2 paper, linear spectrograms are not needed because it uses a WaveNet vocoder, so if you do not want the Griffin-Lim vocoder, switch predict_linear to False. By the way, the alignment has nothing to do with the linear outputs.
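
To make the predict_linear point concrete, here is a simplified sketch of how the total loss could be assembled. The function `total_loss` and its argument names are hypothetical, and regularization terms are left out. The only point is that the linear term is added when predict_linear is True and dropped otherwise.

```python
def total_loss(mel_before, mel_after, stop_token, linear=0.0, predict_linear=True):
    """Sum the per-component losses into one training loss.

    mel_before / mel_after: mel losses before and after the postnet.
    stop_token: stop-token prediction loss.
    linear: linear-spectrogram loss, used only when predict_linear is True.
    Hypothetical simplification; not the repository's loss code.
    """
    loss = mel_before + mel_after + stop_token
    if predict_linear:
        loss += linear
    return loss
```

For example, `total_loss(1.0, 2.0, 0.5, linear=3.0)` gives 6.5, while the same call with `predict_linear=False` gives 3.5.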

Aug 29 '18 07:08 begeekmyfriend

@begeekmyfriend After 8 days of training, it finally converged, and the generated audio samples are also better.