Tacotron-2
The attention alignment does not converge when outputs_per_step=1
The attention alignment converges in 35000 steps when r = outputs_per_step = 2, like this:
and converges to a very good result after 105000 steps, like this:
But when r = 1, the alignment does not converge even after 70000 steps in two separate experiments, like this:
In the paper, r = 1. Does anyone have a good result with r = 1?
By the way, I also used keithito's tacotron1, and there the alignment does not converge with r = 2; it only converges with r = 3, 4, or 5.
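For readers new to the term: r (outputs_per_step) is the reduction factor, i.e. how many mel frames the decoder predicts per step. The sketch below is illustrative only (not the repo's code); it just shows why a larger r shortens the decoder sequence, which tends to make attention easier to learn.

```python
import math

# Illustrative only: with reduction factor r = outputs_per_step, the decoder
# emits r mel frames per step, so an utterance of T frames needs only
# ceil(T / r) decoder steps; fewer steps give attention fewer chances to drift.
def decoder_steps(num_mel_frames: int, r: int) -> int:
    return math.ceil(num_mel_frames / r)

for r in (1, 2, 3, 5):
    print(f"r={r}: {decoder_steps(800, r)} decoder steps for 800 mel frames")
```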
Did you shrink the batch size? In my experience, the Tacotron batch size should not be less than 32.
@begeekmyfriend Hi, I used the LJSpeech corpus to train the Tacotron model and the batch_size is 40. I began a new training run with tacotron_initial_learning_rate=1e-4 and tacotron_final_learning_rate=1e-6, but there is no sign of convergence. (I am using the latest commit.)
@Jim-Song Please wait until 40K steps. Or you might set symmetric_mels = True in hparam.py. I am using r = 2 and batch size = 32.
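For concreteness, here is a sketch of the settings being discussed. The field names are assumptions taken from this thread rather than copied from the repo, so verify them against your own hparams file before editing.

```python
# Sketch of the hyperparameters mentioned above (field names are assumptions
# from this thread, not the repo's exact code -- check your hparams file).
from types import SimpleNamespace

hparams = SimpleNamespace(
    outputs_per_step=2,      # r: number of mel frames predicted per decoder step
    tacotron_batch_size=32,  # batch sizes below 32 reportedly hurt alignment
    symmetric_mels=True,     # rescale mels to [-max_abs_value, max_abs_value]
    max_abs_value=4.0,       # range used when symmetric_mels is enabled
)

print(hparams.outputs_per_step, hparams.tacotron_batch_size, hparams.symmetric_mels)
```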
@begeekmyfriend thx, I will try it now
@begeekmyfriend The alignment in the r=1 case still does not converge after 45000 steps.
I noticed that in the Tacotron-2 paper the linear loss is not included, but in the latest commit the loss includes a loss on linear_outputs, which are predicted from mel_outputs by a CBHG. In your experiment, was the linear loss included?
Of course the linear loss was included. Then try symmetric_mels = True. In the T2 paper, linear spectrograms are not needed for the WaveNet vocoder, so if you do not want the Griffin-Lim vocoder, switch predict_linear to False. By the way, the alignment has nothing to do with the linear outputs.
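To illustrate the point that the linear loss is just an additive term and does not touch the attention mechanism, here is a minimal, hypothetical loss sketch. It is plain NumPy, not the repo's TensorFlow code, and all names are made up for illustration.

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def total_loss(mel_before, mel_after, mel_target,
               linear_pred=None, linear_target=None):
    # Mel losses (before and after the post-net) are always present.
    loss = mse(mel_before, mel_target) + mse(mel_after, mel_target)
    # The linear-spectrogram loss is only added when predict_linear is enabled;
    # it only matters for Griffin-Lim synthesis and does not feed the attention.
    if linear_pred is not None and linear_target is not None:
        loss += mse(linear_pred, linear_target)
    return loss
```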
@begeekmyfriend After 8 days of training, it finally converged, and the generated audio samples are also better.
At which step did it first achieve alignment?
140000-180000 steps
@begeekmyfriend
@begeekmyfriend Why does your alignment reach close to ~250 on the Y-axis? Mine and others' only reach about 100~160.
@wuzhonglijz It depends on the length of the training wav clips.
@Jim-Song I am new to Tacotron-2. Could you please tell me where I can find the r=1 setting?
Which attention did you use, @Jim-Song? Is it Graves attention or Bahdanau attention?