
The attention alignment does not converge when outputs_per_step=1

Open Jim-Song opened this issue 5 years ago • 13 comments

The attention alignment converges within 35000 steps when r = outputs_per_step = 2, like this: image and it reaches a very good result after 105000 steps, like this:

image

But when r = 1, the alignment does not converge after 70000 steps in two separate experiments, like this:

image

image

In the paper, r = 1. Does anyone have a good result with r = 1?

By the way, I also used keithito's tacotron1, and there the alignment does not converge with r = 2; it only converges with r = 3, 4, 5.
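
For anyone comparing the two settings, here is a rough sketch of what r changes. With outputs_per_step = r the decoder emits r mel frames per step, so an utterance of T frames needs about ceil(T / r) decoder steps, and the attention has fewer steps to line up with the text. The helper name `decoder_steps` and the 800-frame example are made up for illustration and are not taken from the repository.

```python
import math

def decoder_steps(n_mel_frames: int, outputs_per_step: int) -> int:
    """Number of decoder steps needed when each step emits
    `outputs_per_step` mel frames (the reduction factor r)."""
    if outputs_per_step < 1:
        raise ValueError("outputs_per_step must be at least 1")
    return math.ceil(n_mel_frames / outputs_per_step)

# Hypothetical utterance of 800 mel frames.
for r in (1, 2, 3):
    print(f"r={r}: {decoder_steps(800, r)} decoder steps")
```

With r = 1 the decoder runs the most steps per utterance, which is one plausible reason the alignment takes longer to appear.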

Jim-Song avatar Aug 27 '18 02:08 Jim-Song

Did you shrink the batch size? In my experience, the tacotron batch size should not be less than 32.

begeekmyfriend avatar Aug 28 '18 08:08 begeekmyfriend

@begeekmyfriend Hi, I used the ljspeech corpus to train the tacotron model, and the batch_size is 40. I started a new training run with tacotron_initial_learning_rate=1e-4,tacotron_final_learning_rate=1e-6, but there is no sign of convergence. (I am using the latest commit as is.)

image
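
To make the two learning-rate settings concrete, here is a minimal sketch of a decayed learning rate that starts at the initial value and is floored at the final value. The exponential shape, `decay_steps` and `decay_rate` are assumptions chosen for illustration; they are not read from the repository, so check your own hparams for the actual schedule.

```python
def tacotron_learning_rate(step, initial_lr=1e-4, final_lr=1e-6,
                           decay_steps=50_000, decay_rate=0.5):
    """Exponentially decayed learning rate, never below final_lr.

    decay_steps and decay_rate are hypothetical values for this sketch.
    """
    decayed = initial_lr * decay_rate ** (step / decay_steps)
    return max(decayed, final_lr)

# Print the rate at a few training steps.
for step in (0, 50_000, 100_000, 1_000_000):
    print(step, tacotron_learning_rate(step))
```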

Jim-Song avatar Aug 29 '18 02:08 Jim-Song

@Jim-Song Please wait until 40K steps. Or you might set symmetric_mels = True in hparam.py. I am using r=2 and batch size = 32. step-7000-align 2

begeekmyfriend avatar Aug 29 '18 02:08 begeekmyfriend

@begeekmyfriend thx, I will try it now

Jim-Song avatar Aug 29 '18 02:08 Jim-Song

@begeekmyfriend In the r=1 case, the alignment still does not converge after 45000 steps.

I noticed that the tacotron2 paper does not include a linear loss, but in the latest commit the loss includes a term for linear_outputs, which are predicted from mel_outputs by a CBHG. Was the linear loss included in your experiment?

Jim-Song avatar Aug 29 '18 06:08 Jim-Song

Of course the linear loss was included. Then try symmetric_mels = True. In the T2 paper, linear spectrograms are not needed for the wavenet vocoder, so if you do not want the G&L (Griffin-Lim) vocoder, switch predict_linear to False. By the way, the alignment has nothing to do with the linear outputs.
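
As a quick recap, the settings mentioned so far in this thread can be collected in one place. This is only a sketch: the key names are copied from the posts above and may not match your hparams file exactly, and the `thread_advice` helper is hypothetical, written just to restate the suggestions made here.

```python
# Settings discussed in this thread, gathered in a plain dict.
# Key names follow the posts above; verify them against your own hparams file.
settings = {
    "outputs_per_step": 1,
    "batch_size": 40,
    "tacotron_initial_learning_rate": 1e-4,
    "tacotron_final_learning_rate": 1e-6,
    "symmetric_mels": True,
    "predict_linear": False,  # linear outputs are only needed for Griffin-Lim
}

def thread_advice(cfg):
    """Return reminders based only on suggestions made in this thread."""
    notes = []
    if cfg["batch_size"] < 32:
        notes.append("batch size is below 32")
    if not cfg["symmetric_mels"]:
        notes.append("symmetric_mels is off")
    if cfg["predict_linear"]:
        notes.append("predict_linear is on (only needed for Griffin-Lim)")
    return notes

print(thread_advice(settings))
```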

begeekmyfriend avatar Aug 29 '18 07:08 begeekmyfriend

@begeekmyfriend After 8 days of training, it finally converged image and the generated video samples are also better.

Jim-Song avatar Sep 07 '18 07:09 Jim-Song

At which step did it first achieve alignment?

begeekmyfriend avatar Sep 07 '18 08:09 begeekmyfriend

image image Between 140000 and 180000 steps. @begeekmyfriend

Jim-Song avatar Sep 10 '18 01:09 Jim-Song

@begeekmyfriend Why does your Y-axis reach close to ~250? Mine and others' only reach 100~160.

wuzhonglijz avatar Dec 28 '18 08:12 wuzhonglijz

@wuzhonglijz It depends on the length of the training wav clips.

begeekmyfriend avatar Dec 28 '18 10:12 begeekmyfriend

@Jim-Song I am new to Tacotron2. Could you tell me where I can find the r=1 setting?

Mouhamed-Hamed avatar Apr 09 '22 10:04 Mouhamed-Hamed

Which attention did you use, @Jim-Song? Is it Graves attention or Bahdanau attention?

debasishaimonk avatar Sep 12 '23 10:09 debasishaimonk