Where to add an attention prior (betabinom prior)

Open changjinhan opened this issue 1 year ago • 3 comments

Hello! I have a question about where the attention prior is applied. You add the attention prior before computing the forward-sum loss, like this: https://github.com/imdanboy/jets/blob/44e3dbcb9e7e5368158917748fa2c6b45039b4d0/espnet2/gan_tts/jets/loss.py#L147 This can lower the forward-sum loss while pushing the alignment (log_p_attn) toward monotonicity. However, the prior is not added anywhere in the forward pass of the JETS generator, so the durations are obtained by running the Viterbi algorithm on the raw 'log_p_attn': https://github.com/imdanboy/jets/blob/44e3dbcb9e7e5368158917748fa2c6b45039b4d0/espnet2/gan_tts/jets/generator.py#L593 For the attention prior to be effective, I think it should also be added before the durations are computed.

What do you think of this?
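To make the proposal concrete, here is a minimal pure-Python sketch of adding a beta-binomial log-prior to log_p_attn before a monotonic Viterbi search. It is not the repository's code: the function names, the choice of n = num_tokens - 1, and the `scale` parameter are assumptions made for illustration, and the real implementation works on batched tensors.

```python
import math


def log_betabinom_pmf(k, n, a, b):
    """Log pmf of a beta-binomial distribution, computed with log-gamma."""
    def lbeta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + lbeta(k + a, n - k + b) - lbeta(a, b)


def betabinom_log_prior(num_frames, num_tokens, scale=1.0):
    """Hypothetical helper: one row per frame, one column per text token.

    Assumption: n = num_tokens - 1, so each row is a proper distribution
    over the token indices 0 .. num_tokens - 1.
    """
    rows = []
    for t in range(1, num_frames + 1):
        a = scale * t
        b = scale * (num_frames + 1 - t)
        rows.append(
            [log_betabinom_pmf(k, num_tokens - 1, a, b) for k in range(num_tokens)]
        )
    return rows


def viterbi_durations(log_p):
    """Best monotonic path: each frame stays on its token or advances by one.

    Assumes num_frames >= num_tokens. Returns frames assigned to each token.
    """
    num_frames, num_tokens = len(log_p), len(log_p[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * num_tokens for _ in range(num_frames)]
    score[0][0] = log_p[0][0]
    for t in range(1, num_frames):
        for j in range(min(t + 1, num_tokens)):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            score[t][j] = max(stay, move) + log_p[t][j]

    # Backtrack from the last token at the last frame.
    durations = [0] * num_tokens
    j = num_tokens - 1
    for t in range(num_frames - 1, -1, -1):
        durations[j] += 1
        if t > 0 and j > 0 and score[t - 1][j - 1] >= score[t - 1][j]:
            j -= 1
    return durations


def durations_with_prior(log_p_attn, scale=1.0):
    """Add the log-prior to log_p_attn, then decode durations."""
    prior = betabinom_log_prior(len(log_p_attn), len(log_p_attn[0]), scale)
    biased = [
        [lp + pr for lp, pr in zip(lp_row, pr_row)]
        for lp_row, pr_row in zip(log_p_attn, prior)
    ]
    return viterbi_durations(biased)


# A flat (uninformative) attention matrix: 6 frames, 3 tokens.
flat = [[0.0] * 3 for _ in range(6)]
durations = durations_with_prior(flat)
```

With an uninformative log_p_attn, as is typical early in training, the decoded path is determined entirely by the prior, which is the effect the question is asking for at duration-extraction time.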

changjinhan, May 16 '23 11:05

Sorry for the late reply. I recently realized that the current implementation of alignment learning differs from the official NVIDIA FastPitch code, as discussed at https://github.com/espnet/espnet/issues/5179#issuecomment-1565241556 Thanks a lot; I will first check whether this change brings any improvement.

imdanboy, Jun 07 '23 06:06

Oh, I'm glad to hear there is a similar discussion, and thank you for your reply. We look forward to hearing the results of your further experiments!

changjinhan, Jun 14 '23 07:06

Hi, I recently ran an experiment on the alignment algorithm and found that the diagonal in the alignment plot is clearer from very early in training after the fix, which normalizes the input to ctc_loss and adds the attention prior before Viterbi decoding.
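One plausible reading of "normalize input for ctc_loss" is to prepend a blank column to each frame's attention scores and then apply a log-softmax over blank plus tokens, so that every frame is a proper log-distribution as CTC expects. The sketch below shows only that step in pure Python; the function name and the `blank_logprob=-1.0` default are assumptions, and the actual change is in the linked pull request.

```python
import math


def pad_blank_and_normalize(log_p_attn, blank_logprob=-1.0):
    """Hypothetical helper: prepend a blank score, then log-softmax each frame.

    log_p_attn: list of frames, each a list of per-token log scores.
    Returns rows of length num_tokens + 1 whose exponentials sum to 1.
    """
    normalized = []
    for row in log_p_attn:
        padded = [blank_logprob] + list(row)
        # Numerically stable log-sum-exp over blank + tokens.
        peak = max(padded)
        lse = peak + math.log(sum(math.exp(v - peak) for v in padded))
        normalized.append([v - lse for v in padded])
    return normalized


# Unnormalized scores for 2 frames and 3 tokens (made-up values).
scores = [[0.5, -0.2, -3.0], [-1.0, 0.3, 0.1]]
ctc_input = pad_blank_and_normalize(scores)
```

In a tensor framework the same step would usually be a pad followed by a log-softmax over the class dimension, applied before the CTC loss is called.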

Although I didn't find a clear improvement in speech quality on the datasets I tried (ljspeech, kss, and an internal dataset that is quite clean), the fix to the alignment algorithm might help on somewhat noisy, multi-speaker datasets.

You can check the fix at https://github.com/espnet/espnet/pull/5288 Thanks for the report 😄

imdanboy, Jul 14 '23 05:07