FastSpeech
Has anyone tried using an LSTM to replace the FFT block?
I have trained for [37800/192000] steps, and training does not seem to be converging to a good value. The duration loss in particular barely changes.
Mel Loss: 2.8434, Mel PostNet Loss: 2.5580, Duration Loss: 2.3693;
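For anyone unsure what kind of replacement is meant, here is a minimal sketch, assuming PyTorch and a bidirectional LSTM wrapped with a residual connection and layer norm so that it keeps the same (batch, time, d_model) input and output shape as an FFT block. The class name `LSTMBlock`, its parameters, and the padding-mask convention are hypothetical and are not taken from the FastSpeech code.

```python
import torch
import torch.nn as nn


class LSTMBlock(nn.Module):
    """Hypothetical stand-in for an FFT block: (batch, time, d_model) in and out."""

    def __init__(self, d_model: int = 256, dropout: float = 0.1):
        super().__init__()
        if d_model % 2 != 0:
            raise ValueError("d_model must be even for a bidirectional LSTM")
        # Each direction gets d_model // 2 units, so the concatenated
        # forward/backward output is d_model wide again.
        self.lstm = nn.LSTM(d_model, d_model // 2,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, pad_mask=None):
        # pad_mask: (batch, time) bool tensor, True at padded frames
        # (an assumed convention, adjust to match your own masks).
        out, _ = self.lstm(x)
        # Residual connection followed by layer norm.
        out = self.norm(x + self.dropout(out))
        if pad_mask is not None:
            # Zero the padded frames in the output.
            out = out.masked_fill(pad_mask.unsqueeze(-1), 0.0)
        return out


if __name__ == "__main__":
    block = LSTMBlock(d_model=256)
    x = torch.randn(4, 120, 256)
    print(block(x).shape)
```

Note that this sketch only zeroes padded frames in the output. The backward direction of the LSTM still reads the padded inputs, so for variable-length batches `torch.nn.utils.rnn.pack_padded_sequence` may be worth using.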