FastSpeech2
Synthesized audio is truncated to about 11 seconds
When I check the audio in TensorBoard, I see that any audio longer than 11 seconds is cut down to 11 seconds during synthesis.
Which part of the code causes this, and is there a way to fix it?
Maybe it is constrained by the max_seq_len: 1000 parameter.
https://github.com/ming024/FastSpeech2/blob/d4e79eb52e8b01d24703b2dfc0385544092958f3/config/LJSpeech/model.yaml#L33
Can you try increasing it?
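A rough check of why a limit of 1000 would line up with roughly 11 seconds, assuming max_seq_len counts mel-spectrogram frames and that the LJSpeech preprocessing uses a hop length of 256 samples at a 22050 Hz sampling rate. Those two values are assumptions here and should be confirmed against your own preprocess config. The helper names are made up for this sketch and are not part of the repository.

```python
import math


def max_duration_seconds(max_seq_len, hop_length, sampling_rate):
    """Longest audio, in seconds, that max_seq_len mel frames can cover."""
    return max_seq_len * hop_length / sampling_rate


def frames_needed(duration_seconds, hop_length, sampling_rate):
    """Smallest number of mel frames that covers duration_seconds of audio."""
    return math.ceil(duration_seconds * sampling_rate / hop_length)


# Assumed values: hop_length=256 and sampling_rate=22050 (check your config).
print(round(max_duration_seconds(1000, 256, 22050), 2))  # 11.61
print(frames_needed(20, 256, 22050))                     # 1723
```

Under those assumptions, synthesizing about 20 seconds of audio would need max_seq_len to be at least 1723. A model already trained with the smaller value may not handle longer sequences without retraining.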
Yes, I see that max_seq_len = 1000 limits the length of the encoder input. I will try increasing the number. Thanks for your reply.
mark