FastSpeech2 The audio time problem in the synthesis

Open yiwei0730 opened this issue 3 years ago • 3 comments

In TensorBoard, I noticed that any audio longer than 11 seconds is cut down to about 11 seconds during synthesis. Which part of the code causes this, and is there a way to fix it?

Jan 28 '22 07:01 yiwei0730

Maybe it is constrained by the max_seq_len: 1000 parameter.

https://github.com/ming024/FastSpeech2/blob/d4e79eb52e8b01d24703b2dfc0385544092958f3/config/LJSpeech/model.yaml#L33

Can you try increasing it?
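
As a rough sanity check on why the cutoff lands near 11 seconds, here is a minimal sketch that converts a frame limit into seconds. It assumes a hop length of 256 samples and a sampling rate of 22050 Hz, which are common values for LJSpeech setups but are not stated in this thread, so check your own preprocessing config. The function names are hypothetical helpers, not part of the repository.

```python
import math


def max_audio_seconds(max_seq_len: int, hop_length: int, sampling_rate: int) -> float:
    """Longest audio (in seconds) that fits in max_seq_len mel frames."""
    # Each mel frame advances the audio by hop_length samples.
    return max_seq_len * hop_length / sampling_rate


def frames_needed(seconds: float, hop_length: int, sampling_rate: int) -> int:
    """Smallest frame count that covers the given duration."""
    return math.ceil(seconds * sampling_rate / hop_length)


# Assumed values (not from this thread): hop_length=256, sampling_rate=22050.
print(round(max_audio_seconds(1000, 256, 22050), 2))  # 11.61
print(frames_needed(20.0, 256, 22050))                # 1723
```

Under those assumptions, 1000 frames correspond to roughly 11.6 seconds, which is consistent with the observed cutoff, and covering 20 seconds would need max_seq_len of at least 1723.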

Feb 08 '22 14:02 chenming6615

Yes, I see that max_seq_len = 1000 caps the length of the encoder input. I will try increasing the number. Thanks for your reply.

Feb 09 '22 01:02 yiwei0730

mark

Apr 10 '22 14:04 leslie2046
