FastSpeech2
MFA version
Hey @ming024, could you specify which MFA version you used to generate the TextGrids provided in your repo? Also, did you generate them by aligning with a pre-existing acoustic model, or by running the train-and-align step on the dataset itself?
I'm asking because I've been using the latest MFA version (3.0.0), and the TextGrids I get have alignment errors compared to the ones you provided. This also causes problems in training: the model trained on your provided TextGrids works fine, but the model trained on my own generated TextGrids does not. Its synthesized audio is fine for the first 2-3 seconds and then degrades very quickly.
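To put a number on the alignment differences, something like the following can compare a provided TextGrid against a regenerated one. This is only a minimal sketch: it assumes both files are in Praat's long TextGrid text format and contain the same sequence of interval labels, and the helper names `read_intervals` and `max_boundary_shift` are hypothetical, not part of MFA or this repo.

```python
import re

# Matches one interval entry in a long-format Praat TextGrid, e.g.
#   intervals [1]:
#       xmin = 0.0
#       xmax = 0.35
#       text = "word"
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:?\s*'
    r'xmin = ([\d.eE+-]+)\s*'
    r'xmax = ([\d.eE+-]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def read_intervals(textgrid_text):
    """Return (start, end, label) tuples for every interval, in file order."""
    return [
        (float(start), float(end), label)
        for start, end, label in INTERVAL_RE.findall(textgrid_text)
    ]


def max_boundary_shift(reference_text, candidate_text):
    """Largest absolute difference (in seconds) between paired interval ends.

    Intervals are paired by position, so this is only meaningful when both
    files contain the same label sequence; otherwise a ValueError is raised.
    """
    ref = read_intervals(reference_text)
    cand = read_intervals(candidate_text)
    if [r[2] for r in ref] != [c[2] for c in cand]:
        raise ValueError("label sequences differ; boundaries cannot be paired")
    return max((abs(r[1] - c[1]) for r, c in zip(ref, cand)), default=0.0)
```

Reading each file's contents with `open(path).read()` and passing the two strings to `max_boundary_shift` gives a rough per-utterance measure of how far the boundaries have moved. A `ValueError` indicates that the two alignments do not even share the same label sequence, which would itself be worth reporting.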
Thanks.