Keith Ito
One of the datasets they used in the Deep Voice 2 paper was VCTK, which can be downloaded [here](http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html). It's distributed under the ODC Attribution License.
It's hard to say without more information, but 13k iterations is probably not enough.
- What are you using for training data?
- What does your loss curve look like? (One way to plot it from the training log is sketched below.)
...
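For the loss-curve question, a few lines of Python are enough to extract and plot the loss. This is a minimal sketch that assumes your training log contains lines like `Step 13000 [0.85 sec/step, loss=0.245, ...]`; the exact log format and file path are assumptions, so adjust the regex and path to match your setup.

```python
# Sketch: plot the loss curve from a training log.
# Assumes log lines like "Step 13000 [0.85 sec/step, loss=0.245, ...]".
import re
import matplotlib.pyplot as plt

steps, losses = [], []
with open('logs-tacotron/train.log') as f:  # path is an assumption
    for line in f:
        m = re.search(r'Step (\d+) .*?loss=([\d.]+)', line)
        if m:
            steps.append(int(m.group(1)))
            losses.append(float(m.group(2)))

plt.plot(steps, losses)
plt.xlabel('step')
plt.ylabel('loss')
plt.show()
```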
@MXGray: would you be willing to share your pre-trained model on the Nancy corpus?
@navidnadery I'm not sure why this would happen. Maybe your training data lacks long sentences? You can also try [Location Sensitive Attention](https://github.com/keithito/tacotron/blob/tacotron2-work-in-progress/models/attention.py) (or hybrid attention)...
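For reference, here's a minimal NumPy sketch of the idea behind location-sensitive attention (Chorowski et al., 2015): the attention energies additionally condition on convolutional features of the previous alignment, which encourages the alignment to move forward monotonically. All names, shapes, and hyperparameters below are illustrative, not the repo's actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def location_sensitive_alignment(query, memory, prev_alignment,
                                 W, V, U, w, conv_filters):
    """Attention weights that also condition on the previous alignment.

    query:          decoder state, shape (d_q,)
    memory:         encoder outputs, shape (T, d_m)
    prev_alignment: previous attention weights, shape (T,)
    conv_filters:   1-D conv filters over the alignment, shape (k, n_filters)
    """
    k, n_filters = conv_filters.shape
    padded = np.pad(prev_alignment, k // 2)
    # Location features f, shape (T, n_filters): convolve the previous alignment.
    f = np.stack([padded[t:t + k] @ conv_filters for t in range(len(memory))])
    # e_j = w^T tanh(W q + V h_j + U f_j)
    energies = np.tanh(query @ W + memory @ V + f @ U) @ w
    return softmax(energies)

# Toy usage with random parameters:
T, d_m, d_q, d_a, k, nf = 20, 8, 6, 10, 5, 4
rng = np.random.default_rng(0)
align = location_sensitive_alignment(
    rng.normal(size=d_q), rng.normal(size=(T, d_m)), np.full(T, 1.0 / T),
    rng.normal(size=(d_q, d_a)), rng.normal(size=(d_m, d_a)),
    rng.normal(size=(nf, d_a)), rng.normal(size=d_a),
    rng.normal(size=(k, nf)))
```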
There's an example in the training code: https://github.com/keithito/tacotron/blob/a4f5ac3dfc596425206235d931e907b639a60ed4/train.py#L113
I think you have a few options:
1. Collect some training data with words spoken at different speeds, annotate the words with the speed, and train a model on that. (One way to feed the annotations to the model is sketched below.)
...
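For option 1, one plausible way to feed the annotations in is to tag each character with its word's speed label and concatenate a learned speed embedding onto the character embedding before the encoder. This is a sketch under assumptions, not something from this repo; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, char_dim = 40, 256
n_speeds, speed_dim = 3, 16  # e.g. slow / normal / fast

char_embedding = rng.normal(size=(vocab_size, char_dim))
speed_embedding = rng.normal(size=(n_speeds, speed_dim))

def encode(char_ids, speed_ids):
    """Return encoder inputs of shape (T, char_dim + speed_dim).

    Each character gets the speed label of the word it belongs to.
    """
    return np.concatenate([char_embedding[char_ids],
                           speed_embedding[speed_ids]], axis=-1)

# "hello" spoken fast: every character carries speed id 2.
inputs = encode(np.array([7, 4, 11, 11, 14]), np.full(5, 2))
```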
@toannhu Yes, that repo looks great! I'm training right now on LJ Speech. There's some more discussion over at https://github.com/keithito/tacotron/issues/90
Can you attach some examples, and the command line you're using to continue training?
@begeekmyfriend Yes, if you send over a PR, I would be happy to review and merge it.