expressive_tacotron
It seems that the prosody is not transferred
I checked your sample results from the 420k training run and tried aligning the reference audio with the target audio; they seem quite different. So I am really confused about how the authors of the paper achieved their results. Maybe they used a very large dataset of 100+ hours, which is why they got good results.
The performance of this implementation is nowhere near as good as what Google demonstrated, even after 420k. If you are looking for a proper implementation, check https://github.com/syang1993/gst-tacotron, which gives better results after just 200k iterations. If someone could continue training it up to 500k-600k iterations, I am sure it would get a lot closer to Google's.