deepvoice3

Status log

Open Kyubyong opened this issue 7 years ago • 6 comments

22 Nov. 2017. I have completed the first draft. I've tested the current hyperparameters only on the Nick dataset, which is 8 hours long, and not on LJ, which is 24 hours long. The results were neither good nor terrible. Since the same hyperparameters as the original paper did not work for me, I changed some of them. The changes include applying dilation and using positional embedding instead of positional encoding. The attention plot of the last layer looks somewhat monotonic, but not clearly so. I think the key signal that the network works is, of course, the attention plots.
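To make "somewhat monotonic" a bit less subjective, one option is to score an attention matrix numerically. The following is a minimal sketch, not code from this repo: `monotonic_fraction` is a hypothetical helper, the matrices are toy values, and it assumes the attention is given as one row of weights over encoder positions per decoder step.

```python
def monotonic_fraction(attention):
    """Fraction of consecutive decoder steps whose attention peak does not move backward.

    attention: list of rows, one per decoder step; each row holds the
    weights over encoder positions for that step (assumed layout).
    """
    # Index of the strongest encoder position at each decoder step.
    peaks = [max(range(len(row)), key=row.__getitem__) for row in attention]
    if len(peaks) < 2:
        return 1.0
    forward = sum(1 for a, b in zip(peaks, peaks[1:]) if b >= a)
    return forward / (len(peaks) - 1)


# Toy 4x4 alignment (hypothetical values): a clean diagonal.
diagonal = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.1, 0.9],
]
# The same rows with the last two steps swapped: one backward jump.
jumpy = [diagonal[0], diagonal[1], diagonal[3], diagonal[2]]

print(monotonic_fraction(diagonal))
print(monotonic_fraction(jumpy))
```

A score near 1 suggests the peaks mostly move forward through the text; it is only a rough proxy and does not replace looking at the plot itself.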

Kyubyong avatar Nov 22 '17 03:11 Kyubyong

Thank you for your great work! Can you share your environment, such as Python 2 or Python 3, the TensorFlow version, and so on?

lsq357 avatar Nov 28 '17 08:11 lsq357

Sure. Python 2, TF 1.3, Linux.

Kyubyong avatar Nov 28 '17 08:11 Kyubyong

Thanks! I am using the LJSpeech-1.0 dataset to train. Can you show me the alignment curve after convergence, and tell me how many steps I need to train when using only the Nick dataset?

lsq357 avatar Nov 28 '17 09:11 lsq357

Any plan to add "JOINT REPRESENTATION OF CHARACTERS AND PHONEMES", as described in (deepvoice3, part 3.2)? That section and the keithito tacotron experiments both showed faster convergence.

Also, in my experiments, JOINT REPRESENTATION OF CHARACTERS AND PHONEMES does converge faster.
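For anyone unfamiliar with the idea, here is a rough sketch of what a mixed character/phoneme input could look like: each word that has a dictionary pronunciation is swapped for its phonemes with some probability. Everything here is an assumption for illustration, not this repo's code: `TOY_LEXICON`, `mixed_representation`, and `phoneme_prob` are hypothetical names, and the curly braces are just one way to mark a phoneme span.

```python
import random

# Hypothetical toy lexicon; a real setup would load a pronunciation
# dictionary such as CMUdict instead.
TOY_LEXICON = {
    "hello": "HH AH0 L OW1",
    "world": "W ER1 L D",
}


def mixed_representation(text, lexicon, phoneme_prob, rng):
    """Replace each in-lexicon word with its phonemes with probability phoneme_prob."""
    out = []
    for word in text.split():
        phonemes = lexicon.get(word.lower())
        if phonemes is not None and rng.random() < phoneme_prob:
            # Braces mark a phoneme span (illustrative convention only).
            out.append("{" + phonemes + "}")
        else:
            # Out-of-lexicon words, or words not sampled, stay as characters.
            out.append(word)
    return " ".join(out)


rng = random.Random(0)
chars_only = mixed_representation("hello big world", TOY_LEXICON, 0.0, rng)
phonemes_where_known = mixed_representation("hello big world", TOY_LEXICON, 1.0, rng)
print(chars_only)
print(phonemes_where_known)
```

With a probability strictly between 0 and 1, the same sentence appears in different character/phoneme mixes across epochs, so the model sees both forms.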

lsq357 avatar Dec 07 '17 03:12 lsq357

Do you have the synthesized speech files somewhere?

arijit17 avatar Feb 02 '18 15:02 arijit17

Hello Kyubyong, we pulled your code and tested it with the LJ Speech data. We found that the synthesized wav files have nothing to do with the content of "test_sents.txt". Do you have any guidance for us?

chenxf0619 avatar Mar 06 '18 03:03 chenxf0619