nkcdy

Results: 21 comments by nkcdy

> I trained the model on the Chinese corpus SLR38 for 1.2 million steps and the generated result for an unseen speaker (from the same corpus but not in the...

@bshall In fact, I trained a network on 400 Mandarin speakers, but it failed: the quality was very poor after 350k steps with the default hparams. Then I retrained the network...

@bshall By the way, what was the final loss of your pretrained model?

@bshall As expected, the loss never went below 2.66, even after 350k steps. The quality of the generated audio is not good enough either. Some audio samples sound good, but some...

@bshall Yes, it is free: http://www.openslr.org/resources/18/data_thchs30.tgz. The speakers include 31 female voices and 9 male voices. I added some initialization to the GRU cell and tried again...

@bshall Why is only a small slice of frames picked as the Mel spectrogram condition? What happens if a silent segment is selected?

@bshall I found that the loss may not be a big deal for this network. I retrained it on the ZeroSpeech corpus. The loss decreased quickly to 3.0 and got stuck at this...

@bshall Here are my results: [UniversalVocoding.zip](https://github.com/bshall/UniversalVocoding/files/3528074/UniversalVocoding.zip). There are five test wavs picked from the training corpus. The files named "cdy.wav" and "cdy_long.wav" are my original voice recordings and corresponding generated wavs...

@bshall I tested another out-of-domain female voice. It is still not good: [OutOfDomainFemaleVoice.zip](https://github.com/bshall/UniversalVocoding/files/3532593/OutOfDomainFemaleVoice.zip). So it does overfit the training corpus. I plan to add more speakers from different...

Oh, how awkward... But never mind, drawing a block diagram is always a good starting point when studying a new paper or new code.