acetylSv
Hi, I plotted a histogram of the character length of each line in the transcription and got this [plot](http://speech.ee.ntu.edu.tw/~acetylsv/BZ_char_len.png). So I decided to discard sentences longer than 300 characters.
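For anyone who wants to reproduce that filtering step, here is a minimal sketch in plain Python. The function names, the bin width of 50, and the toy transcripts are made up for illustration and are not taken from the repo; only the 300-character threshold comes from the comment above.

```python
from collections import Counter

MAX_CHARS = 300  # threshold mentioned in the comment above


def length_histogram(lines, bin_width=50):
    """Count lines per character-length bin; each key is the bin's lower edge."""
    return Counter((len(line) // bin_width) * bin_width for line in lines)


def filter_by_length(lines, max_chars=MAX_CHARS):
    """Keep only the lines with at most max_chars characters."""
    return [line for line in lines if len(line) <= max_chars]


# Toy transcriptions standing in for the real dataset (hypothetical data).
transcripts = ["a" * 20, "b" * 120, "c" * 299, "d" * 301, "e" * 450]

hist = length_histogram(transcripts)
kept = filter_by_length(transcripts)

# Print the histogram as text, one row per bin.
for lower_edge in sorted(hist):
    print(f"{lower_edge:4d}-{lower_edge + 49:4d}: {'#' * hist[lower_edge]}")
print(f"kept {len(kept)} of {len(transcripts)} lines")
```

A line of exactly 300 characters is kept here, since only lengths greater than 300 are discarded.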
I used only the segmented part of the Blizzard-2013 dataset, which contains 9742 files totaling about 20 hours. So I'm not sure what will happen if you switch to the bigger one....
Maybe the pre-trained model has not converged to a good point. What kinds of reference audio clips have you tried?
Did you specify the path to the pre-trained model and the path to the inference input text file in hyperparams.py?
I used [Python-Wrapper-for-World-Vocoder](https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder) to first extract the SP, AP and f0 acoustic features, and then used [pysptk](https://github.com/r9y9/pysptk) to convert the SPs to mceps. Hope these two repos solve your problem.
> Can you kindly share your script that extracts sp, ap, f0 and mceps?

Hi, the functions I used to extract those features and synthesize them back are [here](https://github.com/acetylSv/cycle_gan_vc/blob/35c1609bffd298eb8179a00bb3f396b17f964f94/utils.py#L77). Hyperparameters...
Hi, I've uploaded some samples in the 'results/' directory. As you can hear, I couldn't get results as good as the author's own demo either. I'm not sure whether this is mainly...