Aaron (Yinghao) Li

Results 110 comments of Aaron (Yinghao) Li

Could you make a conda environment with Python 3.10 instead? You can run the Colab demo, check the package versions there, and make sure you install those same versions.
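One way to read off the versions is a small helper like the one below, run once in the Colab runtime and once in the local environment so the two outputs can be compared. This is only a sketch: the package names in the example list are assumptions, so substitute the packages the project actually requires.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Hypothetical package list, for illustration only.
    for name, ver in installed_versions(["torch", "torchaudio", "numpy"]).items():
        print(f"{name}=={ver}" if ver else f"{name} is not installed")
```

Lines printed in the `name==version` form can be pasted directly into a `pip install` command in the new environment.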

I think it could be because there was no such sample in the training data. The model has never seen a single word during training (because we removed speech shorter than...

You can add some filler words before or after the word you want to speak and cut the audio to only get the word you are interested in.
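The cutting step can be sketched as below. It assumes the synthesized audio is already loaded as a sequence of samples (a list or a NumPy array) and that you have found the start and end times of the target word yourself, for example by listening or looking at the waveform. The function name and parameters are hypothetical, not part of the repository.

```python
def cut_segment(samples, sample_rate, start_s, end_s):
    """Return the samples between start_s and end_s (both in seconds)."""
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    start = int(round(start_s * sample_rate))
    # Clamp the end index so a slightly too-long end time does not fail.
    end = min(int(round(end_s * sample_rate)), len(samples))
    return samples[start:end]
```

For example, if the word of interest spans 0.8 s to 1.3 s of the generated audio, `cut_segment(wave, sr, 0.8, 1.3)` keeps only that part and drops the filler words around it.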

@easyrider I have tried this `You can do that too.` on Colab and was able to synthesize the speech in any voice. https://vocaroo.com/1f8Rpq84L8H4 https://vocaroo.com/110LoHbYIP9Y https://vocaroo.com/155vjtpiSYLO https://vocaroo.com/19lqIdQEM9uJ (LJSpeech)

@AWAS666 It still works even with a single word, `wink.`. It did generate noise when there was no punctuation after it. I think this is caused by the training data...

This is a very interesting issue. During training the guidance scale is 1, and for some reason when the input is small it fails to generalize to a higher guidance scale....
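For readers unfamiliar with the term, here is a minimal sketch of what a guidance scale does, assuming the standard classifier-free guidance formulation; whether the model combines its predictions in exactly this way is an assumption, not something stated in this thread. The function name is hypothetical.

```python
def apply_guidance(cond, uncond, scale):
    """Blend conditional and unconditional predictions, element by element.

    scale == 1 returns the conditional prediction unchanged;
    scale > 1 extrapolates further away from the unconditional one.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]
```

With a scale of 1 the output is just the conditional prediction, which matches what is seen in training. Any larger scale pushes the output beyond that point, into a range the model was never trained on.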

You may refer to #39 if you just want portability. I'm not familiar with ONNX, so it probably needs to be done by someone more familiar with it.

Is it caused by `batch_size: 40`? How many GPUs are you using?

I'm away at a conference now and can't help until Dec 18. I have added a label to see if anyone else can help debug.

Thanks for your contribution. Could you explain a little more what changes you have made in your PR?