Puyuan Peng comments

Results: 97 comments of Puyuan Peng

Added quick Python demo for users that may not have Jupyter.

Thank you so much! Will test it soon

Further train base model

Check out https://github.com/jasonppy/VoiceCraft?tab=readme-ov-file#finetuning. The data preparation described in the training section still needs to be figured out before finetuning.

train other languages

Thanks! Will put out an update for it this week.

Usage instructions.

Thanks! About support: I appreciate that you like the demo and want to incorporate VoiceCraft into your Node.js system. It shouldn't be hard to wrap what's in the Jupyter notebook...

Tips to improve the quality of text to speech

How long is the target transcript? The model is trained on short sentences (average length 5 sec, although the longest training data goes up to 20 sec), so you might want to finetune it...
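A quick way to sanity-check a transcript against those figures is to estimate how long it will take to speak. This is only a rough sketch: the helper names and the speaking rate of 2.5 words per second are assumptions for illustration, not something taken from VoiceCraft; only the 20 sec ceiling comes from the comment above.

```python
# Rough sketch: estimate spoken duration of a target transcript and compare
# it with the longest training utterance mentioned above (20 sec).
# ASSUMPTIONS: helper names and the default speaking rate are hypothetical.

def estimated_seconds(transcript: str, words_per_second: float = 2.5) -> float:
    """Estimate speech duration from a whitespace word count."""
    n_words = len(transcript.split())
    return n_words / words_per_second


def fits_training_range(seconds: float, max_seconds: float = 20.0) -> bool:
    """True if the estimated duration is no longer than the longest training data."""
    return seconds <= max_seconds


if __name__ == "__main__":
    text = "I tried the demo and the voice sounds close to the prompt"
    duration = estimated_seconds(text)
    print(f"about {duration:.1f} sec, within range: {fits_training_range(duration)}")
```

If the estimate comes out well above the ceiling, splitting the transcript into shorter sentences before generation is one thing to try.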

Tips to improve the quality of text to speech

> Thank you! I was using reference audio up to 12 seconds long + target transcript which is about 4 seconds long.
>
> I’ll try using a reference which...

Tips to improve the quality of text to speech

Sometimes the speaker similarity can be a bit off; it's like the model uses a different voice than the prompt. One thing that I found can improve speaker similarity...

Tips to improve the quality of text to speech

The TTS-finetuned 330M model is up; it should be better than the 830M one.

Finetuning

Thanks! I haven't tried finetuning a lot. You could use the 330M model. I tried finetuning on as little as 550h of LibriTTS and it seems that it doesn't overfit -...

Finetuning

Looks pretty good! I think it's worth generating a few samples. I don't really know what a good value is for loss and top10, as different data have different levels...
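For readers unsure what a "top10" number measures, here is a minimal sketch, assuming it refers to top-10 token accuracy: the fraction of positions where the correct token is among the 10 highest-scoring predictions. The function name and the plain-list inputs are illustrative assumptions, not the training code's own implementation.

```python
# Minimal sketch of top-k accuracy over a sequence of predictions.
# ASSUMPTION: "top10" is read as top-10 token accuracy; this helper is
# hypothetical and works on plain Python lists rather than tensors.

def top_k_accuracy(logits, targets, k=10):
    """Fraction of positions whose target index is among the k largest scores."""
    if len(logits) != len(targets) or not targets:
        raise ValueError("logits and targets must be non-empty and equal length")
    hits = 0
    for scores, target in zip(logits, targets):
        # Indices of the k highest scores at this position.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


if __name__ == "__main__":
    example_logits = [[0.1, 0.9, 0.0], [0.8, 0.15, 0.05]]
    example_targets = [1, 2]
    print(top_k_accuracy(example_logits, example_targets, k=1))
```

Because the value depends on how predictable the data is, it is most useful for comparing checkpoints on the same validation set, alongside listening to generated samples.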