Puyuan Peng

Results 97 comments of Puyuan Peng

You can find the .pth files here: https://huggingface.co/pyp1/VoiceCraft/tree/main

Please find details in the paper. We used 4 A40 GPUs, and the biggest model took a little over 2 weeks to train.

HF Spaces is up and running. Regarding Colab, make sure you rerun the first 2 cells after it restarts.

That's expected; you should still be able to run it (I just tested it). ![image](https://github.com/jasonppy/VoiceCraft/assets/47729801/bea3da60-58f1-4f26-91df-c834a2a32d15)

Did you click the load model button?

It seems that the encodec model was not downloaded to `pretrained_models`. Could you check whether that's the case, and whether https://github.com/jasonppy/VoiceCraft/blob/master/gradio_app.py#L111 is happening?
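A quick way to check the first part, as a minimal sketch: the helper below is hypothetical (it is not part of the VoiceCraft repo) and only assumes that `pretrained_models` is a directory relative to where you launched the app. It lists whatever files are there along with their sizes, so an empty result or a 0-byte file points to a failed download.

```python
from pathlib import Path


def list_downloaded_files(model_dir="pretrained_models"):
    """Return sorted (file name, size in bytes) pairs for files directly inside model_dir.

    Hypothetical helper for debugging; returns an empty list if the
    directory does not exist or contains no regular files.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    files = list_downloaded_files()
    if not files:
        print("pretrained_models is missing or empty")
    for name, size in files:
        print(f"{name}\t{size} bytes")
```

Run it from the same working directory you start `gradio_app.py` from; if you launch the app from somewhere else, pass that path as `model_dir`.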

I'm working on it. In the meantime, please use Gradio through Google Colab: https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing

HF Spaces is up and running. I uploaded the Colab notebook to reflect the longer duration supported by the newer TTS enhanced models. I personally found that 3~4s is usually enough. for...

Thanks! 830M TTS enhanced and 330M TTS enhanced (to be uploaded) are trained on gigaspeech + librilight. I recommend using 830M TTS enhanced to evaluate.

> Hi @jasonppy -- I'm curious, if you can spare the details, how exactly did you train the TTS enhanced model compared to the base model? Is it a separate...