Puyuan Peng comments

Results: 97 comments by Puyuan Peng

Where are the 330/830TTSEnhanced .pth models?

You can find the .pth files here: https://huggingface.co/pyp1/VoiceCraft/tree/main

How long did the model take to train?

Please find details in the paper. We used 4 A40 GPUs, and the biggest model took a little over 2 weeks to train.
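For a rough sense of the compute involved, the figures in the reply can be turned into GPU-hours. This is a back-of-the-envelope sketch, not a number from the paper. It assumes exactly 14 days because "a little over 2 weeks" is not stated more precisely, so the result is a lower bound.

```python
# Figures from the reply: 4 A40 GPUs, "a little over 2 weeks" of training.
# 14 days is an assumed lower bound; the exact duration is not given.
NUM_GPUS = 4
DAYS = 14

def gpu_hours(num_gpus: int, days: float) -> float:
    """Total accelerator time: number of GPUs x wall-clock hours."""
    return num_gpus * days * 24

print(gpu_hours(NUM_GPUS, DAYS))  # → 1344
```

So the biggest model used at least roughly 1,344 A40 GPU-hours.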

Error when running Gradio.

HF Spaces is up and running. Regarding Colab, make sure you rerun the first 2 cells after the runtime restarts.

Error when running Gradio.

That's expected; you should still be able to run it (I just tested it). ![image](https://github.com/jasonppy/VoiceCraft/assets/47729801/bea3da60-58f1-4f26-91df-c834a2a32d15)

Error when running Gradio.

Did you click the load model button?

AssertionError: Could not resolve compression model checkpoint path: ./pretrained_models/encodec_4cb2048_giga.th

It seems the EnCodec model was not downloaded to `pretrained_models`. Could you check whether that's the case, and whether https://github.com/jasonppy/VoiceCraft/blob/master/gradio_app.py#L111 is actually being executed?
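One quick way to do that check locally is a small script that reports which expected checkpoint files are absent. This is a minimal sketch, not part of the VoiceCraft repo. The file name comes from the AssertionError above, and the `./pretrained_models` default mirrors the path in that error. The helper name `missing_checkpoints` is hypothetical.

```python
from pathlib import Path

# Checkpoint named in the AssertionError above; extend the list if you
# also want to check other model files (hypothetical usage).
REQUIRED = ["encodec_4cb2048_giga.th"]

def missing_checkpoints(model_dir="./pretrained_models", required=REQUIRED):
    """Return the required checkpoint names that are not regular files in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from pretrained_models:", ", ".join(missing))
    else:
        print("All required checkpoints found.")
```

If the EnCodec file shows up as missing, the download step in the Gradio app did not run or did not finish.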

HF space build is broken

I'm working on it. In the meantime, please use Gradio through Google Colab: https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing

HF space build is broken

HF Spaces is up and running. I updated the Colab notebook to reflect the longer durations supported by the newer TTS enhanced models. Personally, I found that 3~4 s is usually enough. for...

more training details of the TTS enhanced models

Thanks! 830M TTS enhanced and 330M TTS enhanced (to be uploaded) are trained on GigaSpeech + LibriLight. I recommend using 830M TTS enhanced for evaluation.

more training details of the TTS enhanced models

> Hi @jasonppy -- I'm curious, if you can spare the details, how exactly did you train the TTS enhanced model compared to the base model? Is it a separate...