Puyuan Peng

Results: 97 comments by Puyuan Peng

> BTW, I had already used the new model this morning. Output is fairly similar. I have not tried with fewer batches. Any ideas on why sometimes the output is...

> I have made a Colab version of @zuev-stepan's VoiceCraft fork. I think it should be part of the merge as well?
>
> https://github.com/Sewlell/VoiceCraft-gradio-colab

Thanks! I have tested @zuev-stepan...

Check out the Docker quickstart, which should work on Windows: https://github.com/jasonppy/VoiceCraft?tab=readme-ov-file#quickstart

Thanks for your efforts. I'm unable to test Windows-related issues, but the Docker solution seems to work for some people. Thanks for the feedback on audiocraft installation, I have...

The model decides the emotion of its generation based on the emotion of the prompt and the content it will generate. It currently doesn't support hard-coded emotion tags.

I'll upload the dataset soon. If you want it earlier than that, send me an email

The metadata, including the text, is up: [RealEdit.txt](https://github.com/jasonppy/VoiceCraft/blob/master/RealEdit.txt). The audio files are under different licenses: for LibriTTS I'll just upload them later, for GigaSpeech and Spotify I'll need...

Thanks! MFA is not strictly required; any forced alignment tool will do the job. Some of the newer ones include [NeMo](https://github.com/NVIDIA/NeMo/tree/main/tools/nemo_forced_aligner) and [Wav2vec2](https://pytorch.org/audio/stable/tutorials/forced_alignment_tutorial.html).

Thanks! I'm not sure I understand your question. If you meant to ask how to reduce unnatural pauses in the generation, try reducing the `stop_repetition` parameter to 1 or 2,...