VoiceCraft more training details of the TTS enhanced models

Hi, thank you for open-sourcing your excellent work. ❤️

I would like to compare with VoiceCraft as a baseline for my research. I have observed that you have released three TTS enhanced models. I am curious about the training datasets used for all these models. Can I utilize them to evaluate zero-shot TTS models?

Apr 23 '24 12:04 zjlww

Thanks! 830M TTS enhanced and 330M TTS enhanced (to be uploaded) are trained on GigaSpeech + LibriLight. I recommend using 830M TTS enhanced for evaluation.

Apr 23 '24 13:04 jasonppy

Hi @jasonppy -- I'm curious, if you can spare the details, how exactly did you train the TTS enhanced model compared to the base model? Is it a separate training script? Separate loss? Or simply separate data?

Thanks a lot.

Apr 26 '24 08:04 rlenain

The TTS enhanced models are trained without the first rearrangement step introduced in the paper (i.e. no masking).
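As a toy illustration of the difference (not the repo's training code), the sketch below assumes the first rearrangement step replaces each masked span with a placeholder and moves the span's tokens to the end of the sequence, while the "no masking" setting leaves the sequence untouched. The function names, the `<M0>`-style placeholders, and the span format are all hypothetical.

```python
def rearrange_with_masking(tokens, spans):
    """Toy causal-masking rearrangement (assumed behavior, not the repo's code).

    Each (start, end) span is replaced in place by a placeholder, and the
    span's tokens are appended to the end, preceded by the same placeholder.
    """
    kept, tail = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        mask = f"<M{i}>"                 # hypothetical placeholder name
        kept.extend(tokens[cursor:start])
        kept.append(mask)                # mark where the span used to be
        tail.append(mask)                # announce which span follows
        tail.extend(tokens[start:end])   # span content moved to the end
        cursor = end
    kept.extend(tokens[cursor:])
    return kept + tail


def rearrange_without_masking(tokens):
    """'No masking' setting: the sequence is kept in its original order."""
    return list(tokens)


if __name__ == "__main__":
    seq = list("abcdef")
    print(rearrange_with_masking(seq, [(1, 3)]))
    print(rearrange_without_masking(seq))
```

With one masked span the two functions give visibly different training sequences; with no spans they coincide, which is the sense in which the TTS enhanced setting drops the step.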

Apr 26 '24 15:04 jasonppy

Thanks!

Apr 30 '24 08:04 rlenain

Sorry, actually there is something that I don't understand: is the TTS enhanced model trained from scratch as such, or simply finetuned with that specific objective (i.e. no masking) from the base 830m model? Is there a specific script / recipe that exists in the repo to train/finetune like you trained the TTS enhanced model?

Thanks a lot!

May 01 '24 09:05 rlenain

They are finetuned from the giga830M/giga330M models, which were trained with causal masking. The scripts have not been uploaded to the repo yet.

May 01 '24 15:05 jasonppy

I tested the TTSEnhanced models, both the 330M and the 830M. Sometimes the output repeats for too long, or short words are not pronounced. Maybe we can set some rules to decide when to stop predicting, or add ASR post-processing to check whether the pronunciation is correct. Samples attached: test_sample.zip
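A minimal sketch of what such checks could look like, assuming the generated codec tokens are available as a flat list and the ASR transcript comes from whatever recognizer you already use (no ASR call is made here). The function names and the thresholds (`ngram=3`, `max_repeats=4`, `max_wer=0.2`) are illustrative guesses, not values from the repo.

```python
import re


def normalize(text):
    """Lowercase, strip punctuation, and split into words."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def has_long_repetition(tokens, ngram=3, max_repeats=4):
    """True if some n-gram occurs more than max_repeats times back to back."""
    for start in range(len(tokens) - ngram + 1):
        pattern = tokens[start:start + ngram]
        repeats, pos = 1, start + ngram
        while tokens[pos:pos + ngram] == pattern:
            repeats += 1
            pos += ngram
        if repeats > max_repeats:
            return True
    return False


def accept_sample(target_text, asr_text, tokens, max_wer=0.2):
    """Keep a sample only if the transcript is close and nothing loops."""
    return (word_error_rate(target_text, asr_text) <= max_wer
            and not has_long_repetition(tokens))
```

One way to use this would be to generate several candidates per prompt and keep the first one for which `accept_sample` returns `True`; the repetition check could also run during decoding to stop early.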

Jun 06 '24 10:06 Approximetal
