vall-e
After 100 epochs of training, the model can synthesize natural speech on LibriTTS
I trained vall-e on LibriTTS for about 100 epochs (which took almost 4 days on 8 A100 GPUs) and obtained plausible synthesized audio.
Here is a demo.
[1] prompt : prompt_link synthesized audio : synt_link
[2] prompt : prompt_link ground truth : gt_link synthesized audio : synt_link
[3] prompt : prompt_link synthesized audio : synt_link
[4] prompt : prompt_link ground truth : gt_link synthesized audio : synt_link
The model I trained has worse quality than the original vall-e because it was trained on a smaller amount of data. However, it shows promising quality on clean audio. I'm not sure whether I can share my pre-trained LibriTTS model; if I can, I would like to.