
After 100 epochs of training, the model can synthesize natural speech on LibriTTS

Open dohe0342 opened this issue 1 year ago • 68 comments

I trained vall-e on LibriTTS for about 100 epochs (almost 4 days on 8 A100 GPUs) and obtained plausible synthesized audio.

Here is a demo.

[1] prompt: prompt_link, synthesized audio: synt_link

[2] prompt: prompt_link, ground truth: gt_link, synthesized audio: synt_link

[3] prompt: prompt_link, synthesized audio: synt_link

[4] prompt: prompt_link, ground truth: gt_link, synthesized audio: synt_link

The model I trained has worse quality than the original vall-e because of the smaller amount of training data. However, it shows promising quality on clean audio. I'm not sure whether I can share my pre-trained LibriTTS model; if I can, I would like to.

dohe0342, Mar 20 '23 07:03