tortoise-tts-fast Decoupling voice model generation from text generation.

Open perkel666 opened this issue 2 years ago • 1 comments

The issue.

If I understand it right tortoise does this:

Which means that every time it produces a sentence, it runs the finetuning again.

The solution

Decouple finetuning a voice on .wav files from generating speech from text.
Make a script that finetunes the model on the .wavs and saves it for future use, without the generation part.
Provide a console script that generates speech from text using the previously finetuned model, without finetuning it again.
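
A rough sketch of the split I have in mind, in Python. Everything here is an assumption for illustration only: `encode_voice` and `synthesize` are hypothetical stand-ins for whatever tortoise does internally, and the `prepare` / `speak` command names and the pickle file format are made up. The point is only that the expensive voice step runs once and its result is reused from disk.

```python
import argparse
import pickle
from pathlib import Path


def encode_voice(wav_paths):
    """Hypothetical stand-in for the expensive voice step.

    A real version would process the reference .wav files; this one only
    records their names so the sketch runs without any model.
    """
    return {"source_files": sorted(Path(p).name for p in wav_paths)}


def synthesize(text, voice):
    """Hypothetical stand-in for generating speech with a prepared voice."""
    clips = len(voice["source_files"])
    return f"<audio of {text!r} using {clips} reference clips>"


def prepare_voice(wav_paths, out_path):
    """Step 1: run the voice step once and save the result for later use."""
    voice = encode_voice(wav_paths)
    with open(out_path, "wb") as f:
        pickle.dump(voice, f)
    return out_path


def speak(text, voice_path):
    """Step 2: load the saved voice and generate, without redoing step 1."""
    with open(voice_path, "rb") as f:
        voice = pickle.load(f)
    return synthesize(text, voice)


def main(argv=None):
    """Console entry point with one subcommand per step."""
    parser = argparse.ArgumentParser(description="Decoupled voice/text sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    prep = sub.add_parser("prepare", help="build and save a voice from .wavs")
    prep.add_argument("wavs", nargs="+")
    prep.add_argument("--out", required=True)

    say = sub.add_parser("speak", help="generate speech from a saved voice")
    say.add_argument("text")
    say.add_argument("--voice", required=True)

    args = parser.parse_args(argv)
    if args.command == "prepare":
        print(prepare_voice(args.wavs, args.out))
    else:
        print(speak(args.text, args.voice))
```

With something like this, `main(["prepare", "a.wav", "b.wav", "--out", "my_voice.pkl"])` would be run once per voice, and `main(["speak", "Hello there", "--voice", "my_voice.pkl"])` as often as needed afterwards.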

Feb 05 '23 23:02 perkel666