tortoise-tts-fast
Decoupling voice model generation from text-to-speech generation.
The issue.
If I understand it correctly, tortoise does the following:
- takes a generic model
- finetunes it on .wav files
- generates voice from text based on that finetuned model
This means the finetuning is repeated every time a sentence is generated.
The solution.
- Decouple voice finetuning on .wav files from text-to-voice generation.
- Add a script that finetunes the model on .wav files and saves it for future use, without running the generation step.
- Provide a console script that generates voice from text using a previously finetuned model, without finetuning it again.
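
To make the proposal concrete, here is a rough sketch of what the two-step workflow could look like. It does not call tortoise at all: `build_voice_model` and `synthesize` are hypothetical placeholders for the expensive per-voice step and the generation step, and the `prepare` / `speak` subcommand names and the pickle file format are my own assumptions, not anything the project provides today.

```python
import argparse
import hashlib
import pickle
from pathlib import Path


def build_voice_model(wav_paths):
    """Hypothetical stand-in for the expensive per-voice step.

    A real version would run the model on the .wav files; this one only
    hashes them so the sketch stays runnable without tortoise installed.
    """
    digest = hashlib.sha256()
    for path in sorted(wav_paths):
        digest.update(Path(path).read_bytes())
    return {"voice_id": digest.hexdigest()}


def synthesize(voice_model, text):
    """Hypothetical stand-in for generation; returns fake 'audio' bytes."""
    return f"[{voice_model['voice_id'][:8]}] {text}".encode("utf-8")


def prepare(wav_dir, model_path):
    """Step 1: build the voice model once and save it to disk."""
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    if not wav_paths:
        raise SystemExit(f"no .wav files found in {wav_dir}")
    voice_model = build_voice_model(wav_paths)
    with open(model_path, "wb") as handle:
        pickle.dump(voice_model, handle)
    return voice_model


def speak(model_path, text, out_path):
    """Step 2: load the saved voice model and generate, with no rebuild."""
    with open(model_path, "rb") as handle:
        voice_model = pickle.load(handle)
    audio = synthesize(voice_model, text)
    Path(out_path).write_bytes(audio)
    return audio


def main(argv=None):
    parser = argparse.ArgumentParser(description="Decoupled voice workflow (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    p_prepare = sub.add_parser("prepare", help="build and save a voice model")
    p_prepare.add_argument("wav_dir")
    p_prepare.add_argument("model_path")

    p_speak = sub.add_parser("speak", help="generate from a saved voice model")
    p_speak.add_argument("model_path")
    p_speak.add_argument("text")
    p_speak.add_argument("out_path")

    args = parser.parse_args(argv)
    if args.command == "prepare":
        prepare(args.wav_dir, args.model_path)
    else:
        speak(args.model_path, args.text, args.out_path)
```

With something like this, `main(["prepare", "voices/me", "me.pkl"])` would run once per voice, and `main(["speak", "me.pkl", "Hello there", "out.bin"])` could then be called for every sentence without touching the .wav files again. The paths here are only examples.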