tortoise-tts-fast

Decoupling voice model generation from text generation.

Open perkel666 opened this issue 2 years ago • 1 comments

The issue

If I understand it right, tortoise does this:

  • takes a generic model
  • finetunes it on .wav files
  • generates voice from text based on that finetuned model

This means the finetuning is repeated every time it produces a sentence.



The solution

  • Decouple voice finetuning on .wav files from generating voice from text.
  • Make a script that finetunes the model on the .wav files and saves it for future use, without the generation part.
  • Provide a console script that generates voice from text using the previously finetuned model, without finetuning it again.
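
To make the proposal concrete, here is a minimal sketch of the two-step workflow in Python. It does not use tortoise's real API: `build_voice`, `synthesize`, the `prepare`/`speak` subcommands and the JSON file format are all hypothetical names chosen for illustration, and the expensive model work is replaced by placeholders so the sketch runs on its own.

```python
import argparse
import hashlib
import json
from pathlib import Path


def build_voice(wav_paths):
    """Placeholder for the expensive per-voice step (hypothetical, not tortoise's API).

    A real version would run the model over the .wav files. Here we only
    fingerprint the file contents so the sketch runs without a model.
    """
    ordered = sorted(str(p) for p in wav_paths)
    digest = hashlib.sha256()
    for path in ordered:
        digest.update(Path(path).read_bytes())
    return {"files": ordered, "fingerprint": digest.hexdigest()}


def save_voice(voice, out_path):
    """Persist the prepared voice so later runs can skip build_voice()."""
    Path(out_path).write_text(json.dumps(voice))


def load_voice(path):
    """Load a voice previously written by save_voice()."""
    return json.loads(Path(path).read_text())


def synthesize(text, voice):
    """Placeholder for generation (hypothetical): returns a tagged string, not audio."""
    return f"[{voice['fingerprint'][:8]}] {text}"


def main(argv=None):
    parser = argparse.ArgumentParser(description="Two-step voice workflow (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    prep = sub.add_parser("prepare", help="run the expensive step once and save it")
    prep.add_argument("wavs", nargs="+")
    prep.add_argument("--out", required=True)

    speak = sub.add_parser("speak", help="generate from a saved voice")
    speak.add_argument("text")
    speak.add_argument("--voice", required=True)

    args = parser.parse_args(argv)
    if args.command == "prepare":
        save_voice(build_voice(args.wavs), args.out)
        return args.out
    # "speak" only reads the saved file; build_voice() is never called here.
    return synthesize(args.text, load_voice(args.voice))
```

With this layout, `main(["prepare", "a.wav", "b.wav", "--out", "voice.json"])` would be run once per voice, and `main(["speak", "Hello", "--voice", "voice.json"])` could then be called for each sentence without touching the .wav files again. Wiring `main` to an actual console entry point is left out of the sketch.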

perkel666, Feb 05 '23 23:02