
A strange but clever way to train new voices in the Piper (ONNX) format?

cushycrux opened this issue

  1. Download and unpack https://keithito.com/LJ-Speech-Dataset/ (a single-speaker English dataset of about 13,100 short clips, roughly 24 hours of audio, including the transcript in metadata.csv).
  2. Install RVC WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI).
  3. Download voice models in .pth format (https://voice-models.com/), or train one on your own voice by reading the LJ script aloud.
  4. Optionally, "convert" the LJ Speech WAV files to another voice with RVC, or use the WAV files of your own recordings.
  5. Install Piper, train your new voice, and export it in ONNX format (https://github.com/rhasspy/piper).
  6. Profit?
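Steps 4 and 5 only work if the converted clips still line up with the LJ transcript. Here is a minimal sketch of a sanity check. It assumes the standard LJ Speech layout (`metadata.csv` with pipe-separated `id|raw text|normalized text` rows, clips in `wavs/`). It also assumes that RVC output is written back under the same file names and that Piper's single-speaker training data uses LJ Speech-style `id|text` rows. `build_piper_metadata`, `metadata_piper.csv`, and the 22050 Hz target (LJ Speech's native rate) are hypothetical choices for illustration, not part of Piper or RVC.

```python
import csv
import wave
from pathlib import Path

# Assumption: keep LJ Speech's native sample rate for the training data.
TARGET_RATE = 22050


def build_piper_metadata(dataset_dir, out_name="metadata_piper.csv", rate=TARGET_RATE):
    """Hypothetical helper: keep only transcript rows whose WAV exists and is mono at `rate`.

    Writes `id|text` rows to `out_name` and returns (kept_ids, problems).
    """
    root = Path(dataset_dir)
    kept, problems, rows = [], [], []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # LJ Speech rows look like: LJ001-0001|raw text|normalized text
        # QUOTE_NONE keeps the literal double quotes that appear in the transcripts.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue
            clip_id = row[0]
            # Prefer the normalized column (numbers spelled out) when it is present.
            text = row[2] if len(row) > 2 and row[2] else row[1]
            wav_path = root / "wavs" / f"{clip_id}.wav"
            if not wav_path.exists():
                problems.append((clip_id, "missing wav"))
                continue
            with wave.open(str(wav_path), "rb") as w:
                if w.getframerate() != rate or w.getnchannels() != 1:
                    problems.append((clip_id, f"{w.getframerate()} Hz, {w.getnchannels()} ch"))
                    continue
            kept.append(clip_id)
            rows.append(f"{clip_id}|{text}")
    (root / out_name).write_text("\n".join(rows) + "\n", encoding="utf-8")
    return kept, problems
```

For example, `kept, problems = build_piper_metadata("LJSpeech-1.1")` writes the filtered file next to the original. Anything listed in `problems`, such as missing clips or clips that RVC saved at another sample rate or channel count, should be fixed or resampled before training. Using the normalized third column is a design choice here, since it spells out numbers and abbreviations.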

P.S. Piper training is Linux-only, but it works under WSL on Windows 10/11 (https://learn.microsoft.com/en-us/windows/wsl/install). To install WSL, run this in PowerShell: wsl --install

Thoughts?

cushycrux, Jul 04 '24 08:07

Could be interesting. I've tried several fast voice cloning models (trained on a few minutes of audio), and none of them were very good.

Also, Piper is just a wrapper around VITS, and I'm not sure I like that level of abstraction. I was thinking of a more minimal wrapper around VITS, like the ones I have around Whisper and llama.

dnhkng, Jul 04 '24 10:07

@cushycrux have you tried that approach?

MithrilMan, Jul 29 '24 15:07

Closing this, as I will move to a new TTS system.

dnhkng, Dec 03 '24 21:12

@dnhkng which one?

MithrilMan, Dec 03 '24 21:12

Leaning towards MeloTTS:

  • It uses a better TTS architecture (rated much higher than VITS on benchmarks)
  • The model seems easy to fine-tune
  • It's small, ~200 MB
  • It can easily be converted to ONNX format
  • It uses a Python-based phonemizer (no more hassle with espeak-ng)
  • The phonemizer also supports French and German, which have been requested on the Discord

dnhkng, Dec 04 '24 05:12