GlaDOS A strange but clever way to train new voices in the Piper (ONNX) format?

Download and unpack the LJ Speech dataset (https://keithito.com/LJ-Speech-Dataset/), a large voice dataset that includes the transcripts.
Install RVC WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI).
Download and install a voice model in .pth format (https://voice-models.com/), or train one on your own voice by reading the LJ Speech script aloud.
Optionally, convert the LJ Speech WAV files to another voice with RVC, or use WAV files of your own voice.
Download and install Piper (https://github.com/rhasspy/piper) and train your new voice in ONNX format.
Profit?
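
For the step between RVC conversion and Piper training, the transcripts have to be matched back up with the converted WAV files. Below is a minimal Python sketch of that bookkeeping, not a tested pipeline. It relies on the LJ Speech metadata.csv layout (id|transcription|normalized transcription, no quoting). Two things are assumptions on my part: that RVC keeps the original clip names (e.g. LJ001-0001.wav), and that Piper's single-speaker training input accepts id|text lines. The function names and paths are made up for illustration, so check the Piper training docs before relying on this.

```python
import csv
from pathlib import Path


def read_ljspeech_metadata(metadata_path):
    """Return (clip_id, text) pairs from an LJ Speech style metadata.csv."""
    rows = []
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        # LJ Speech separates fields with "|" and does not quote them.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for fields in reader:
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Prefer the normalized transcription (third field) when present.
            text = fields[2] if len(fields) > 2 and fields[2] else fields[1]
            rows.append((fields[0], text))
    return rows


def write_converted_metadata(rows, converted_wav_dir, out_path):
    """Write id|text lines for clips that have a converted WAV.

    Returns the ids that had no matching WAV, so gaps in the RVC
    output are visible before training starts.
    ASSUMPTION: converted files are named <clip_id>.wav.
    """
    converted_wav_dir = Path(converted_wav_dir)
    missing = []
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for clip_id, text in rows:
            if (converted_wav_dir / f"{clip_id}.wav").is_file():
                out.write(f"{clip_id}|{text}\n")
            else:
                missing.append(clip_id)
    return missing
```

Usage would be something like read_ljspeech_metadata("LJSpeech-1.1/metadata.csv") followed by write_converted_metadata(rows, "converted/wavs", "converted/metadata.csv"), where the "converted" folder is a hypothetical output directory for the RVC results. Anything returned in the missing list needs to be reconverted or dropped.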

PS: Piper training is Linux-only, but it works on Windows 10/11 under WSL (https://learn.microsoft.com/en-us/windows/wsl/install). In PowerShell: wsl --install

Thoughts?

Jul 04 '24 08:07 cushycrux

Could be interesting. I've tried several fast voice cloning models (the kind that need only a few minutes of audio), and none were very good.

Also, Piper is just a wrapper around VITS, and I'm not sure I like that level of abstraction. I was thinking of a more minimal wrapper around VITS, like the ones I have around Whisper and Llama.

Jul 04 '24 10:07 dnhkng

@cushycrux have you tried that approach?

Jul 29 '24 15:07 MithrilMan

Closing this, as I will move to a new TTS system.

Dec 03 '24 21:12 dnhkng

@dnhkng which one?

Dec 03 '24 21:12 MithrilMan

Leaning towards MeloTTS:

It uses a better TTS model (rated much higher on benchmarks than VITS)
The model seems easy to tune
It's small, ~200 MB
It can be easily converted to ONNX format
It uses a Python-based phonemizer (no more hassle with eSpeak NG)
The phonemizer also handles French and German, which have been requested on the Discord

Dec 04 '24 05:12 dnhkng