A strange but clever way to train new voices in the Piper (ONNX) format?
- Download and unpack https://keithito.com/LJ-Speech-Dataset/ (a large voice dataset that includes the transcripts).
- Install RVC WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI).
- Download and install voice models in .pth format (https://voice-models.com/), or create one from your own voice by reading the LJ script.
- Optionally "convert" the WAV files from LJ-Speech to another voice with RVC, or use WAV files of your own voice.
- Download and install Piper and train your new voice in ONNX format (https://github.com/rhasspy/piper).
- Profit?
PS: Piper training is Linux-only, but it works under WSL on Windows 10/11 (https://learn.microsoft.com/en-us/windows/wsl/install).
PowerShell:
wsl --install
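For the dataset side of the steps above, here is a minimal Python sketch. It relies on the LJ Speech `metadata.csv` layout (`id|transcription|normalized transcription`, one clip per line). It assumes the training step wants a simpler `id|text` file sitting next to a `wavs` folder, so check the Piper training docs before relying on that. The function names are made up for illustration and are not part of Piper or RVC.

```python
from pathlib import Path


def read_ljspeech_metadata(path):
    """Yield (clip_id, text) pairs from an LJ Speech style metadata.csv.

    Each line is 'id|transcription|normalized transcription'. The normalized
    column is preferred; the raw transcription is used if it is missing.
    """
    with open(path, encoding="utf-8", newline="") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue
            parts = line.split("|")
            clip_id = parts[0]
            text = parts[2] if len(parts) > 2 and parts[2] else parts[1]
            yield clip_id, text


def write_training_metadata(src, dst, wav_dir=None):
    """Write 'id|text' lines to dst and return how many clips were kept.

    If wav_dir is given, clips without a matching <id>.wav are skipped,
    e.g. when only part of the dataset was converted with RVC.
    (The 'id|text' target layout is an assumption, see the note above.)
    """
    kept = 0
    with open(dst, "w", encoding="utf-8") as out:
        for clip_id, text in read_ljspeech_metadata(src):
            if wav_dir is not None and not (Path(wav_dir) / f"{clip_id}.wav").exists():
                continue
            out.write(f"{clip_id}|{text}\n")
            kept += 1
    return kept
```

Usage would be something like `write_training_metadata("LJSpeech-1.1/metadata.csv", "my_voice/metadata.csv", "my_voice/wavs")`, where `my_voice/wavs` is a hypothetical folder holding the RVC-converted (or self-recorded) clips under their original LJ file names.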
Thoughts?
Could be interesting. I've tried several fast voice cloning models (the kind that need only a few minutes of audio), and none were very good.
Also, Piper is just a wrapper around VITS, and I'm not sure I like that level of abstraction. I was thinking of a more minimal wrapper around VITS, like the ones I have around Whisper and Llama.
@cushycrux have you tried that approach?
Closing this, as I will move to a new TTS system.
@dnhkng which one?
Leaning towards MeloTTS:
- it uses a better TTS model (rated much higher on benchmarks than VITS)
- the model seems easy to tune
- it's small, ~200 MB
- it can be easily converted to ONNX format
- it uses a Python-based phonemizer (no more hassle with eSpeak NG)
- the phonemizer also handles French and German, which have been requested on the Discord