Audio Inference example (TTS)
I'm still getting my hands around this library, but it would be really cool if you could add a base AudioInference class, just like the ImageInference class.
I am currently trying to run a KokoroTTS model with this in Unity, but such a base class could also help with other TTS engines, like the new GPT-SoVits.
There are two existing libraries that run KokoroTTS in ONNX and can serve as a reference:
- Python: https://github.com/thewh1teagle/kokoro-onnx https://github.com/thewh1teagle/kokoro-onnx/blob/main/src/kokoro_onnx/init.py
- C# (this one uses espeak as an external dependency, so I am trying to replicate the Python one above instead): https://github.com/Lyrcaxis/KokoroSharp
The inference goal would be to pass text (a string) and parameters to the ONNX model and get an audio stream or audio file as output.
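To make the request a bit more concrete, here is a rough sketch of the shape I have in mind, written in Python only because the reference implementation above is Python. Everything in it is hypothetical: `AudioInference`, `synthesize`, `synthesize_wav`, `to_wav_bytes` and the `SilenceTTS` placeholder are names made up for illustration, not part of this library, and the 24000 Hz sample rate is an assumption that would depend on the model. A real engine would run the ONNX model inside `synthesize`.

```python
import io
import struct
import wave
from abc import ABC, abstractmethod
from typing import List


def to_wav_bytes(samples: List[float], sample_rate: int) -> bytes:
    """Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sample_rate)
        # Clamp to [-1, 1] before scaling so out-of-range values do not overflow.
        pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
        w.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return buf.getvalue()


class AudioInference(ABC):
    """Hypothetical base class: text in, mono float samples out."""

    sample_rate: int = 24000  # assumed value; depends on the model

    @abstractmethod
    def synthesize(self, text: str) -> List[float]:
        """Return mono samples for the given text."""

    def synthesize_wav(self, text: str) -> bytes:
        """Convenience wrapper that returns a complete WAV file."""
        return to_wav_bytes(self.synthesize(text), self.sample_rate)


class SilenceTTS(AudioInference):
    """Placeholder engine: 0.01 s of silence per input character."""

    def synthesize(self, text: str) -> List[float]:
        return [0.0] * (len(text) * self.sample_rate // 100)
```

With this split, each TTS engine only has to implement `synthesize`, and the stream or file output is shared by all of them.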
@shubhank008 TTS will be a nice example to demonstrate the potential of the OnnxRuntime. The KokoroTTS looks nice, but porting might be challenging since it internally depends on the espeak native library, which doesn't support iOS yet.
I found the KokoroSharp project and am currently porting it to Unity-specific code, removing the need for espeak/numsharp.
The results might be a little inferior with the built-in IPA G2P I created, but I think that is a good trade-off for cross-platform support and not having to run espeak.
Will update you on how it goes
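For anyone following along, the built-in G2P idea mentioned above can be sketched roughly like this. This is not the actual port, just a toy illustration in Python: the `LEXICON` and `LETTER_FALLBACK` tables, their few entries, and the `g2p` function are all made up for the example, and a usable version would need a full pronunciation dictionary plus much better fallback rules.

```python
import re
from typing import List

# Tiny illustrative word-to-IPA lexicon (hypothetical entries).
LEXICON = {
    "hello": "həˈloʊ",
    "world": "wˈɜːld",
}

# Crude per-letter fallback for words missing from the lexicon (hypothetical).
LETTER_FALLBACK = {"a": "æ", "c": "k"}

PUNCTUATION = {".", ",", "!", "?"}


def g2p(text: str) -> List[str]:
    """Convert text to a list of IPA strings and punctuation tokens."""
    tokens = re.findall(r"[a-z']+|[.,!?]", text.lower())
    out = []
    for tok in tokens:
        if tok in PUNCTUATION:
            out.append(tok)  # keep punctuation as its own token
        elif tok in LEXICON:
            out.append(LEXICON[tok])  # dictionary hit
        else:
            # Unknown word: map letter by letter, passing through any
            # letter that has no entry in the fallback table.
            out.append("".join(LETTER_FALLBACK.get(c, c) for c in tok))
    return out
```

The quality loss compared to espeak comes mostly from the fallback branch, since unknown words get only a letter-level guess.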
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi @shubhank008, any progress on this?
I've created a POC project for running the Kokoro TTS model in Unity on multiple platforms.
https://github.com/asus4/kokoro-tts-unity
The Kokoro ONNX model itself works perfectly with the ORT Unity library. However, as we expected, G2P was a bit tricky and I haven't finished implementing it.