Integrate text-to-speech and speech-to-text functionality

Open amakropoulos opened this issue 1 year ago • 12 comments

amakropoulos avatar Jan 22 '24 16:01 amakropoulos

please make this an optional package that is separate

ArEnSc avatar Feb 10 '24 18:02 ArEnSc

Yes certainly, it will be possible to attach STT or TTS to the chat functionality but it will not be enabled by default.

amakropoulos avatar Feb 11 '24 08:02 amakropoulos

Hey there 👋 , will you use Sentis for STT and TTS? Or do you have another idea?

We have some Sentis models on the Hub that are super fast (Tiny Whisper and Jets).

Tiny Whisper: https://huggingface.co/unity/sentis-whisper-tiny
Jets: https://huggingface.co/unity/sentis-jets-text-to-speech

Demo with Whisper: https://singularite.itch.io/jammo-the-robot-with-unity-sentis-whisper-version

simoninithomas avatar Mar 03 '24 20:03 simoninithomas

Hi, thank you for the suggestions! I need to do a small exploration first, but yes, I was thinking of starting with your Whisper-Tiny model 🙂. Ideally I would like to support a range of models, e.g. similarly to the whisper.cpp project, but I need to have it working cross-platform in Unity, which is a work in progress (link).

By the way, thanks a lot for your great work on the sharp-transformers ⭐! I'm using it in the other repo, RAGSearchUnity, to build a RAG similarity search system!

amakropoulos avatar Mar 04 '24 09:03 amakropoulos

Hi @amakropoulos : I want this functionality for a project I am building! Are you planning to add this soon? I can help raise a PR for this functionality too if you are fine with this? Looking forward to hearing from you. Thanks!

siddhant-bharti avatar Mar 09 '24 21:03 siddhant-bharti

@siddhant-bharti I'm replying here as well :). This is the next big feature that I'll work on soon.

@simoninithomas I can't use Jets because it has a cc-by-4.0 license. The Unity Asset Store does not allow packages with licenses that require attribution, and I'd like LLM for Unity to be there as well (p.s. we are live on the Asset Store as of last week 🎉!)

amakropoulos avatar Mar 22 '24 07:03 amakropoulos

This feature is blocked at the moment. I can't find an open-source library for TTS to integrate that fulfills the following requirements:

  • C/C++/C# code without many dependencies
  • MIT/Apache 2.0 or any other equivalent license that is open-source and attribution-free
  • allow multiple voices

The best solution would be Piper, but at the moment it has a potential license issue due to using espeak (link).

amakropoulos avatar Apr 04 '24 06:04 amakropoulos

Hello, I've integrated your project with OpenCV for face tracking, VRoid as the avatar, Vosk STT and Piper TTS. I think the most interesting addition would be an integration with RVC, but I have no time for it. Do you know of any ready-to-use RVC integrations for Unity?

Pipsun avatar Apr 10 '24 08:04 Pipsun

Adding TTS and STT functionality would take llamafile to the next level!

Swiftyos avatar Apr 21 '24 08:04 Swiftyos

I also think that Piper is the best choice for TTS and Whisper for STT. I made a project using the UnityPiper and whisper.unity projects. It works, but getting it all to work was a bit complicated. I also found Piper without espeak, but I don't know how well it works.

SubatomicPlanets avatar Jun 21 '24 21:06 SubatomicPlanets

Having this feature would be amazing. Instead of being blocked by Unity's open-source license requirements, could this feature be delivered via GitHub rather than the Unity Asset Store?

This way we can continue with advanced speech features in our projects with LLM for Unity.

99bits avatar Sep 11 '24 03:09 99bits

I can't spend months implementing it when it can't go into any Unity or commercial product. If someone wants to use TTS, they can have a look at these options instead:

  • commercial: Overtone, ElevenLabs
  • open-source (Piper): piper.unity, UnityPiper

I haven't personally used any of these, but I know a lot of people who use the commercial ones.

amakropoulos avatar Sep 11 '24 06:09 amakropoulos