overte

Create an API accessible through the scripting interface to generate and play text to speech

Open Armored-Dragon opened this issue 8 months ago • 2 comments

Qt seems to support TTS in its ecosystem: https://doc.qt.io/qt-5/qtspeech-index.html

Armored-Dragon · Apr 23 '25 10:04

It does, but it seems extremely limited.

It seems to target an extremely minimal usage scenario, probably for the sake of accessibility. Better than nothing, of course, but I think it's going to be too limited for our purposes.

daleglass · Apr 23 '25 20:04

Understood. I looked briefly into the Qt-provided TTS, and there wasn't much information about it. We will likely have to find a more purpose-built package for generating TTS audio. I've updated the issue title accordingly.

Armored-Dragon · Apr 24 '25 08:04

We actually already have this. It's called TextToSpeech, but it's completely undocumented and only works on Windows. On Linux we could dynamically link at runtime to the system's libspeechd. Lots of software, both free and proprietary, does this; we can't hard-link it in because it's LGPLv2.1-or-later.

interface/src/scripting/TTSScriptingInterface.cpp

In interface/resources/qml/hifi/tts/TTS.qml, it looks like there even used to be a tool somewhere for people to speak through their avatar using TTS.
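As a rough sketch of the runtime-linking idea for Linux (not code from the Overte repository), the snippet below `dlopen()`s libspeechd and resolves `spd_open`, `spd_say` and `spd_close` by name, so nothing is linked at build time. The soname `libspeechd.so.2`, the client name `"overte"` and the `SpeechDispatcher` wrapper class are hypothetical choices for illustration. The enum values are mirrored from `libspeechd.h` and should be checked against the installed header.

```cpp
#include <dlfcn.h>
#include <string>

// Values mirrored from libspeechd.h (SPD_MODE_SINGLE, SPD_TEXT) so the header
// isn't needed at build time. Assumption: verify against the installed header.
enum SpdConnectionMode { kSpdModeSingle = 0 };
enum SpdPriority { kSpdText = 3 };

// Function-pointer types for the libspeechd C API. SPDConnection is treated
// as opaque, so void* stands in for SPDConnection*.
using SpdOpenFn  = void* (*)(const char*, const char*, const char*, SpdConnectionMode);
using SpdSayFn   = int (*)(void*, SpdPriority, const char*);
using SpdCloseFn = void (*)(void*);

// Hypothetical wrapper; the class name and methods are illustrative only.
class SpeechDispatcher {
public:
    explicit SpeechDispatcher(const char* soname = "libspeechd.so.2") {
        lib_ = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
        if (!lib_) {
            return;  // library not installed: TTS simply stays unavailable
        }
        open_  = reinterpret_cast<SpdOpenFn>(dlsym(lib_, "spd_open"));
        say_   = reinterpret_cast<SpdSayFn>(dlsym(lib_, "spd_say"));
        close_ = reinterpret_cast<SpdCloseFn>(dlsym(lib_, "spd_close"));
        if (open_ && say_ && close_) {
            // "overte"/"main" are hypothetical client/connection names;
            // a null user name lets libspeechd use the current user.
            conn_ = open_("overte", "main", nullptr, kSpdModeSingle);
        }
    }

    ~SpeechDispatcher() {
        if (conn_) close_(conn_);  // close_ is non-null whenever conn_ is set
        if (lib_) dlclose(lib_);
    }

    SpeechDispatcher(const SpeechDispatcher&) = delete;
    SpeechDispatcher& operator=(const SpeechDispatcher&) = delete;

    bool available() const { return conn_ != nullptr; }

    // spd_say returns a message id on success and -1 on failure.
    bool say(const std::string& text) {
        return available() && say_(conn_, kSpdText, text.c_str()) >= 0;
    }

private:
    void* lib_ = nullptr;
    void* conn_ = nullptr;
    SpdOpenFn open_ = nullptr;
    SpdSayFn say_ = nullptr;
    SpdCloseFn close_ = nullptr;
};
```

The main design point is that a missing library or speech daemon only makes `available()` return false instead of preventing the client from starting. On glibc older than 2.34, building this also needs `-ldl`.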

ada-tv · May 06 '25 19:05

Maybe https://github.com/espeak-ng/espeak-ng would be a good way to get a cross-platform TTS library?

AnotherFoxGuy · May 07 '25 07:05

espeak-ng is GPLv3, so we can't link to it directly. libspeechd is basically an LGPL shim in front of other, possibly GPL'ed or proprietary, speech providers like eSpeak, which makes them safe to use in terms of licensing.

I've just checked the QTextToSpeech page, and they also use the native engines on Windows/macOS/Android, and libspeechd on Linux.

I think our existing TextToSpeech API might also have been meant to output PCM audio through the avatar? Neither QTextToSpeech nor libspeechd seems to support that, so TTS would only be usable for received chat messages or the UI.

ada-tv · May 07 '25 08:05